Add crossmark update policy page deposit functionality#1380
Add crossmark update policy page deposit functionality#1380
Conversation
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR adds CrossRef CrossMark update policy page deposit functionality to the doi_manager app. It introduces a new crossmark.py module with two public functions — one to build the CrossRef XML payload and one to POST it to the CrossRef deposit API — along with a comprehensive test suite.
Changes:
- Adds
doi_manager/crossmark.pywithbuild_crossmark_policy_page_xml()anddeposit_crossmark_policy_page()functions for registering a CrossMark update policy page DOI with CrossRef. - Adds
doi_manager/tests.pywith unit tests covering XML structure and the deposit HTTP call behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
doi_manager/crossmark.py |
New module with XML builder and HTTP deposit functions for CrossRef CrossMark policy page registration |
doi_manager/tests.py |
New test file covering XML content assertions and HTTP posting behavior |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| xml_tree = ET.ElementTree(doi_batch) | ||
| return ET.tostring( | ||
| xml_tree, | ||
| pretty_print=True, | ||
| xml_declaration=True, | ||
| encoding="utf-8", | ||
| ).decode("utf-8") |
There was a problem hiding this comment.
The ET.tostring() call at line 146 is passed xml_tree, which is an ET.ElementTree instance, not an ET.Element. While lxml's tostring() does accept ElementTree objects, creating the ElementTree wrapper at line 145 is entirely unnecessary here — ET.tostring() can operate directly on the root Element (doi_batch). The intermediate variable xml_tree should be removed and doi_batch should be passed directly to ET.tostring(). This avoids confusion and eliminates an unnecessary allocation.
| class DepositCrossmarkPolicyPageTest(TestCase): | ||
| def _call(self, **kwargs): | ||
| defaults = dict( | ||
| doi="10.1234/update-policy", | ||
| url="https://example.org/update-policy", | ||
| login_id="mylogin", | ||
| login_passwd="mypassword", | ||
| depositor_name="Test Depositor", | ||
| depositor_email_address="depositor@example.org", | ||
| registrant="Test Registrant", | ||
| ) | ||
| defaults.update(kwargs) | ||
| return deposit_crossmark_policy_page(**defaults) |
There was a problem hiding this comment.
The documented fallback behaviour — where depositor_name falls back to registrant when it is None or empty — is not covered by any test in DepositCrossmarkPolicyPageTest. There should be a test that calls deposit_crossmark_policy_page with depositor_name=None (and depositor_name="") and asserts that the resulting XML's <depositor_name> element contains the value of registrant instead.
| nsmap = { | ||
| "xsi": "http://www.w3.org/2001/XMLSchema-instance", | ||
| } | ||
|
|
||
| doi_batch = ET.Element("doi_batch", nsmap=nsmap) | ||
| doi_batch.set("version", "4.4.0") | ||
| doi_batch.set("xmlns", "http://www.crossref.org/schema/4.4.0") | ||
| doi_batch.set( | ||
| "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation", | ||
| "http://www.crossref.org/schema/4.4.0 " | ||
| "http://www.crossref.org/schemas/crossref4.4.0.xsd", | ||
| ) |
There was a problem hiding this comment.
The CrossRef default namespace (http://www.crossref.org/schema/4.4.0) is set via doi_batch.set("xmlns", "http://www.crossref.org/schema/4.4.0") (a plain attribute assignment) rather than through lxml's nsmap mechanism. In lxml, a default namespace must be declared by passing None as the key in the nsmap dict when creating the element (e.g., {None: "http://www.crossref.org/schema/4.4.0", "xsi": ...}), and all child elements must also include the full namespace URI in their tag (e.g., ET.SubElement(head, "{http://www.crossref.org/schema/4.4.0}doi_batch_id")).
With the current code, the child elements (head, doi_batch_id, timestamp, body, database, etc.) are created without any namespace, meaning they belong to no namespace. This causes two problems:
- The serialized XML will have
xmlnsas a literal plain attribute rather than a proper namespace declaration; CrossRef's deposit endpoint will likely reject the submission as non-conformant. - The tests that search for elements using the
{http://www.crossref.org/schema/4.4.0}namespace prefix (e.g.,root.find(f".//{{{ns}}}doi_batch_id")) will returnNonefor all lookups, causing everyassertIsNotNoneinBuildCrossmarkPolicyPageXMLTestto fail at runtime.
The fix is to include None: "http://www.crossref.org/schema/4.4.0" in nsmap and prefix every ET.SubElement tag with {http://www.crossref.org/schema/4.4.0} (or build all tag strings with the namespace URI), and remove the doi_batch.set("xmlns", ...) call.
Implements the function to build and deposit a CrossRef crossmark update policy page — a publisher's web page (with a registered DOI) that describes how corrections, retractions, and withdrawals are handled.
O que esse PR faz?
doi_manager/crossmark.pywith two public functions:build_crossmark_policy_page_xml(doi, url, depositor_name, depositor_email_address, registrant)— builds a CrossRef schema 4.4.0 XML batch payload using the<database>/<dataset>structure to register the policy page URL under the given DOI.deposit_crossmark_policy_page(doi, url, login_id, login_passwd, depositor_name, depositor_email_address, registrant, is_test=False)— validates inputs, builds the XML, and POSTs it to the CrossRef deposit API as a multipart upload. Supportsis_testto target the test server.Onde a revisão poderia começar?
doi_manager/crossmark.py— start withbuild_crossmark_policy_page_xml, thendeposit_crossmark_policy_page.Como este poderia ser testado manualmente?
deposit_crossmark_policy_page(...)withis_test=Trueand valid CrossRef test credentials, a DOI prefix you own, and the URL of the policy page.Algum cenário de contexto que queira dar?
CrossMark requires publishers to register their update policy page URL as a DOI with CrossRef before they can embed the CrossMark widget in articles. This deposit is separate from article DOI deposits — it uses the
<database>/<dataset>XML schema rather than<journal>/<journal_article>.The
depositor_nameparameter falls back toregistrantwhen empty/None.Screenshots
N/A — backend only.
Quais são tickets relevantes?
Referências
Original prompt
🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.