Open
73 commits
c327cf1
Initial backend endpoint for returning crossref cited by information …
tombch Apr 27, 2026
5e669f6
Display citations in frontend
tombch Apr 27, 2026
2f82180
Updated to use seqSetId and version to locate DOI first
tombch Apr 27, 2026
9497e96
Formatting
tombch Apr 27, 2026
8c831f0
Switched SeqSet details page modals to dialogs in line with other Loc…
tombch Apr 28, 2026
5730e79
Formatting
tombch Apr 28, 2026
ab09d96
Updated frontend to use get-seqset-citations for total citations and …
tombch Apr 29, 2026
e23931d
Same syntax for empty titles
tombch Apr 29, 2026
05bcef9
Add caching for crossref API call
tombch Apr 29, 2026
27470df
Formatting
tombch Apr 29, 2026
2c67ebc
Fixed tests by changing crossref properties from final to private, an…
tombch Apr 29, 2026
b891a49
Merge branch 'main' into seqset-citations
tombch Apr 29, 2026
e38caa8
Merge branch 'main' into seqset-citations
tombch Apr 30, 2026
d3b5500
Merge branch 'main' into seqset-citations
tombch May 4, 2026
9199679
Fixing lockfile
tombch May 4, 2026
f0a2c64
Merge branch 'main' into seqset-citations
tombch May 4, 2026
9e865b0
parseCrossRefCitedByXML tests
tombch May 4, 2026
b8c8d18
Added types + comments
tombch May 6, 2026
5c20008
Added tests for getSeqSetCitations, removed relaxed from crossref moc…
tombch May 6, 2026
a98c60d
Remove caching dependency
tombch May 7, 2026
0890599
Merge branch 'main' into seqset-citations
tombch May 7, 2026
c9ab8a3
Added seqset citations table, enforced unique on non-null seqset DOIs…
tombch May 7, 2026
c31180a
Merge branch 'main' into seqset-citations
tombch May 7, 2026
2e0a647
Update schema documentation based on migration changes
actions-user May 7, 2026
5d20ffa
Added skipping of seqset dois if not in the db
tombch May 7, 2026
c136152
Task depends on seqsets being enabled
tombch May 8, 2026
f35d284
Updated backend to have a normalised structure for citations - seqset…
tombch May 12, 2026
cc86479
Merge branch 'main' into seqset-citations
tombch May 12, 2026
cb53341
Update schema documentation based on migration changes
actions-user May 12, 2026
588f7bb
Formatting
tombch May 12, 2026
040a7a9
Update frontend to use citing source structure
tombch May 12, 2026
f1c5c6a
Merge branch 'seqset-citations' of github.com:loculus-project/loculus…
tombch May 12, 2026
ddfd5c1
Merge branch 'main' into seqset-citations
tombch May 13, 2026
55c8a4d
Merge branch 'main' into seqset-citations
tombch May 18, 2026
1562e46
Update schema documentation based on migration changes
actions-user May 18, 2026
f5c39ba
Update citing source to use doi as primary key and remove source type
tombch May 18, 2026
766a462
Merge branch 'seqset-citations' of github.com:loculus-project/loculus…
tombch May 18, 2026
814564e
Update schema documentation based on migration changes
actions-user May 18, 2026
69b1e16
No longer returns seqset dois in DTO, throws exceptions for unparseab…
tombch May 18, 2026
162d834
Increased task fixed delay to six hours, removed where clause on pres…
tombch May 18, 2026
bacb448
Remove requirement for seqset DOI
tombch May 19, 2026
f1eca3c
Merge branch 'main' into seqset-citations
tombch May 20, 2026
5ac697f
Wait/read timeouts for crossref citedby
tombch May 20, 2026
d728ae5
Refactor parseCrossRefCitedByXML and mergeCitingSources to have list …
tombch May 20, 2026
90e27c8
Merge branch 'seqset-citations' of github.com:loculus-project/loculus…
tombch May 20, 2026
e3ae3f5
Removed unique constraint on doi for now - address in follow-up
tombch May 20, 2026
978e375
Claude nitpicks
tombch May 20, 2026
ca32d4d
Removed new get-seqset-citations and refactored it into existing get-…
tombch May 20, 2026
f04e67e
Update schema documentation based on migration changes
actions-user May 20, 2026
d16b13c
Restored log comment
tombch May 20, 2026
ce4ffae
Merge branch 'seqset-citations' of github.com:loculus-project/loculus…
tombch May 20, 2026
77e25dd
Make base dialog transition smoothly
tombch May 20, 2026
5e939b9
Prevent reloading seq set citations on modal open
tombch May 20, 2026
ed87cfc
Close connection in all cases
tombch May 20, 2026
4dde77f
Fix tests failing due to modal transitions
tombch May 20, 2026
9a56335
Revert "Make base dialog transition smoothly"
tombch May 20, 2026
64215e0
Revert "Fix tests failing due to modal transitions"
tombch May 20, 2026
2d8c1c5
Reverting dialog transitions
tombch May 20, 2026
9300db2
Moved year from string to int
tombch May 20, 2026
7b6d90c
Update schema documentation based on migration changes
actions-user May 20, 2026
4c1d82c
Updated schema to use integer primary key for citing sources - DOI is…
tombch May 20, 2026
112f49e
Merge branch 'seqset-citations' of github.com:loculus-project/loculus…
tombch May 20, 2026
c14e35d
Update schema documentation based on migration changes
actions-user May 20, 2026
cf19235
Fixing linting errors
tombch May 20, 2026
0b6d55f
Merge branch 'seqset-citations' of github.com:loculus-project/loculus…
tombch May 20, 2026
153cdac
Switch to using loculus date provider
tombch May 21, 2026
53e0eae
Merge branch 'main' into seqset-citations
tombch May 22, 2026
f35ab9d
Fix merge conflicts
tombch May 22, 2026
c686b17
Switch to customEnumeration to remove arbitrary length requirement
tombch May 22, 2026
04317b6
Updated crossref parsing function to throw on malformed xml and xml m…
tombch May 22, 2026
3ad141a
Restructuring backend types to have CitationSource, SeqSetCitationSou…
tombch May 26, 2026
8e0ff24
Update schema documentation based on migration changes
actions-user May 26, 2026
65229a7
Merge branch 'main' into seqset-citations
tombch May 26, 2026
102 changes: 100 additions & 2 deletions backend/docs/db/schema.sql
@@ -4,8 +4,8 @@

\restrict dummy

--- Dumped from database version 15.17 (Debian 15.17-1.pgdg13+1)
--- Dumped by pg_dump version 16.13 (Debian 16.13-1.pgdg13+1)
-- Dumped from database version 15.18 (Debian 15.18-1.pgdg13+1)
-- Dumped by pg_dump version 16.14 (Debian 16.14-1.pgdg13+1)

SET statement_timeout = 0;
SET lock_timeout = 0;
@@ -311,6 +311,44 @@ CREATE TABLE public.metadata_upload_aux_table (

ALTER TABLE public.metadata_upload_aux_table OWNER TO postgres;

--
-- Name: seqset_citation_source; Type: TABLE; Schema: public; Owner: postgres
--

CREATE TABLE public.seqset_citation_source (
    citation_source_id bigint NOT NULL,
    source_doi text NOT NULL,
    origin text NOT NULL,
    title text NOT NULL,
    year integer NOT NULL,
    contributors jsonb NOT NULL,
    CONSTRAINT seqset_citation_source_origin_check CHECK ((origin = ANY (ARRAY['CROSSREF'::text, 'CURATED'::text])))
);


ALTER TABLE public.seqset_citation_source OWNER TO postgres;

--
-- Name: seqset_citation_source_citation_source_id_seq; Type: SEQUENCE; Schema: public; Owner: postgres
--

CREATE SEQUENCE public.seqset_citation_source_citation_source_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;


ALTER SEQUENCE public.seqset_citation_source_citation_source_id_seq OWNER TO postgres;

--
-- Name: seqset_citation_source_citation_source_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: postgres
--

ALTER SEQUENCE public.seqset_citation_source_citation_source_id_seq OWNED BY public.seqset_citation_source.citation_source_id;


--
-- Name: seqset_id_sequence; Type: SEQUENCE; Schema: public; Owner: postgres
--
@@ -360,6 +398,19 @@ ALTER SEQUENCE public.seqset_records_seqset_record_id_seq OWNER TO postgres;
ALTER SEQUENCE public.seqset_records_seqset_record_id_seq OWNED BY public.seqset_records.seqset_record_id;


--
-- Name: seqset_to_citation_source; Type: TABLE; Schema: public; Owner: postgres
--

CREATE TABLE public.seqset_to_citation_source (
    citation_source_id bigint NOT NULL,
    seqset_id text NOT NULL,
    seqset_version bigint NOT NULL
);


ALTER TABLE public.seqset_to_citation_source OWNER TO postgres;

--
-- Name: seqset_to_records; Type: TABLE; Schema: public; Owner: postgres
--
@@ -581,6 +632,13 @@ ALTER TABLE ONLY public.audit_log ALTER COLUMN id SET DEFAULT nextval('public.au
ALTER TABLE ONLY public.groups_table ALTER COLUMN group_id SET DEFAULT nextval('public.groups_table_group_id_seq'::regclass);


--
-- Name: seqset_citation_source citation_source_id; Type: DEFAULT; Schema: public; Owner: postgres
--

ALTER TABLE ONLY public.seqset_citation_source ALTER COLUMN citation_source_id SET DEFAULT nextval('public.seqset_citation_source_citation_source_id_seq'::regclass);


--
-- Name: seqset_records seqset_record_id; Type: DEFAULT; Schema: public; Owner: postgres
--
@@ -682,6 +740,22 @@ ALTER TABLE ONLY public.metadata_upload_aux_table
ADD CONSTRAINT metadata_upload_aux_table_upload_id_accession_key UNIQUE (upload_id, accession);


--
-- Name: seqset_citation_source seqset_citation_source_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres
--

ALTER TABLE ONLY public.seqset_citation_source
ADD CONSTRAINT seqset_citation_source_pkey PRIMARY KEY (citation_source_id);


--
-- Name: seqset_citation_source seqset_citation_source_source_doi_key; Type: CONSTRAINT; Schema: public; Owner: postgres
--

ALTER TABLE ONLY public.seqset_citation_source
ADD CONSTRAINT seqset_citation_source_source_doi_key UNIQUE (source_doi);


--
-- Name: seqset_records seqset_records_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres
--
@@ -690,6 +764,14 @@ ALTER TABLE ONLY public.seqset_records
ADD CONSTRAINT seqset_records_pkey PRIMARY KEY (seqset_record_id);


--
-- Name: seqset_to_citation_source seqset_to_citation_source_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres
--

ALTER TABLE ONLY public.seqset_to_citation_source
ADD CONSTRAINT seqset_to_citation_source_pkey PRIMARY KEY (citation_source_id, seqset_id, seqset_version);


--
-- Name: seqset_to_records seqset_to_records_pkey; Type: CONSTRAINT; Schema: public; Owner: postgres
--
@@ -881,6 +963,22 @@ ALTER TABLE ONLY public.files
ADD CONSTRAINT files_group_id_fkey FOREIGN KEY (group_id) REFERENCES public.groups_table(group_id);


--
-- Name: seqset_to_citation_source foreign_key_citation_source; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--

ALTER TABLE ONLY public.seqset_to_citation_source
ADD CONSTRAINT foreign_key_citation_source FOREIGN KEY (citation_source_id) REFERENCES public.seqset_citation_source(citation_source_id) ON DELETE CASCADE;


--
-- Name: seqset_to_citation_source foreign_key_seqset; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--

ALTER TABLE ONLY public.seqset_to_citation_source
ADD CONSTRAINT foreign_key_seqset FOREIGN KEY (seqset_id, seqset_version) REFERENCES public.seqsets(seqset_id, seqset_version) ON DELETE CASCADE;
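
Taken together, `seqset_citation_source` (one row per citing publication, keyed by a generated `citation_source_id`, with `source_doi` unique) and `seqset_to_citation_source` (one row per publication and SeqSet-version pair) form a many-to-many link. Both foreign keys use `ON DELETE CASCADE`, so deleting either a citation source or a SeqSet version removes the corresponding link rows. The following is a minimal in-memory Kotlin sketch of that shape, not the backend's persistence code: `CitationSourceRow`, `SeqSetToCitationSourceRow`, `citationsFor` and the sample IDs are hypothetical, and the lookup only mirrors what a join on `citation_source_id` filtered by `(seqset_id, seqset_version)` would return.

```kotlin
// Hypothetical in-memory stand-ins for the two new tables (subset of columns).
data class CitationSourceRow(
    val citationSourceId: Long,
    val sourceDoi: String,
    val origin: String, // 'CROSSREF' or 'CURATED', as allowed by the CHECK constraint
    val title: String,
    val year: Int,
)

data class SeqSetToCitationSourceRow(
    val citationSourceId: Long,
    val seqSetId: String,
    val seqSetVersion: Long,
)

// Roughly: SELECT s.* FROM seqset_to_citation_source l
//          JOIN seqset_citation_source s USING (citation_source_id)
//          WHERE l.seqset_id = ? AND l.seqset_version = ?
fun citationsFor(
    seqSetId: String,
    seqSetVersion: Long,
    sources: List<CitationSourceRow>,
    links: List<SeqSetToCitationSourceRow>,
): List<CitationSourceRow> {
    val sourcesById = sources.associateBy { it.citationSourceId }
    return links
        .filter { it.seqSetId == seqSetId && it.seqSetVersion == seqSetVersion }
        .mapNotNull { sourcesById[it.citationSourceId] }
}

fun main() {
    val sources = listOf(
        CitationSourceRow(1, "10.1000/a", "CROSSREF", "Paper A", 2025),
        CitationSourceRow(2, "10.1000/b", "CURATED", "Paper B", 2026),
    )
    // Source 1 cites versions 1 and 2 of the same SeqSet; the composite primary
    // key only rules out duplicate (source, seqset_id, seqset_version) triples.
    val links = listOf(
        SeqSetToCitationSourceRow(1, "SEQSET_1", 1),
        SeqSetToCitationSourceRow(1, "SEQSET_1", 2),
        SeqSetToCitationSourceRow(2, "SEQSET_1", 2),
    )
    println(citationsFor("SEQSET_1", 2, sources, links).map { it.title }) // [Paper A, Paper B]
}
```

Keeping the publication metadata in one table and the per-version links in another means a paper that cites several SeqSet versions is stored once, which matches the reviewer's view below that a citation is a connection between two publications rather than an entity of its own.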


--
-- Name: seqset_to_records foreign_key_seqset_id; Type: FK CONSTRAINT; Schema: public; Owner: postgres
--
@@ -3,7 +3,6 @@ package org.loculus.backend.api
import io.swagger.v3.oas.annotations.media.Schema
import org.loculus.backend.utils.Accession
import java.sql.Timestamp
-import java.util.*

data class SubmittedSeqSetRecord(
@Schema(
@@ -57,6 +56,24 @@ data class SeqSet(
val seqSetDOI: String?,
)

data class CitationContributor(val givenName: String, val surname: String)

maverbiest (Contributor), May 22, 2026:

Similar to my comment below: to me, the thing that people contribute to is the Publication (or CitingSource etc.), not the act of citing. The citation is a connection between two publications, not really its own entity.

(of course, feel free to disagree if you think I'm bikeshedding)

Collaborator Author:

yeah I agree in the current layout this one should be CitingSourceContributor

Collaborator Author:

IMO now with CitationSource, this one can just stay as CitationContributor


enum class CitationOrigin {
    CROSSREF,
    CURATED,
}

data class CitationSource(
    val sourceDOI: String,
    val title: String,
    val year: Int,
    val contributors: List<CitationContributor>,
)

data class SeqSetCitationSource(val source: CitationSource, val seqSetDOIs: Set<String> = emptySet())

data class SeqSetCitation(val source: CitationSource)

data class ResponseSeqSet(val seqSetId: String, val seqSetVersion: Long)

data class CitedBy(
@@ -6,6 +6,7 @@ import org.loculus.backend.api.AuthorProfile
import org.loculus.backend.api.CitedBy
import org.loculus.backend.api.ResponseSeqSet
import org.loculus.backend.api.SeqSet
import org.loculus.backend.api.SeqSetCitation
import org.loculus.backend.api.SeqSetRecord
import org.loculus.backend.api.SubmittedSeqSet
import org.loculus.backend.api.SubmittedSeqSetRecord
@@ -103,9 +104,9 @@ class SeqSetCitationsController(
submissionDatabaseService.getApprovedUserAccessionVersions(authenticatedUser),
)

-@Operation(description = "Get count of SeqSet cited by publications")
@Operation(description = "Get SeqSet citations from publications")
@GetMapping("/get-seqset-cited-by-publication")
-fun getSeqSetCitedByPublication(@RequestParam seqSetId: String, @RequestParam version: Long): CitedBy =
fun getSeqSetCitedByPublication(@RequestParam seqSetId: String, @RequestParam version: Long): List<SeqSetCitation> =
seqSetCitationsService.getSeqSetCitedByPublication(seqSetId, version)
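
With this change the endpoint returns the list of citing publications (`List<SeqSetCitation>`) instead of a `CitedBy` value, so any aggregate a caller needs has to be derived from the list; the total is simply its size. As a hedged illustration, the Kotlin sketch below computes per-year counts. The data classes copy the fields shown in this diff, while `citationsPerYear` and the sample DOIs are hypothetical and not part of the PR.

```kotlin
data class CitationContributor(val givenName: String, val surname: String)

data class CitationSource(
    val sourceDOI: String,
    val title: String,
    val year: Int,
    val contributors: List<CitationContributor>,
)

data class SeqSetCitation(val source: CitationSource)

// Hypothetical helper: number of citing publications per publication year,
// with keys in ascending year order.
fun citationsPerYear(citations: List<SeqSetCitation>): Map<Int, Int> =
    citations
        .groupingBy { it.source.year }
        .eachCount()
        .toSortedMap()

fun main() {
    fun cite(doi: String, year: Int) = SeqSetCitation(CitationSource(doi, "Title of $doi", year, emptyList()))
    val citations = listOf(cite("10.1000/a", 2026), cite("10.1000/b", 2025), cite("10.1000/c", 2026))
    println("total=${citations.size}") // total=3
    println(citationsPerYear(citations)) // {2025=1, 2026=2}
}
```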

@Operation(description = "Get an author")
@@ -2,11 +2,17 @@ package org.loculus.backend.service.crossref

import mu.KotlinLogging
import org.jsoup.Jsoup
import org.jsoup.parser.Parser
import org.loculus.backend.api.CitationContributor
import org.loculus.backend.api.CitationSource
import org.loculus.backend.api.SeqSetCitationSource
import org.loculus.backend.utils.DateProvider
import org.redundent.kotlin.xml.PrintOptions
import org.redundent.kotlin.xml.xml
import org.springframework.boot.context.properties.ConfigurationProperties
import org.springframework.stereotype.Service
import java.io.DataOutputStream
import java.io.IOException
import java.io.OutputStreamWriter
import java.io.PrintWriter
import java.net.HttpURLConnection
@@ -40,7 +46,7 @@ data class DoiEntry(
)

@Service
-class CrossRefService(final val properties: CrossRefServiceProperties) {
class CrossRefService(private val properties: CrossRefServiceProperties, private val dateProvider: DateProvider) {
    val isActive = properties.endpoint != null &&
        properties.username != null &&
        properties.password != null &&
@@ -49,16 +55,106 @@ class CrossRefService(final val properties: CrossRefServiceProperties) {
        properties.email != null &&
        properties.organization != null &&
        properties.hostUrl != null
-    val dateTimeFormatterMM = DateTimeFormatter.ofPattern("MM")
-    val dateTimeFormatterdd = DateTimeFormatter.ofPattern("dd")
-    val dateTimeFormatteryyyy = DateTimeFormatter.ofPattern("yyyy")
    val doiPrefix: String? = properties.doiPrefix
    val dateTimeFormatterMM: DateTimeFormatter = DateTimeFormatter.ofPattern("MM")
    val dateTimeFormatterdd: DateTimeFormatter = DateTimeFormatter.ofPattern("dd")
    val dateTimeFormatteryyyy: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy")

    private fun checkIsActive() {
        if (!isActive) {
            throw RuntimeException("The CrossRefService is not active as it has not been configured.")
        }
    }

    fun parseCrossRefCitedByXML(citedByXML: String): List<SeqSetCitationSource> {
        val parser = Parser.xmlParser().setTrackErrors(1)
        val doc = Jsoup.parse(citedByXML, "", parser)

        if (parser.errors.isNotEmpty()) {
            throw IllegalStateException("Invalid XML: ${parser.errors}")
        }

        val crossRefResult = doc.children().firstOrNull()
        if (crossRefResult?.tagName() != "crossref_result") {
            throw IllegalStateException("Invalid CrossRef root element: ${crossRefResult?.tagName()}")
        }

        return crossRefResult.select("forward_link").map { forwardLink ->
            val seqSetDOI = forwardLink.attr("doi").takeIf { it.isNotBlank() }
                ?: throw IllegalStateException("CrossRef forward_link missing SeqSet DOI: $forwardLink")

            val citationElement =
                forwardLink.children().firstOrNull()
                    ?: throw IllegalStateException(
                        "CrossRef forward_link has no citation element under SeqSet $seqSetDOI: $forwardLink",
                    )

            val sourceDOI = citationElement.selectFirst("doi")?.text()?.takeIf { it.isNotBlank() }
                ?: throw IllegalStateException(
                    "CrossRef citation source missing DOI for SeqSet $seqSetDOI: $citationElement",
                )
            val title = citationElement.selectFirst("title")?.text()?.takeIf { it.isNotBlank() }
                ?: throw IllegalStateException(
                    "CrossRef citation source missing title for SeqSet $seqSetDOI: $citationElement",
                )
            val year = citationElement.selectFirst("year")?.text()?.toIntOrNull()
                ?: throw IllegalStateException(
                    "CrossRef citation source missing or non-numeric year for SeqSet $seqSetDOI: $citationElement",
                )
            val contributors = citationElement.select("contributor").mapNotNull { c ->
                val givenName = c.selectFirst("given_name")?.text().orEmpty()
                val surname = c.selectFirst("surname")?.text().orEmpty()
                if (givenName.isEmpty() && surname.isEmpty()) {
                    null
                } else {
                    CitationContributor(givenName, surname)
                }
            }

            SeqSetCitationSource(
                source = CitationSource(
                    sourceDOI = sourceDOI,
                    title = title,
                    year = year,
                    contributors = contributors,
                ),
                seqSetDOIs = setOf(seqSetDOI),
            )
        }
    }

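`parseCrossRefCitedByXML` emits one `SeqSetCitationSource` per `forward_link`, each carrying a single SeqSet DOI, so a publication that cites several SeqSets under the queried prefix appears once per cited SeqSet. The commit history mentions a `mergeCitingSources` step whose code is not shown in this diff; the following is only a sketch of one way to fold such entries by source DOI. `mergeBySourceDOI` and the sample DOIs are hypothetical, and the data classes are copied from this diff. It keeps the metadata of the first entry seen for each DOI and takes the union of the SeqSet DOIs.

```kotlin
data class CitationContributor(val givenName: String, val surname: String)

data class CitationSource(
    val sourceDOI: String,
    val title: String,
    val year: Int,
    val contributors: List<CitationContributor>,
)

data class SeqSetCitationSource(val source: CitationSource, val seqSetDOIs: Set<String> = emptySet())

// Hypothetical merge: one entry per source DOI (first-seen order), carrying the
// union of all SeqSet DOIs it was reported against. Title, year and contributors
// are taken from the first entry seen for that DOI.
fun mergeBySourceDOI(entries: List<SeqSetCitationSource>): List<SeqSetCitationSource> =
    entries
        .groupBy { it.source.sourceDOI }
        .values
        .map { group ->
            SeqSetCitationSource(
                source = group.first().source,
                seqSetDOIs = group.flatMap { it.seqSetDOIs }.toSet(),
            )
        }

fun main() {
    val paper = CitationSource("10.1000/paper", "A paper", 2026, listOf(CitationContributor("Ada", "Lovelace")))
    val parsed = listOf(
        SeqSetCitationSource(paper, setOf("10.5555/seqset.1")),
        SeqSetCitationSource(paper, setOf("10.5555/seqset.2")),
    )
    println(mergeBySourceDOI(parsed).single().seqSetDOIs) // [10.5555/seqset.1, 10.5555/seqset.2]
}
```

The merged shape maps directly onto the schema: each merged entry becomes one `seqset_citation_source` row and one `seqset_to_citation_source` row per SeqSet it cites.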
    fun getCrossRefCitedBy(doiPrefix: String): List<SeqSetCitationSource> {
        checkIsActive()

        // End date is the current date at time of request
        val endDate = dateProvider.getCurrentDate()
        val connection = URI(
            properties.endpoint +
                "/servlet/getForwardLinks?usr=${properties.username}&pwd=${properties.password}&doi=$doiPrefix&endDate=$endDate&include_postedcontent=true",
        ).toURL().openConnection() as HttpURLConnection

Comment on lines +132 to +134 (tombch marked this conversation as resolved: Fixed)

P1: URL-encode Crossref query params before building URI

This request string injects username, password, and doiPrefix directly into the query. If any of those values contains reserved characters (for example & or = in a password), the query gets corrupted and the Crossref call will fail or send wrong parameters, which stops citation ingestion. Build the URL from separately encoded query parameter values instead of raw string interpolation.

        connection.connectTimeout = 10_000
        connection.readTimeout = 30_000
        connection.requestMethod = "GET"

        val response = try {
            val responseCode = connection.responseCode
            if (responseCode != HttpURLConnection.HTTP_OK) {
                throw RuntimeException("CrossRef citedBy request returned $responseCode")
            }
            connection.inputStream.use { String(it.readAllBytes()) }
        } catch (e: IOException) {
            throw RuntimeException("CrossRef citedBy request failed for DOI $doiPrefix", e)
        } finally {
            connection.disconnect()
        }

        return try {
            parseCrossRefCitedByXML(response)
        } catch (e: Exception) {
            throw RuntimeException("Failed to parse CrossRef citedBy response for DOI $doiPrefix", e)
        }
    }

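One way to address the review comment on the `getForwardLinks` URI is to encode each query value on its own before assembling the URI. The sketch below is a standalone Kotlin illustration under stated assumptions, not the PR's fix: `buildForwardLinksUri` and `encodeQueryValue` are hypothetical names, `endDate` is assumed to be an ISO `yyyy-MM-dd` date (the type returned by `DateProvider.getCurrentDate()` is not shown in this diff), and `java.net.URLEncoder` produces form-style encoding (a space becomes `+`), on the assumption that the servlet decodes query values that way.

```kotlin
import java.net.URI
import java.net.URLEncoder
import java.nio.charset.StandardCharsets
import java.time.LocalDate

// Form-encode one query value as UTF-8 ('&' -> %26, '=' -> %3D, ' ' -> '+').
fun encodeQueryValue(value: String): String = URLEncoder.encode(value, StandardCharsets.UTF_8)

// Hypothetical helper: same parameters as the request in getCrossRefCitedBy, but
// every value is encoded separately, so reserved characters cannot split the query.
fun buildForwardLinksUri(
    endpoint: String,
    username: String,
    password: String,
    doiPrefix: String,
    endDate: LocalDate,
): URI {
    val params = listOf(
        "usr" to username,
        "pwd" to password,
        "doi" to doiPrefix,
        "endDate" to endDate.toString(), // ISO-8601 yyyy-MM-dd (assumed format)
        "include_postedcontent" to "true",
    )
    val query = params.joinToString("&") { (key, value) -> "$key=${encodeQueryValue(value)}" }
    return URI("$endpoint/servlet/getForwardLinks?$query")
}

fun main() {
    val uri = buildForwardLinksUri(
        endpoint = "https://crossref.example",
        username = "user",
        password = "p&ss=word",
        doiPrefix = "10.5555",
        endDate = LocalDate.of(2026, 5, 26),
    )
    // usr=user&pwd=p%26ss%3Dword&doi=10.5555&endDate=2026-05-26&include_postedcontent=true
    println(uri.rawQuery)
}
```

The returned `URI` can then be opened with `toURL().openConnection()` as in the diff, keeping the existing timeout and `disconnect()` handling unchanged.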
    fun generateCrossRefXML(entry: DoiEntry): String {
        checkIsActive()
