fix(search): support identifier aliases (cds, cdsrn, aleph, doi) #743
palkerecsenyi
left a comment
Looks perfect, just a quick question
kpsherva
left a comment
LGTM for the identifiers part! I am happy to merge it (note there is one minor comment left).
What other fields would you propose to simplify?
    "inspire": "metadata.related_identifiers.identifier",
    "cds": "metadata.identifiers.identifier",
Both cds (legacy) and inspire identifier values are integers. How can we ensure that the query does not return both cds-matching and inspire-matching records when a user searches for cds:12345?
Yeah, this is actually something I tried to handle earlier with an AND clause to enforce both the scheme and the identifier value.
The idea was that something like cds:12345 should translate to “find an identifier where scheme = cds AND value = 12345”, so we don’t get cross-matches with other identifier types.
However, the issue was in how the transformer builds the query. The AND clause was effectively applied across the whole record instead of within the same identifier entry. So it behaved like:
“record has some identifier with scheme = cds AND record has some identifier with value = 12345”
instead of enforcing both conditions on the same identifier object.
Because of that, a record with cds:263303 and inspire:12345 could still match a query like inspire:263303, since the scheme and value conditions were satisfied by different identifiers.
So the issue wasn’t really with the idea of restricting by scheme, but with how the transformer applies those conditions. Right now the mapping only targets the value, so we don’t yet strictly guarantee scheme-level isolation.
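The difference between the two behaviours can be sketched in plain Python. This is only an illustration of the matching semantics described above, not the transformer's code: the record layout is simplified to a single list of identifiers, and the function names `matches_record_level` and `matches_same_entry` are hypothetical.

```python
# Illustrative model only: a record with two identifiers of different schemes.
# The flat "identifiers" list is a simplifying assumption for this sketch.
record = {
    "identifiers": [
        {"scheme": "cds", "identifier": "263303"},
        {"scheme": "inspire", "identifier": "12345"},
    ]
}


def matches_record_level(record, scheme, value):
    """AND applied across the whole record: the scheme and the value
    may be satisfied by two different identifier entries."""
    ids = record["identifiers"]
    has_scheme = any(entry["scheme"] == scheme for entry in ids)
    has_value = any(entry["identifier"] == value for entry in ids)
    return has_scheme and has_value


def matches_same_entry(record, scheme, value):
    """AND applied within one identifier entry: the same entry must
    carry both the scheme and the value."""
    return any(
        entry["scheme"] == scheme and entry["identifier"] == value
        for entry in record["identifiers"]
    )


# The cross-match from the comment above: inspire:263303 matches at record level...
print(matches_record_level(record, "inspire", "263303"))
# ...but not when both conditions must hold on the same entry.
print(matches_same_entry(record, "inspire", "263303"))
```

With the record-level check, inspire:263303 matches because the scheme comes from one entry and the value from the other; the same-entry check rejects it while still accepting cds:263303.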
Closes #703
This PR simplifies identifier search by introducing aliases (identifier, cds, cdsrn, aleph, doi), allowing queries like cds:12345 instead of metadata.identifiers.identifier:value. Aliases are mapped via SearchFieldTransformer in invenio.cfg, and tests are added to verify correct query transformation.
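As a rough sketch of what the alias expansion amounts to, the snippet below rewrites the field part of a simple `field:value` query using a plain dictionary. It does not use SearchFieldTransformer or any Invenio API; the `ALIASES` dictionary and the `expand_alias` helper are hypothetical, and only the two mappings quoted in the review thread are included.

```python
# Hypothetical alias table for illustration; only the two entries quoted
# in the review thread are reproduced here.
ALIASES = {
    "inspire": "metadata.related_identifiers.identifier",
    "cds": "metadata.identifiers.identifier",
}


def expand_alias(query, aliases):
    """Rewrite 'alias:value' to 'full.field.path:value'.

    Queries without a field prefix, or with an unknown field, are
    returned unchanged.
    """
    field, sep, value = query.partition(":")
    if not sep:
        # No "field:" prefix, nothing to expand.
        return query
    return f"{aliases.get(field, field)}:{value}"


print(expand_alias("cds:12345", ALIASES))
print(expand_alias("title:higgs", ALIASES))
```

Note that in this sketch the alias selects only the field holding the value, which mirrors the limitation discussed in the review: the scheme itself is not part of the rewritten query.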