Status (post-2026-05-11)
SCOPE REVISED — out of scope as originally written. #806 (the function-deploy PR) surfaced two compounding blockers:
- AWS-managed RDS does not expose write access to $SHAREDIR/tsearch_data/, so the custom wxyc_unaccent dictionary the wiki §3.3.5 deploy pattern depends on cannot be installed on Backend regardless of PG version.
- Backend prod RDS is on Postgres 14.17, not 16 (the 2026-04-28 wiki verification was wrong; corrected in WXYC/wiki#56). The shipped function bodies have a current_setting('server_version_num') >= 160000 guard, which 14.17 (server_version_num 140017) does not satisfy.
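To make the second blocker concrete: since Postgres 10, server_version_num encodes a release as major × 10000 + minor. The sketch below reproduces only that arithmetic and the >= 160000 comparison quoted above. The helper names are illustrative, and it does not reproduce the shipped function body or what that body does when the guard fails.

```python
def server_version_num(version: str) -> int:
    """Encode a Postgres 10+ 'major.minor' version the way server_version_num does."""
    major_text, _, minor_text = version.partition(".")
    return int(major_text) * 10000 + int(minor_text or 0)


def passes_pg16_guard(version: str) -> bool:
    """Mirror the `>= 160000` comparison quoted in the ticket."""
    return server_version_num(version) >= 160000


if __name__ == "__main__":
    for v in ("14.17", "16.0"):
        print(v, server_version_num(v), passes_pg16_guard(v))
```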
Per the cross-cache-identity pivot (BS#800), Backend's role is the thin writer for LML's bulk-resolve response — not a compute layer. Identity-match-form computation for library_identity* columns folds into the BS#663 step-2 backfill job (Python-side via the wxyc-etl PyPI wheel). The function-deploy half of this ticket is no longer scoped here.
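A rough sketch of what "compute Python-side, write thinly" could look like in the backfill job. Everything here is hypothetical: the row shape, the function name, and the normalize callable, which stands in for whatever the wxyc-etl wheel actually exposes (its Python API is not described in this ticket).

```python
from typing import Callable, Iterable, Iterator, NamedTuple, Optional, Tuple


class LibraryRow(NamedTuple):
    """Hypothetical shape of a row read during the backfill."""
    row_id: int
    artist_name: str
    stored_norm: Optional[str]


def plan_updates(
    rows: Iterable[LibraryRow],
    normalize: Callable[[str], str],
) -> Iterator[Tuple[int, str]]:
    """Yield (row_id, new_norm) only for rows whose stored value is stale."""
    for row in rows:
        new_norm = normalize(row.artist_name)
        if new_norm != row.stored_norm:
            yield row.row_id, new_norm


if __name__ == "__main__":
    # str.casefold is a placeholder, NOT the real identity-match-form.
    rows = [LibraryRow(1, "Stereolab", "stereolab"), LibraryRow(2, "CAN", None)]
    print(list(plan_updates(rows, str.casefold)))
```

The database side then only receives (id, value) pairs to write, which keeps Backend out of the normalization logic entirely.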
Sibling deploys on self-managed PG instances are still in scope and shipping: WXYC/musicbrainz-cache#52, WXYC/wikidata-cache#38, WXYC/discogs-etl#195. LML's deferred-swap (LML#280) unblocks on WXYC/discogs-etl#195 alone — Backend is not on that critical path.
Action: closing this ticket once the BS#663 step-2 backfill design absorbs the column-write responsibility.
Original ticket (preserved for context)
Problem
Per wiki plans/library-hook-canonicalization.md §3.3.5, each cache + Backend deploys a Postgres analog of wxyc_etl::text::to_identity_match_form. This issue tracks the Backend deploy.
Per the §3.3.5 per-cache implementation-ownership table, Backend's deploy uses Drizzle (matching its existing migration tool) and lands at shared/database/src/functions/normalize.sql (called from a Drizzle migration via sql.raw()).
End state
Four plpgsql functions deployed to Backend's wxyc_schema, each producing output byte-identical to its Rust counterpart:
| Postgres function | Rust counterpart | Notes |
| --- | --- | --- |
| wxyc_identity_match_artist(text) | to_identity_match_form | Locked-on baseline (§3.3.2 steps 1-5 + 7) |
| wxyc_identity_match_title(text) | to_identity_match_form_title | Separate symbol; same body today |
| wxyc_identity_match_artist_with_punctuation(text) | to_identity_match_form_with_punctuation | Opt-in step 6 |
| wxyc_identity_match_artist_with_disambiguator_strip(text) | to_identity_match_form_with_disambiguator_strip | Opt-in step 8 (artists only) |
Functions are IMMUTABLE PARALLEL SAFE per §3.3.5.
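Why byte-identity rather than "looks the same": two strings can render identically while differing in Unicode composition. The sketch below shows a byte-level fixture comparison. The compute callable is a stand-in for a call into the deployed function, and the fixture string is illustrative, not taken from the canonical WXYC artist set.

```python
import unicodedata
from typing import Callable, Dict, List, Tuple


def byte_mismatches(
    fixture: Dict[str, str],
    compute: Callable[[str], str],
) -> List[Tuple[str, bytes, bytes]]:
    """Return (input, expected_bytes, actual_bytes) for every output that differs."""
    mismatches = []
    for raw, expected in fixture.items():
        expected_bytes = expected.encode("utf-8")
        actual_bytes = compute(raw).encode("utf-8")
        if actual_bytes != expected_bytes:
            mismatches.append((raw, expected_bytes, actual_bytes))
    return mismatches


if __name__ == "__main__":
    composed = unicodedata.normalize("NFC", "cafe\u0301")  # accented e as one code point
    decomposed = unicodedata.normalize("NFD", composed)    # e + combining accent
    fixture = {"Cafe\u0301": composed}
    # A stand-in that returns the decomposed form renders identically but is reported.
    print(byte_mismatches(fixture, lambda _: decomposed))
```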
Backend's column flip is downstream of this ticket. Per wiki §3.3.0 row 6 ("Cache re-normalization — ⏳ FUTURE — gated on §4 step 2 backfill PR | E2-BS step 2 backfill PR (epic #663)"), the library_identity* column re-normalization happens during the E2-BS step 2 backfill window, not in this ticket. Ship the functions now so they're available when the backfill PR runs.
Files
- New: shared/database/src/functions/normalize.sql (single-source plpgsql vendored from wxyc-etl/data/wxyc_identity_match_functions.sql).
- New: apps/backend/drizzle/<next>_wxyc_identity_match_functions.sql (or the equivalent Drizzle-generated migration filename), which applies normalize.sql via sql.raw().
- New: wxyc-etl-pin.txt at repo root (or under shared/), holding the SHA-256 of the vendored rules file + canonical SQL.
- New: an integration test under apps/backend/tests/integration/ that asserts byte-equality of wxyc_identity_match_artist(s) against a small fixture (canonical WXYC artist set). The full Rust↔Postgres byte-equality assertion lives in wxyc-etl; this repo's test is a thin sanity-check that the functions actually deployed.
- Modified: relevant CLAUDE.md section ("Cross-cache-identity feature flags" or similar), with one bullet noting the function family.
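One way the wxyc-etl-pin.txt check could work, sketched in Python for brevity. The pin file layout is an assumption: the ticket only says the file records SHA-256 digests, so this borrows the sha256sum convention of "<hex digest>  <relative path>" per line. All function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def parse_pin(text: str) -> Dict[str, str]:
    """Parse '<hex digest>  <relative path>' lines (assumed format)."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        digest, name = line.split(None, 1)
        pins[name.strip()] = digest.lower()
    return pins


def stale_files(pin_text: str, root: Path) -> List[str]:
    """Return pinned paths whose current digest no longer matches the pin."""
    return [
        name
        for name, digest in parse_pin(pin_text).items()
        if sha256_of(root / name) != digest
    ]
```

A CI step could fail whenever stale_files(...) is non-empty, which would catch a vendored normalize.sql drifting from the pinned upstream copy. A pinned path that no longer exists raises FileNotFoundError rather than being reported as stale.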
Acceptance
- [...] CREATE OR REPLACE).
- wxyc-etl-pin.txt SHA matches.
Constraints
- Postgres 16+ minimum per §3.3.5 (Backend prod RDS verified at Postgres 16 on 2026-04-28).
- Drizzle's migration tool doesn't natively support IMMUTABLE PARALLEL SAFE function declarations; go through sql.raw() per the existing pattern.
- Don't flip library_identity* column normalization in this ticket. That's gated on the E2-BS step 2 backfill PR (Backend-Service#663). This ticket just ships the function definitions so the backfill window has them available.
Related
- [...] library_identity_source.norm_* etc. happens during the E2-BS step 2 backfill PR (epic #663, "[Epic] E2 — Canonical identity record (library_identity) + LML write contract"). Cite this ticket from that PR when it runs.
- plans/library-hook-canonicalization.md §3.3.5.
Out of scope
- library_identity_source column flip (downstream of #663 step 2).