Deploy wxyc_identity_match_* Postgres functions to Backend wxyc_schema (wiki §3.3.5) #805

@jakebromberg

Status (post-2026-05-11)

SCOPE REVISED — out of scope as originally written. #806 (the function-deploy PR) surfaced two compounding blockers:

  1. AWS-managed RDS does not expose write access to $SHAREDIR/tsearch_data/, so the custom wxyc_unaccent dictionary the wiki §3.3.5 deploy pattern depends on cannot be installed on Backend regardless of PG version.
  2. Backend prod RDS is on Postgres 14.17, not 16 (the 2026-04-28 wiki verification was wrong; corrected in WXYC/wiki#56). The shipped function bodies guard on current_setting('server_version_num') >= 160000, and 14.17 reports 140017, so the guard is not satisfied on Backend.
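
The version arithmetic behind blocker 2 can be sketched as below. For Postgres 10 and later, server_version_num is major * 10000 + minor, and current_setting() returns it as text. The helper names and the TypeScript framing are hypothetical; the shipped guard lives in the plpgsql function bodies.

```typescript
// Hypothetical helpers mirroring the guard's arithmetic; the shipped guard is plpgsql.
const MIN_SERVER_VERSION_NUM = 160000; // Postgres 16.0

// For Postgres 10 and later, server_version_num is major * 10000 + minor.
function toServerVersionNum(major: number, minor: number): number {
  return major * 10000 + minor;
}

// current_setting() returns text, so the value is parsed before comparing.
function meetsMinimumVersion(serverVersionNum: string): boolean {
  const parsed = Number.parseInt(serverVersionNum, 10);
  if (Number.isNaN(parsed)) {
    throw new Error(`unparseable server_version_num: ${serverVersionNum}`);
  }
  return parsed >= MIN_SERVER_VERSION_NUM;
}
```

Under these assumptions, Backend's 14.17 maps to 140017 and fails the check, while any 16.x value passes.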

Per the cross-cache-identity pivot (BS#800), Backend's role is the thin writer for LML's bulk-resolve response — not a compute layer. Identity-match-form computation for library_identity* columns folds into the BS#663 step-2 backfill job (Python-side via the wxyc-etl PyPI wheel). The function-deploy half of this ticket is no longer scoped here.

Sibling deploys on self-managed PG instances are still in scope and shipping: WXYC/musicbrainz-cache#52, WXYC/wikidata-cache#38, WXYC/discogs-etl#195. LML's deferred-swap (LML#280) unblocks on WXYC/discogs-etl#195 alone — Backend is not on that critical path.

Action: closing this ticket once the BS#663 step-2 backfill design absorbs the column-write responsibility.


Original ticket (preserved for context)

Problem

Per wiki plans/library-hook-canonicalization.md §3.3.5, each cache + Backend deploys a Postgres analog of wxyc_etl::text::to_identity_match_form. This issue tracks the Backend deploy.

Per the §3.3.5 per-cache implementation-ownership table, Backend's deploy uses Drizzle (matching its existing migration tool) and lands at shared/database/src/functions/normalize.sql (called from a Drizzle migration via sql.raw()).

End state

Four plpgsql functions deployed to Backend's wxyc_schema, byte-identical to their Rust counterparts:

| Postgres function | Rust counterpart | Notes |
| --- | --- | --- |
| wxyc_identity_match_artist(text) | to_identity_match_form | Locked-on baseline (§3.3.2 steps 1-5 + 7) |
| wxyc_identity_match_title(text) | to_identity_match_form_title | Separate symbol; same body today |
| wxyc_identity_match_artist_with_punctuation(text) | to_identity_match_form_with_punctuation | Opt-in step 6 |
| wxyc_identity_match_artist_with_disambiguator_strip(text) | to_identity_match_form_with_disambiguator_strip | Opt-in step 8 (artists only) |

Functions are IMMUTABLE PARALLEL SAFE per §3.3.5.
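
One way to keep the four declarations honest is a text-level check over the vendored SQL before it is applied. The sketch below is a hypothetical checker, not something the wiki or wxyc-etl specifies. It assumes each function is declared with CREATE OR REPLACE FUNCTION and that the volatility markers appear in the same statement.

```typescript
// Function names come from the End state list; the checker itself is hypothetical.
const EXPECTED_FUNCTIONS = [
  "wxyc_identity_match_artist",
  "wxyc_identity_match_title",
  "wxyc_identity_match_artist_with_punctuation",
  "wxyc_identity_match_artist_with_disambiguator_strip",
];

// Returns the names whose CREATE statement is missing or lacks the required markers.
function findUndeclared(sqlText: string): string[] {
  // Split before each CREATE so every chunk holds at most one function definition.
  const statements = sqlText.split(/(?=CREATE\s+OR\s+REPLACE\s+FUNCTION)/i);
  return EXPECTED_FUNCTIONS.filter((name) => {
    // Optional schema prefix, then the exact name followed by its argument list.
    const header = new RegExp(`FUNCTION\\s+(?:\\w+\\.)?${name}\\s*\\(`, "i");
    const stmt = statements.find((s) => header.test(s));
    if (stmt === undefined) return true;
    return !(/\bIMMUTABLE\b/i.test(stmt) && /\bPARALLEL\s+SAFE\b/i.test(stmt));
  });
}
```

An empty result means all four names were found with both markers. It says nothing about whether the bodies are correct.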

Backend's column flip is downstream of this ticket. Per wiki §3.3.0 row 6 ("Cache re-normalization — ⏳ FUTURE — gated on §4 step 2 backfill PR | E2-BS step 2 backfill PR (epic #663)"), the library_identity* column re-normalization happens during the E2-BS step 2 backfill window, not in this ticket. Ship the functions now so they're available when the backfill PR runs.

Files

  • New: shared/database/src/functions/normalize.sql — single-source plpgsql vendored from wxyc-etl/data/wxyc_identity_match_functions.sql.
  • New: apps/backend/drizzle/<next>_wxyc_identity_match_functions.sql (or the equivalent Drizzle-generated migration filename) — applies normalize.sql via sql.raw().
  • New: wxyc-etl-pin.txt at repo root (or under shared/) — SHA-256 of the vendored rules file + canonical SQL.
  • New: an integration test under apps/backend/tests/integration/ that asserts byte-equality of wxyc_identity_match_artist(s) against a small fixture (canonical WXYC artist set). The full Rust↔Postgres byte-equality assertion lives in wxyc-etl; this repo's test is a thin sanity-check that the functions actually deployed.
  • Modified: relevant CLAUDE.md section ("Cross-cache-identity feature flags" or similar) — one bullet noting the function family.
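
The pin check for wxyc-etl-pin.txt could look roughly like this. The ticket does not specify the pin file's format, so the sha256sum-style layout assumed here (hex digest, whitespace, path) and all helper names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of the exact content given.
function sha256Hex(content: string | Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

// Assumed pin format: one "<sha256-hex>  <relative-path>" entry per line (sha256sum style).
function parsePins(pinText: string): Map<string, string> {
  const pins = new Map<string, string>();
  for (const line of pinText.split("\n")) {
    const match = /^([0-9a-f]{64})\s+(\S.*)$/.exec(line.trim());
    if (match) pins.set(match[2], match[1]);
  }
  return pins;
}

// True only when the path is pinned and the content hashes to the pinned digest.
function matchesPin(
  pins: Map<string, string>,
  path: string,
  content: string | Uint8Array,
): boolean {
  return pins.get(path) === sha256Hex(content);
}
```

In CI this would read normalize.sql and the vendored rules file from disk as raw bytes and fail the job when matchesPin returns false for either.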

Acceptance

  • Drizzle migration applies cleanly; re-application is a no-op (CREATE OR REPLACE).
  • Sanity-check integration test passes against the PG service container.
  • wxyc-etl-pin.txt SHA matches.
  • Lint/typecheck/test all green per the existing CI matrix.
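
The comparison at the heart of the sanity-check test might be sketched as follows. The fixture shape, the helper names, and the use of a Map to stand in for query results are assumptions for illustration; the real test would fill that Map from SELECT wxyc_identity_match_artist(input) against the PG service container.

```typescript
// Compares UTF-8 encodings, so visually identical but differently composed
// strings (for example precomposed vs. combining accents) count as unequal.
function utf8BytesEqual(actual: string, expected: string): boolean {
  const a = new TextEncoder().encode(actual);
  const b = new TextEncoder().encode(expected);
  return a.length === b.length && a.every((byte, i) => byte === b[i]);
}

// Returns the fixture inputs whose database output is missing or differs in bytes.
// `fromDb` stands in for rows returned by the deployed function.
function mismatches(
  fixture: ReadonlyArray<{ input: string; expected: string }>,
  fromDb: ReadonlyMap<string, string>,
): string[] {
  return fixture
    .filter(({ input, expected }) => {
      const actual = fromDb.get(input);
      return actual === undefined || !utf8BytesEqual(actual, expected);
    })
    .map(({ input }) => input);
}
```

An empty mismatches list is the pass condition. Reporting the offending inputs rather than a bare boolean keeps a failing run easy to diagnose.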

Constraints

  • Postgres 16+ minimum per §3.3.5 (Backend prod RDS verified at Postgres 16 on 2026-04-28).
  • Drizzle's migration tool doesn't natively support IMMUTABLE PARALLEL SAFE function declarations — go through sql.raw() per the existing pattern.
  • Don't flip library_identity* column normalization in this ticket. That's gated on the E2-BS step 2 backfill PR (Backend-Service#663). This ticket just ships the function definitions so the backfill window has them available.
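
The raw-SQL hand-off in the second constraint can be sketched with the executor injected, so nothing here assumes Backend's actual Drizzle client setup. The helper name is hypothetical; in a real migration the callback would presumably wrap something like db.execute(sql.raw(text)).

```typescript
// `execute` is injected so this sketch assumes nothing about the Drizzle client;
// in a real migration it would presumably wrap something like db.execute(sql.raw(text)).
function applyVendoredSql<T>(sqlText: string, execute: (rawSql: string) => T): T {
  if (sqlText.trim().length === 0) {
    throw new Error("normalize.sql is empty; refusing to run an empty migration");
  }
  // The text is passed through unmodified so the pinned SHA-256 still describes what ran.
  return execute(sqlText);
}
```

Passing the file contents through untouched keeps the migration a thin wrapper: the function definitions, including their IMMUTABLE PARALLEL SAFE markers, stay in normalize.sql.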

Related

Out of scope

Labels: cross-cache-identity (project tag for the cross-cache-identity initiative: library hook + identity record + normalization), status:blocked (cannot start until a dependency closes)

Project status: In Progress