Deploy wxyc_identity_match_* plpgsql functions (wiki §3.3.5)#38

Merged
jakebromberg merged 2 commits into main from cross-cache-identity/identity-match-functions
May 11, 2026
@jakebromberg
Member

Summary

Mirrors WXYC/musicbrainz-cache#52 for wikidata-cache. Vendors canonical artifacts from WXYC/wxyc-etl@v0.4.0 under vendor/wxyc-etl/ and ships them as migration 0003.

Same pattern as mb-cache: SHA-pinned wxyc-etl-pin.txt, sqlx-cli-friendly wrapper migration that inlines the canonical SQL byte-for-byte after a dictionary-setup prelude, and a three-axis parity test (pin freshness, migration-vs-canonical, PG-side 252-row fixture + idempotence).

Function deploy only — no cache-load flip. This cache's wxyc_library hook normalizes via the Rust loader; the functions ship so cross-cache audit queries against this DB have a canonical identity-match form available.

CI test-postgres job extended: docker cp rules into the alpine container, then run cargo test --test wxyc_identity_match_parity_test -- --ignored.

Cargo: wxyc-etl 0.3.0 → 0.4.0.

Closes #37.

Related: parent epic WXYC/wxyc-etl#73, prerequisite WXYC/wxyc-etl#113 (merged), sibling deploys WXYC/musicbrainz-cache#52, WXYC/Backend-Service#805, WXYC/discogs-etl#194.

Test plan

  • cargo fmt --check + cargo clippy --all-targets -- -D warnings
  • cargo test (full suite)
  • TEST_DATABASE_URL=... cargo test --test wxyc_identity_match_parity_test -- --ignored — 4/4 pass on local PG 18
  • CI green (verify docker cp step lands the rules into the postgres:16-alpine service container)
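
The `-- --ignored` invocations in the test plan rely on Cargo's `#[ignore]` attribute: ignored tests are skipped by a plain `cargo test` and only run when `--ignored` is passed. A minimal sketch of how a Postgres-backed test might be gated this way is below. The helper `database_url_from` and the test name `parity_smoke` are hypothetical and are not taken from `tests/wxyc_identity_match_parity_test.rs`.

```rust
use std::env;

/// Hypothetical helper: normalizes the raw value of TEST_DATABASE_URL.
/// Unset, empty, or whitespace-only values are treated as "no database".
fn database_url_from(raw: Option<String>) -> Option<String> {
    raw.map(|url| url.trim().to_string())
        .filter(|url| !url.is_empty())
}

/// Skipped by a plain `cargo test`; runs only with `cargo test -- --ignored`.
#[test]
#[ignore = "needs a live Postgres; run with -- --ignored"]
fn parity_smoke() {
    let url = database_url_from(env::var("TEST_DATABASE_URL").ok())
        .expect("TEST_DATABASE_URL must be set for --ignored tests");
    // A real test would connect and run the fixture here; this sketch
    // only checks that the URL has a Postgres scheme.
    assert!(url.starts_with("postgres://") || url.starts_with("postgresql://"));
}
```

Keeping the environment lookup separate from the normalization lets the gating logic be exercised without a database.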

 #37)

Vendors the canonical artifacts from WXYC/wxyc-etl@v0.4.0 (`data/`) under `vendor/wxyc-etl/` and ships them as migration 0003. The migration sets up the `wxyc_unaccent` text-search dictionary, then inlines the canonical four-function SQL byte-for-byte so sqlx-cli can apply the whole deploy in one transaction.

The parity test in `tests/wxyc_identity_match_parity_test.rs` covers three axes: SHA-pin freshness on every vendored file, migration-vs-canonical byte-equality after the wrapper prefix, and full 252-row fixture parity (plus an idempotence smoke). Local Postgres 18 passes all four assertions.

Function deploy only — no cache-load flip in scope. This cache's `wxyc_library` hook normalizes via the Rust loader; the functions ship so cross-cache audit queries against this DB have a canonical identity-match form available.

CI: the test-postgres job `docker cp`s the rules + version files into the postgres:16-alpine service container before the existing import + charset-torture steps, then runs the new parity test as an explicit `--ignored` step.

Cargo: `wxyc-etl` 0.3.0 → 0.4.0.
- Anchor migration-vs-canonical split on a `-- @begin CANONICAL BODY` sentinel emitted by the wrapper prelude, rather than `find()`ing the canonical's first line. Mirrors WXYC/musicbrainz-cache#52's fixup so the template stays consistent.
- Add `migration_double_apply_is_a_no_op` — re-runs the migration end-to-end and verifies the function set still resolves. Pins the "re-applying is a no-op" contract.
- Migration header: replace the muddled self-correcting parenthetical with a clear refresh procedure pointing at the sentinel, plus a documented caveat about the DROP+CREATE dictionary being safe today but coordinating with any future functional-index dependents.
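
The sentinel-anchored split described in the first bullet can be sketched as follows. This is an illustration under assumptions, not the code in the parity test: it assumes the sentinel sits on a line of its own, and the function names `canonical_body` and `migration_matches_canonical` are hypothetical.

```rust
/// Sentinel line the wrapper prelude emits before the inlined canonical SQL.
/// Assumption: it occupies a whole line by itself.
const SENTINEL: &str = "-- @begin CANONICAL BODY";

/// Returns everything after the first line that consists solely of the
/// sentinel, or None if no such line exists. Matching whole lines avoids
/// false hits when a header comment merely mentions the sentinel.
fn canonical_body(migration: &str) -> Option<&str> {
    let mut offset = 0;
    for line in migration.split_inclusive('\n') {
        offset += line.len();
        if line.trim_end() == SENTINEL {
            return Some(&migration[offset..]);
        }
    }
    None
}

/// Byte-for-byte comparison of the inlined body against the vendored
/// canonical file contents.
fn migration_matches_canonical(migration: &str, canonical: &str) -> bool {
    canonical_body(migration)
        .map_or(false, |body| body.as_bytes() == canonical.as_bytes())
}
```

Compared with `find()`ing the canonical file's first line, an explicit marker keeps the split stable even if the canonical SQL's opening line changes in a later wxyc-etl release.
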
jakebromberg added a commit to WXYC/discogs-etl that referenced this pull request May 11, 2026
…ow fixes

- Dockerfile: COPY alembic/ + vendor/ + wxyc-etl-pin.txt + alembic.ini into the runtime image. Migration 0004 reads vendor/wxyc-etl/wxyc_identity_match_functions.sql at apply time (single-source-of-truth design), so any in-container `alembic upgrade head` path (`docker compose up`, future containerized rebuilds) failed silently without these. The EC2-cron path that git-clones the repo still worked, but the Docker path didn't.
- Lift alembic-upgrade to a module-scoped `migrated_db_url` fixture. The three PG-marked tests that just want a post-migration DB now share one upgrade invocation instead of running subprocess `alembic upgrade` three separate times — saves ~3-4s of wall time in the `pg` job.
- Add `test_migration_double_apply_is_a_no_op` that re-applies 0004 end-to-end. Uses the function-scoped `db_url` directly so it exercises both applies itself (rather than chaining on top of the module fixture's already-applied state). Mirrors the same property pin landing in WXYC/musicbrainz-cache#52 + WXYC/wikidata-cache#38.
- wxyc-etl-pin.txt: fix the stale filename reference `0004_wxyc_identity_match_functions.py` → `0004_wxyc_identity_match_fns.py` (the file was renamed to keep the revision id under alembic's 32-char VARCHAR limit).
@jakebromberg jakebromberg merged commit a00f4de into main May 11, 2026
3 checks passed
@jakebromberg jakebromberg deleted the cross-cache-identity/identity-match-functions branch May 11, 2026 18:00