Deploy wxyc_identity_match_* plpgsql functions (wiki §3.3.5) #38
Merged
jakebromberg merged 2 commits on May 11, 2026
#37)

Vendors the canonical artifacts from WXYC/wxyc-etl@v0.4.0 (`data/`) under `vendor/wxyc-etl/` and ships them as migration 0003. The migration sets up the `wxyc_unaccent` text-search dictionary, then inlines the canonical four-function SQL byte-for-byte so sqlx-cli can apply the whole deploy in one transaction.

The parity test in `tests/wxyc_identity_match_parity_test.rs` covers three axes: SHA-pin freshness on every vendored file, migration-vs-canonical byte-equality after the wrapper prefix, and full 252-row fixture parity (plus an idempotence smoke). Local Postgres 18 passes all four assertions.

Function deploy only; no cache-load flip is in scope. This cache's `wxyc_library` hook normalizes via the Rust loader; the functions ship so cross-cache audit queries against this DB have a canonical identity-match form available.

CI: the test-postgres job `docker cp`s the rules + version files into the postgres:16-alpine service container before the existing import + charset-torture steps, then runs the new parity test as an explicit `--ignored` step.

Cargo: `wxyc-etl` 0.3.0 → 0.4.0.
This was referenced May 11, 2026
- Anchor the migration-vs-canonical split on a `-- @begin CANONICAL BODY` sentinel emitted by the wrapper prelude, rather than `find()`ing the canonical's first line. Mirrors WXYC/musicbrainz-cache#52's fixup so the template stays consistent.
- Add `migration_double_apply_is_a_no_op`: re-runs the migration end-to-end and verifies the function set still resolves. Pins the "re-applying is a no-op" contract.
- Migration header: replace the muddled self-correcting parenthetical with a clear refresh procedure pointing at the sentinel, plus a documented caveat that the DROP+CREATE of the dictionary is safe today but must be coordinated with any future functional-index dependents.
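The sentinel-anchored split described above can be sketched roughly as follows. This is a minimal illustration, not the code in `tests/wxyc_identity_match_parity_test.rs`: the helper names `canonical_body` and `matches_canonical` are hypothetical, and the sketch assumes the sentinel occupies its own line and appears exactly once in the migration.

```rust
/// Sentinel line the wrapper prelude is assumed to end with.
const SENTINEL: &str = "-- @begin CANONICAL BODY";

/// Returns everything after the sentinel line, or `None` if the sentinel
/// is missing, appears more than once, or is not followed by a newline.
/// (Hypothetical helper for illustration.)
fn canonical_body(migration: &str) -> Option<&str> {
    let start = migration.find(SENTINEL)?;
    let after = &migration[start + SENTINEL.len()..];
    // A second sentinel would make the split ambiguous, so reject it.
    if after.contains(SENTINEL) {
        return None;
    }
    // Skip the rest of the sentinel line, including its newline.
    let newline = after.find('\n')?;
    Some(&after[newline + 1..])
}

/// True when the migration body after the sentinel is byte-identical
/// to the vendored canonical SQL.
fn matches_canonical(migration: &str, canonical: &str) -> bool {
    canonical_body(migration)
        .map_or(false, |body| body.as_bytes() == canonical.as_bytes())
}

fn main() {
    // Stand-in SQL; the real canonical file holds the four functions.
    let canonical = "CREATE FUNCTION demo() RETURNS int LANGUAGE sql AS 'SELECT 1';\n";
    let migration = format!("-- wrapper prelude (dictionary setup)\n{SENTINEL}\n{canonical}");
    assert!(matches_canonical(&migration, canonical));
    println!("migration body matches canonical");
}
```

Anchoring on an explicit marker means the check does not depend on what the canonical file's first line happens to be, so a refresh that changes that line cannot silently shift the split point.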
jakebromberg added a commit to WXYC/discogs-etl that referenced this pull request on May 11, 2026
…ow fixes

- Dockerfile: COPY `alembic/` + `vendor/` + `wxyc-etl-pin.txt` + `alembic.ini` into the runtime image. Migration 0004 reads `vendor/wxyc-etl/wxyc_identity_match_functions.sql` at apply time (single-source-of-truth design), so any in-container `alembic upgrade head` path (`docker compose up`, future containerized rebuilds) failed silently without these. The EC2-cron path that git-clones the repo still worked, but the Docker path didn't.
- Lift alembic-upgrade to a module-scoped `migrated_db_url` fixture. The three PG-marked tests that just want a post-migration DB now share one upgrade invocation instead of running subprocess `alembic upgrade` three separate times, which saves ~3-4s of wall time in the `pg` job.
- Add `test_migration_double_apply_is_a_no_op` that re-applies 0004 end-to-end. Uses the function-scoped `db_url` directly so it exercises both applies itself (rather than chaining on top of the module fixture's already-applied state). Mirrors the same property pin landing in WXYC/musicbrainz-cache#52 + WXYC/wikidata-cache#38.
- `wxyc-etl-pin.txt`: fix the stale filename reference `0004_wxyc_identity_match_functions.py` → `0004_wxyc_identity_match_fns.py` (the file was renamed to keep the revision id under alembic's 32-char VARCHAR limit).
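The arithmetic behind the rename in the last bullet can be checked with a few lines. This is only an illustrative sketch: it assumes the revision id is the migration's file stem, which the commit message implies but does not state, and `fits_alembic_version_num` is a hypothetical helper. The 32-character figure is the width of the `version_num` column in Alembic's default `alembic_version` table.

```rust
/// Width of the `version_num` column in Alembic's default version table.
const ALEMBIC_VERSION_NUM_MAX: usize = 32;

/// True when a revision id fits in the version column.
/// (Hypothetical helper for illustration.)
fn fits_alembic_version_num(revision_id: &str) -> bool {
    revision_id.chars().count() <= ALEMBIC_VERSION_NUM_MAX
}

fn main() {
    // Assumption: the revision id equals the file stem of the migration.
    let candidates = [
        "0004_wxyc_identity_match_functions",
        "0004_wxyc_identity_match_fns",
    ];
    for id in candidates {
        println!(
            "{id}: {} chars, fits = {}",
            id.chars().count(),
            fits_alembic_version_num(id)
        );
    }
    // Prints:
    // 0004_wxyc_identity_match_functions: 34 chars, fits = false
    // 0004_wxyc_identity_match_fns: 28 chars, fits = true
}
```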
Summary
Mirrors WXYC/musicbrainz-cache#52 for wikidata-cache. Vendors canonical artifacts from WXYC/wxyc-etl@v0.4.0 under `vendor/wxyc-etl/` and ships them as migration 0003.

Same pattern as mb-cache: SHA-pinned `wxyc-etl-pin.txt`, sqlx-cli-friendly wrapper migration that inlines the canonical SQL byte-for-byte after a dictionary-setup prelude, and a three-axis parity test (pin freshness, migration-vs-canonical, PG-side 252-row fixture + idempotence).

Function deploy only; no cache-load flip. This cache's `wxyc_library` hook normalizes via the Rust loader; the functions ship so cross-cache audit queries against this DB have a canonical identity-match form available.

CI test-postgres job extended: `docker cp` rules into the alpine container, then run `cargo test --test wxyc_identity_match_parity_test -- --ignored`.

Cargo: `wxyc-etl` 0.3.0 → 0.4.0.

Closes #37.
Related: parent epic WXYC/wxyc-etl#73, prerequisite WXYC/wxyc-etl#113 (merged), sibling deploys WXYC/musicbrainz-cache#52, WXYC/Backend-Service#805, WXYC/discogs-etl#194.
Test plan
- `cargo fmt --check` + `cargo clippy --all-targets -- -D warnings`
- `cargo test` (full suite)
- `TEST_DATABASE_URL=... cargo test --test wxyc_identity_match_parity_test -- --ignored`: 4/4 pass on local PG 18
- … (`docker cp` step lands the rules into the postgres:16-alpine service container)