fix(flowsheet-metadata-backfill): send body.song to LML, not body.track #916
Merged
…ody.track The historical drain has been calling LML's /api/v1/lookup with `body.track` since 2026-04-29, but LookupRequest in wxyc-shared/api.yaml names the field `song`. FastAPI/Pydantic silently drops unknown keys, so every backfill row was processed as an artist+album-only query: parsed.song was always false on LML's side, gating off the TRACK_ON_COMPILATION, SONG_AS_TRACK, and SONG_AS_ARTIST strategies. The runtime path at lml.client.ts:159 already sends `song` — only the job's duplicated wrapper drifted. Pin the wire shape with a regression test that fails if either field name flips, so the silent drop can't recur after the B4 client consolidation. Refs #888.
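The wire-shape pin described above could be sketched roughly as follows. This is a minimal illustration, not the job's actual code: `buildLookupBody`, `assertWireShape`, the `BackfillRow` type, and the `artist`/`album` keys are hypothetical names introduced here. Only the `song` key, the rejected `track` key, and `flowsheet.track_title` come from the PR.

```typescript
// Hypothetical row shape; only track_title is named in the PR.
interface BackfillRow {
  artist_name: string;
  album_title: string | null;
  track_title: string | null;
}

// Hypothetical builder for the /api/v1/lookup request body.
// The local variable stays `track` (it mirrors flowsheet.track_title),
// but the wire key must be `song` to match LookupRequest.
function buildLookupBody(row: BackfillRow): Record<string, string> {
  const track = row.track_title;
  const body: Record<string, string> = { artist: row.artist_name };
  if (row.album_title) body.album = row.album_title;
  if (track) body.song = track;
  return body;
}

// Regression check: throws if either field name flips.
function assertWireShape(body: Record<string, string>): void {
  if (!("song" in body)) throw new Error("lookup body must carry `song`");
  if ("track" in body) throw new Error("lookup body must not carry `track`");
}

const body = buildLookupBody({
  artist_name: "Little Brother",
  album_title: "the chitlin circuit",
  track_title: "what you do",
});
assertWireShape(body);
console.log(JSON.stringify(body));
```

Checking both directions (presence of `song` and absence of `track`) matters because a server that drops unknown keys returns no error for the wrong name, so only a client-side assertion can catch the drift.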
…888 n=1000 random sample from the buggy window (2026-04-29 → 2026-05-15) compared LML top-1 release_id under the buggy and fixed wire shapes. Headline: 16.9% divergence (169/1000), or 18.4% of bucketed pairs after excluding errored comparisons. Extrapolated to the 841,049-row window: ~152,200 rows currently carry materially worse artwork than they should (~69,800 wrong-album, ~82,400 no-artwork-when-it-would-resolve). 2,800 rows would marginally regress under a re-run due to a top-1 extraction quirk in extractArtwork around compilation responses — filed as a follow-up. Recommendation: re-run the buggy window after the fix deploys, using the bulk-UPDATE playbook to NULL metadata_attempt_at on the targeted subset and let the existing cron drain it. Refs #888.
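The re-queue step recommended above might look something like the sketch below. It is written under several assumptions that the PR does not confirm: the table is called `flowsheet`, the buggy window is selected by filtering on `metadata_attempt_at` itself, the end bound is exclusive (so 2026-05-16 covers attempts through 2026-05-15), and the database accepts Postgres-style `$1`/`$2` placeholders. `buildRequeueStatement` and `SqlStatement` are hypothetical names; the actual procedure is the bulk-UPDATE playbook referenced in the commit message, which may target a narrower subset.

```typescript
// Parameterized statement: SQL text plus bound values.
interface SqlStatement {
  text: string;
  values: string[];
}

// Builds an UPDATE that clears metadata_attempt_at for rows attempted
// during the buggy window, so the existing cron drains them again.
// ASSUMPTIONS: table name `flowsheet`, window filter on metadata_attempt_at,
// half-open [start, end) bounds, Postgres-style placeholders.
function buildRequeueStatement(
  windowStart: string,
  windowEndExclusive: string,
): SqlStatement {
  // Also rejects unparseable dates: comparisons against NaN are false.
  if (!(Date.parse(windowStart) < Date.parse(windowEndExclusive))) {
    throw new Error("windowStart must be before windowEndExclusive");
  }
  return {
    text:
      "UPDATE flowsheet SET metadata_attempt_at = NULL " +
      "WHERE metadata_attempt_at >= $1 AND metadata_attempt_at < $2",
    values: [windowStart, windowEndExclusive],
  };
}

const stmt = buildRequeueStatement("2026-04-29", "2026-05-16");
console.log(stmt.text, stmt.values);
```

Passing the window bounds as parameters keeps the statement reusable and means the dates are never interpolated into the SQL text.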
Summary
- `jobs/flowsheet-metadata-backfill/lml-fetch.ts` was POSTing `body.track` to LML's `/api/v1/lookup`. `LookupRequest` in `wxyc-shared/api.yaml` names the field `song`; FastAPI/Pydantic silently drops unknown keys, so the nightly drain has been running as artist+album-only since the job shipped on 2026-04-29.
- The fix renames the wire field `track` → `song` (one character). The local var stays `track` because it maps to `flowsheet.track_title` in the row.
- Adds `docs/backfill-song-vs-track-impact.md`, quantifying the bug's effect on prod data and recommending a re-run of the buggy window.

Impact (full analysis in the doc)
n=1000 sample from the buggy window (841,049 eligible rows), comparing LML top-1 `release_id` under buggy vs. fixed wire shapes. Each pair falls into one of these buckets:

- `same_release`
- `diff_release` (wrong release)
- `buggy_null_fixed_found` (missed)
- `buggy_found_fixed_null` (rare)
- `both_null`

Spot-check of `diff_release` cases shows the song-aware path is unambiguously better: e.g. the buggy backfill resolved "Little Brother / the chitlin circuit / what you do" to a Skillz release; the fixed shape returns "The Chittlin Circuit 1.5" by Little Brother. The `buggy_null_fixed_found` (bN_fY) bucket caught compilation hits the album-only path can't reach (Missa Luba V/A, "Eccentric Soul" comps, etc.).

Extrapolating: ~152k rows carry materially worse artwork today than they should. 2.8k rows would marginally regress under a naive re-run, due to a top-1 extraction quirk in `extractArtwork` around LML's `search_type: compilation` shape, filed as a separate follow-up.

Recommendation
Land this fix, deploy, then NULL `metadata_attempt_at` on the buggy-window subset and let the existing cron drain it (procedure in the new doc).

Test plan

- `npm run typecheck`
- `npm run lint` (0 errors)
- `npm run format:check`
- `npm run test:unit` (1857/1857 pass, including 2 new regression tests)

Closes #888.