fix(flowsheet-metadata-backfill): send body.song to LML, not body.track #916

Merged
jakebromberg merged 3 commits into main from fix/888-backfill-body-song on May 19, 2026

jakebromberg commented May 15, 2026

Summary

  • jobs/flowsheet-metadata-backfill/lml-fetch.ts was POSTing body.track to LML's /api/v1/lookup. LookupRequest in wxyc-shared/api.yaml names the field song; FastAPI/Pydantic silently drops unknown keys, so the nightly drain has been running as artist+album-only since the job shipped on 2026-04-29.
  • Switch the wire-format key from track → song (one character). The local var stays track because it maps to flowsheet.track_title in the row.
  • Add a regression test that asserts the body field name, so the silent drop can't recur after B4's client consolidation.
  • Add docs/backfill-song-vs-track-impact.md quantifying the bug's effect on prod data and recommending a re-run of the buggy window.
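
The key rename and the regression check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in lml-fetch.ts: `FlowsheetRow`, `LookupBody`, and `buildLookupBody` are hypothetical names, the `artist_name` and `album_title` columns are invented for the example, and the `artist` and `album` wire keys are inferred from the "artist+album-only" wording. Only `song` and `track_title` come from the description.

```typescript
// Hypothetical row shape; only track_title is named in the PR description.
interface FlowsheetRow {
  artist_name: string;
  album_title: string;
  track_title: string;
}

// Assumed wire shape: the PR states that LookupRequest names the field `song`.
interface LookupBody {
  artist: string;
  album: string;
  song: string;
}

// The local variable stays `track` because it mirrors flowsheet.track_title;
// only the key written to the wire is `song`.
function buildLookupBody(row: FlowsheetRow): LookupBody {
  const track = row.track_title;
  return { artist: row.artist_name, album: row.album_title, song: track };
}

// Regression-style check: serialize, then inspect the keys that actually go out.
const wire: Record<string, unknown> = JSON.parse(
  JSON.stringify(
    buildLookupBody({
      artist_name: "Little Brother",
      album_title: "the chitlin circuit",
      track_title: "what you do",
    }),
  ),
);
const sendsSong = "song" in wire;
const sendsTrack = "track" in wire;
```

Asserting on the serialized body rather than on the TypeScript type is the point of such a test: the type checker cannot see a key that the server will silently ignore.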

Impact (full analysis in the doc)

n=1000 sample from the buggy window (841,049 eligible rows), compared LML top-1 release_id under buggy vs. fixed wire shapes:

| Class | Count | % of bucketed (917) |
|---|---|---|
| same_release | 533 | 58.1% |
| diff_release (wrong release) | 76 | 8.3% |
| buggy_null_fixed_found (missed) | 90 | 9.8% |
| buggy_found_fixed_null (rare) | 3 | 0.3% |
| both_null | 215 | 23.4% |
| Divergent total | 169 | 18.4% |
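
For reference, the bucketing behind the table can be sketched as below. The class names and counts are taken from the table; the `classify` helper, the `ReleaseId` type (assumed numeric, with `null` for no match), and the `pct` formatter are hypothetical and are not the analysis script itself.

```typescript
// Assumption: a top-1 release_id is a number, or null when LML returns no match.
type ReleaseId = number | null;

type Bucket =
  | "same_release"
  | "diff_release"
  | "buggy_null_fixed_found"
  | "buggy_found_fixed_null"
  | "both_null";

// Classify one sampled row by comparing the two top-1 results.
function classify(buggy: ReleaseId, fixed: ReleaseId): Bucket {
  if (buggy === null && fixed === null) return "both_null";
  if (buggy === null) return "buggy_null_fixed_found";
  if (fixed === null) return "buggy_found_fixed_null";
  return buggy === fixed ? "same_release" : "diff_release";
}

// Counts as reported in the table.
const counts: Record<Bucket, number> = {
  same_release: 533,
  diff_release: 76,
  buggy_null_fixed_found: 90,
  buggy_found_fixed_null: 3,
  both_null: 215,
};

const bucketed = Object.values(counts).reduce((a, b) => a + b, 0);
const divergent =
  counts.diff_release +
  counts.buggy_null_fixed_found +
  counts.buggy_found_fixed_null;

// Format a ratio as a percentage with one decimal place.
const pct = (n: number, d: number): string => ((100 * n) / d).toFixed(1) + "%";
```

Here `divergent` counts every bucket in which the two wire shapes disagree, which is how the "Divergent total" row relates to the three rows above it.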

A spot-check of diff_release cases shows the song-aware path is unambiguously better: e.g. the buggy backfill resolved "Little Brother / the chitlin circuit / what you do" to a Skillz release; the fixed shape returns "The Chittlin Circuit 1.5" by Little Brother. The buggy_null_fixed_found bucket caught compilation hits that the album-only path can't reach (Missa Luba V/A, "Eccentric Soul" comps, etc.).

Extrapolating: ~152k rows carry materially worse artwork today than they should. 2.8k rows would marginally regress under a naive re-run — a top-1 extraction quirk in extractArtwork around LML's search_type: compilation shape, filed as a separate follow-up.
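
The extrapolation is a straight scale-up of sample counts to the window size. A minimal sketch, assuming the per-bucket counts reported for the 917 bucketed pairs and the 841,049-row window; `extrapolate` is a hypothetical helper, and the exact digits shift slightly depending on whether raw counts or rounded percentages are scaled.

```typescript
const WINDOW_ROWS = 841049; // eligible rows in the buggy window
const BUCKETED = 917; // sampled pairs that could be bucketed

// Scale a sample count up to the full window, rounded to whole rows.
function extrapolate(sampleCount: number): number {
  return Math.round((sampleCount / BUCKETED) * WINDOW_ROWS);
}

const wrongRelease = extrapolate(76); // diff_release
const missed = extrapolate(90); // buggy_null_fixed_found
const worseToday = extrapolate(76 + 90); // rows the fix would improve
const wouldRegress = extrapolate(3); // buggy_found_fixed_null

console.log({ wrongRelease, missed, worseToday, wouldRegress });
```

Scaling by the bucketed denominator (917) rather than the full sample (1000) treats the errored comparisons as missing at random, which is an assumption of this sketch.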

Recommendation

Land this fix, deploy, then NULL metadata_attempt_at on the buggy-window subset and let the existing cron drain it (procedure in the new doc).
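
The re-queue step could look roughly like the sketch below. The actual procedure lives in the new doc and is not reproduced here, so treat everything in this block as an assumption: selecting rows by the timestamp the buggy job stamped on them, PostgreSQL-style `$1`/`$2` placeholders, an exclusive end bound, and the hypothetical `buildRequeueQuery` helper. Only the `flowsheet` table and the `metadata_attempt_at` column are named in this PR.

```typescript
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// ASSUMPTION: the buggy-window subset is identified by metadata_attempt_at
// falling inside the window. The real playbook may select rows differently
// and may update in batches.
function buildRequeueQuery(
  windowStart: string,
  windowEndExclusive: string,
): ParameterizedQuery {
  return {
    text:
      "UPDATE flowsheet SET metadata_attempt_at = NULL " +
      "WHERE metadata_attempt_at >= $1 AND metadata_attempt_at < $2",
    values: [windowStart, windowEndExclusive],
  };
}

// Window dates from the impact analysis (2026-04-29 → 2026-05-15), with the
// end bound pushed one day forward so the last day is included.
const requeue = buildRequeueQuery("2026-04-29", "2026-05-16");
```

Clearing the timestamp instead of re-running the job by hand keeps the existing cron as the only writer of metadata, which matches the recommendation above.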

Test plan

  • npm run typecheck
  • npm run lint (0 errors)
  • npm run format:check
  • npm run test:unit (1857/1857 pass, including 2 new regression tests)
  • CI green (lint+typecheck, unit, integration)
  • Impact-quantification sample run (n=1000) against prod LML — results in the new doc
  • After deploy: kick the re-run on the buggy window

Closes #888.

jakebromberg and others added 3 commits May 15, 2026 07:33
…ody.track

The historical drain has been calling LML's /api/v1/lookup with `body.track`
since 2026-04-29, but LookupRequest in wxyc-shared/api.yaml names the field
`song`. FastAPI/Pydantic silently drops unknown keys, so every backfill row
was processed as an artist+album-only query: parsed.song was always false on
LML's side, gating off the TRACK_ON_COMPILATION, SONG_AS_TRACK, and
SONG_AS_ARTIST strategies. The runtime path at lml.client.ts:159 already
sends `song` — only the job's duplicated wrapper drifted.

Pin the wire shape with a regression test that fails if either field name
flips, so the silent drop can't recur after the B4 client consolidation.

Refs #888.
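
The failure mode in this commit message can be reproduced with a toy model. The sketch below is not LML's code: `parseLookup` is a hypothetical stand-in for a schema that ignores unknown keys, and `songStrategiesEnabled` is a hypothetical gate standing in for the song-dependent strategies named above.

```typescript
// Toy stand-in for a request schema that silently ignores unknown keys.
interface ParsedLookup {
  artist?: string;
  album?: string;
  song?: string;
}

const KNOWN_FIELDS = ["artist", "album", "song"] as const;

// Copy only the known string fields; anything else is dropped without error.
function parseLookup(body: Record<string, unknown>): ParsedLookup {
  const parsed: ParsedLookup = {};
  for (const key of KNOWN_FIELDS) {
    const value = body[key];
    if (typeof value === "string") parsed[key] = value;
  }
  return parsed;
}

// Hypothetical gate: song-aware strategies only run when a song was parsed.
function songStrategiesEnabled(parsed: ParsedLookup): boolean {
  return Boolean(parsed.song);
}

const buggyParsed = parseLookup({
  artist: "Little Brother",
  album: "the chitlin circuit",
  track: "what you do", // unknown key: dropped
});
const fixedParsed = parseLookup({
  artist: "Little Brother",
  album: "the chitlin circuit",
  song: "what you do",
});

console.log(songStrategiesEnabled(buggyParsed), songStrategiesEnabled(fixedParsed));
```

Because the buggy request still parses cleanly, nothing on either side raises an error, which is why the drift went unnoticed until the wire shape was compared against the schema.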
…888

n=1000 random sample from the buggy window (2026-04-29 → 2026-05-15)
compared LML top-1 release_id under the buggy and fixed wire shapes.

Headline: 16.9% divergence (169/1000), or 18.4% of bucketed pairs after
excluding errored comparisons. Extrapolated to the 841,049-row window:
~152,200 rows currently carry materially worse artwork than they should
(~69,800 wrong-album, ~82,400 no-artwork-when-it-would-resolve). 2,800
rows would marginally regress under a re-run due to a top-1 extraction
quirk in extractArtwork around compilation responses — filed as a
follow-up.

Recommendation: re-run the buggy window after the fix deploys, using
the bulk-UPDATE playbook to NULL metadata_attempt_at on the targeted
subset and let the existing cron drain it.

Refs #888.
jakebromberg merged commit 59a5c12 into main May 19, 2026
5 checks passed
jakebromberg deleted the fix/888-backfill-body-song branch May 19, 2026 18:52
Development

Successfully merging this pull request may close these issues.

[B5] Silent bug: flowsheet-metadata-backfill sends body.track instead of body.song
