Live LML timeout: bump to 30s + synthesized-URL fallback on catch (closes #873) #971
Merged
Cold-cache LML lookups now consistently land at 8-18s (measured against prod on 2026-05-19), exceeding the 5s runtime timeout. Every miss aborts with LmlClientError; the catch arm reports to Sentry but writes nothing, so ~70% of an active show's rows surface as blank artwork on iOS until the 06:00 UTC drift-repair sweep catches up.

Two changes:

1. apps/backend/services/lml/lml.client.ts: TIMEOUT_MS 5s -> 30s, matching jobs/flowsheet-metadata-backfill/lml-fetch.ts. The runtime path is fire-and-forget after the HTTP response is sent, so the 30s budget holds a Node promise and an LML socket, not a user-visible request.

2. apps/backend/services/metadata/enrichment.service.ts: the catch arm now writes the three synthesized YouTube/Bandcamp/SoundCloud search URLs via SearchUrlProvider, so the listener has something to tap even when LML throws. Critically, metadata_attempt_at stays unset on the failure path: the row remains eligible for the drift-repair sweep, so the real artwork/Discogs/Spotify/Apple match can land on a future attempt. The WHERE clause narrows on metadata_attempt_at IS NULL to stay idempotent with the backfill (mirroring the success path's contract).

Unit tests:
- On LML failure: writes the 3 search URLs, omits metadata_attempt_at, and does not touch the artwork/discogs/spotify/apple/artist columns.
- On LML failure: the UPDATE narrows on metadata_attempt_at IS NULL.

Closes #873.
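A minimal TypeScript sketch of the runtime flow the two changes describe, under stated assumptions: the LML lookup is raced against a 30s timer, a timeout or error falls through to a write of the three search URLs that leaves `metadata_attempt_at` untouched, and the whole thing is meant to run detached from the HTTP response. Apart from `TIMEOUT_MS`, `LmlClientError`, `metadata_attempt_at` and `artwork_url`, every name here is hypothetical: `Track`, the `flowsheet` table, the `*_search_url` columns, the injected `lookupLml`/`exec`/`report` callbacks and the search-URL formats are illustrative stand-ins, not the code in `lml.client.ts` or `enrichment.service.ts`.

```typescript
// Sketch only. Track, the flowsheet table, the *_search_url columns and the
// injected callbacks are hypothetical stand-ins, not the real BS code.
const TIMEOUT_MS = 30_000; // was 5_000; matches the backfill job's budget

class LmlClientError extends Error {}

interface Track {
  id: number;
  artist: string;
  title: string;
}

interface Update {
  sql: string;
  params: unknown[];
}

// Assumed URL shapes for each site's public search page.
function searchUrls(t: Track): string[] {
  const q = `${t.artist} ${t.title}`;
  return [
    `https://www.youtube.com/results?${new URLSearchParams({ search_query: q })}`,
    `https://bandcamp.com/search?${new URLSearchParams({ q })}`,
    `https://soundcloud.com/search?${new URLSearchParams({ q })}`,
  ];
}

// Failure path: metadata_attempt_at is absent from SET (row stays retryable)
// and required to be NULL in WHERE (a row the backfill already stamped is
// left alone).
function failureUpdate(t: Track): Update {
  return {
    sql:
      "UPDATE flowsheet SET youtube_search_url = $1, bandcamp_search_url = $2, " +
      "soundcloud_search_url = $3 WHERE id = $4 AND metadata_attempt_at IS NULL",
    params: [...searchUrls(t), t.id],
  };
}

// Rejects with LmlClientError, then aborts the in-flight lookup, if `work`
// has not settled within `ms`.
async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const expired = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      reject(new LmlClientError(`LML lookup exceeded ${ms}ms`));
      controller.abort();
    }, ms);
  });
  try {
    return await Promise.race([work(controller.signal), expired]);
  } finally {
    clearTimeout(timer);
  }
}

// Intended to be started without awaiting once the response has been sent,
// so a slow lookup costs a pending promise and a socket, not request latency.
async function enrich(
  t: Track,
  lookupLml: (t: Track, signal: AbortSignal) => Promise<string | null>,
  exec: (u: Update) => Promise<void>,
  report: (err: unknown) => void,
  timeoutMs: number = TIMEOUT_MS,
): Promise<void> {
  let artworkUrl: string | null;
  try {
    artworkUrl = await withTimeout((signal) => lookupLml(t, signal), timeoutMs);
  } catch (err) {
    report(err); // Sentry in the real service
    await exec(failureUpdate(t));
    return;
  }
  // Simplified success path: the real one also writes Discogs/Spotify/Apple fields.
  await exec({
    sql:
      "UPDATE flowsheet SET artwork_url = $1, metadata_attempt_at = NOW() " +
      "WHERE id = $2 AND metadata_attempt_at IS NULL",
    params: [artworkUrl, t.id],
  });
}
```

A caller would start it as `void enrich(...).catch(report)` after the response is sent, so a failed fallback write is reported rather than surfacing as an unhandled rejection. Keeping the try block around the lookup only means a DB error on the success write is not mistaken for an LML failure.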
The metadata-lml integration test's "LML 500 error" case was asserting the old catch-arm behavior (no DB write). Update it to match BS#873's new contract: the synthesized YouTube/Bandcamp/SoundCloud search URLs ARE written to the row, while the LML-dependent fields (artwork/discogs/spotify/apple/artist) remain NULL. The retryability invariant (`metadata_attempt_at IS NULL`) stays asserted at the SQL-chunk level by the unit tests, since `flowsheet.controller.ts:40` omits the column from the GET response shape.
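The updated integration-test contract can be stated as a small predicate over the row returned by the GET endpoint. This is a sketch under assumptions: the `FlowsheetRow` field names are hypothetical (the commit names the columns only loosely, as YouTube/Bandcamp/SoundCloud search URLs and artwork/discogs/spotify/apple/artist), and `metadata_attempt_at` is deliberately absent because the GET response shape omits it.

```typescript
// Hypothetical subset of the GET response for one flowsheet entry; the real
// field names may differ.
interface FlowsheetRow {
  youtube_search_url: string | null;
  bandcamp_search_url: string | null;
  soundcloud_search_url: string | null;
  artwork_url: string | null;
  discogs_url: string | null;
  spotify_url: string | null;
  apple_music_url: string | null;
  artist_bio: string | null;
}

// Written by the catch arm even when LML fails.
const SEARCH_URL_FIELDS = [
  "youtube_search_url",
  "bandcamp_search_url",
  "soundcloud_search_url",
] as const;

// Only an LML match may fill these, so they must stay NULL after a failure.
const LML_FIELDS = [
  "artwork_url",
  "discogs_url",
  "spotify_url",
  "apple_music_url",
  "artist_bio",
] as const;

// Returns the ways a row breaks the post-#873 failure contract (empty = OK).
// metadata_attempt_at is not checked: the GET shape does not expose it, so
// the unit tests cover that invariant instead.
function failureContractViolations(row: FlowsheetRow): string[] {
  const problems: string[] = [];
  for (const field of SEARCH_URL_FIELDS) {
    if (!row[field]) problems.push(`${field} should be set`);
  }
  for (const field of LML_FIELDS) {
    if (row[field] !== null) problems.push(`${field} should be NULL`);
  }
  return problems;
}
```

An integration test would fetch the row after stubbing LML to return a 500 and expect an empty list. On a freshly inserted row, the old catch arm (no write) would instead have produced three "should be set" violations.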
This was referenced May 20, 2026
Summary
Two changes that together close #873:
- `apps/backend/services/lml/lml.client.ts`: `TIMEOUT_MS` 5s → 30s. Matches `jobs/flowsheet-metadata-backfill/lml-fetch.ts`. The runtime path is fire-and-forget, so this holds a Node promise + LML socket, not a user-visible request.
- `apps/backend/services/metadata/enrichment.service.ts`: the catch arm now writes the three synthesized YouTube/Bandcamp/SoundCloud search URLs via `SearchUrlProvider`. Crucially, `metadata_attempt_at` stays unset on the failure path: the row remains eligible for the drift-repair sweep, so the real artwork/Discogs/Spotify/Apple match can land on a future attempt. The WHERE clause narrows on `metadata_attempt_at IS NULL` to stay idempotent with the backfill.

Cold-cache LML lookups currently land at 8-18s (measured against prod 2026-05-19). Every miss aborts to `LmlClientError`, the catch arm Sentry-reports but writes nothing, and ~70% of an active show's rows surface as blank artwork on iOS until the 06:00 UTC drift-repair sweep catches up.

Measurements (2026-05-19, post-schema-fix)
Direct prod `POST /api/v1/lookup`:

Cold-cache compilation lookups dominate the wall-time; every one exceeds the existing 5s budget. The cascade-depth fix lives upstream in WXYC/library-metadata-lookup#337 (A2/A3 already merged on `main`, not yet promoted to `prod`). This PR is the BS-side mitigation.

Test plan
- `metadata_attempt_at` NOT in the SET clause.
- `metadata_attempt_at IS NULL` (idempotent with backfill).
- `metadata_attempt_at` stamped via `NOW()`.
- `metadata_attempt_at` stamped.
- `npm run test:unit`: 1987/1987 pass.
- `npm run lint`: 0 errors.
- `npm run typecheck`: clean.
- `npm run format:check`: clean.
- `artwork_url` populated or at least the 3 search URLs present.

Related
Closes #873.