
Live LML timeout: bump to 30s + synthesized-URL fallback on catch (closes #873) #971

Merged: jakebromberg merged 2 commits into main from fix-873-lml-timeout-fallback on May 20, 2026.


@jakebromberg (Member)

Summary

Two changes that together close #873:

  • apps/backend/services/lml/lml.client.ts: TIMEOUT_MS 5s → 30s, matching jobs/flowsheet-metadata-backfill/lml-fetch.ts. The runtime path is fire-and-forget after the HTTP response is sent, so the 30s budget holds a Node promise + LML socket, not a user-visible request.
  • apps/backend/services/metadata/enrichment.service.ts: the catch arm now writes the three synthesized YouTube/Bandcamp/SoundCloud search URLs via SearchUrlProvider, so the listener has something to tap on even when LML throws. Crucially, metadata_attempt_at stays unset on the failure path: the row remains eligible for the drift-repair sweep, so the real artwork/Discogs/Spotify/Apple match can land on a future attempt. The WHERE clause narrows on metadata_attempt_at IS NULL to stay idempotent with the backfill.
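For context on the first change, here is a minimal sketch of how a per-request budget like `TIMEOUT_MS` is commonly enforced, assuming an `AbortController` + `setTimeout` timeout and an injectable fetch-like function so it can run without a network. Only `TIMEOUT_MS`, `LmlClientError`, and the `/api/v1/lookup` path come from this PR; `FetchLike`, `lookup`, and the request shape are hypothetical, and the real `lml.client.ts` may be structured differently.

```typescript
// Hypothetical sketch of a timeout-bounded LML lookup; not the actual lml.client.ts.
const TIMEOUT_MS = 30_000; // raised from 5_000 in this PR

class LmlClientError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "LmlClientError";
  }
}

// Minimal fetch-like signature (assumed) so the sketch can be exercised offline.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function lookup(
  fetchImpl: FetchLike,
  baseUrl: string,
  query: unknown,
  timeoutMs: number = TIMEOUT_MS,
): Promise<unknown> {
  const controller = new AbortController();
  // Abort the in-flight request once the budget elapses.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(`${baseUrl}/api/v1/lookup`, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(query),
      signal: controller.signal,
    });
    if (!res.ok) throw new LmlClientError(`LML responded with HTTP ${res.status}`);
    return await res.json();
  } catch (err) {
    if (err instanceof LmlClientError) throw err;
    // Normalize aborts and transport failures into the client's error type.
    throw new LmlClientError(
      controller.signal.aborted
        ? `LML lookup timed out after ${timeoutMs}ms`
        : `LML lookup failed: ${String(err)}`,
    );
  } finally {
    // Always release the timer, whether the lookup succeeded or failed.
    clearTimeout(timer);
  }
}
```

Since the PR describes the runtime call as fire-and-forget after the response is sent, raising the budget in a sketch like this only extends how long the promise and its socket stay open; no listener-facing request waits on it.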

Cold-cache LML lookups currently land at 8-18s (measured against prod 2026-05-19), well past the 5s runtime timeout. Every such lookup aborts with LmlClientError; the catch arm reports to Sentry but writes nothing, so ~70% of an active show's rows surface as blank artwork on iOS until the 06:00 UTC drift-repair sweep catches up.

Measurements (2026-05-19, post-schema-fix)

Direct prod POST /api/v1/lookup:

| Artist / Track | Wall time | search_type | api_calls |
|---|---|---|---|
| 22 Beaches / Breathing | 8.0s | compilation | 5 |
| El Keamo / Esperando | 11.7s | compilation | 6 |
| Rita Villa / Czardas | 18.3s | compilation | 7 |
| Sonido Dueñez / La Piragua | 12.9s | compilation | 5 |
| Amy Gadiaga / Imma Pick You Up | 13.2s | compilation | 5 |
| The Fly Girlz / Born 2 Be Fly | 17.0s | compilation | 9 |
| Felt / Evergreen Dazed (warm) | 0.5s | direct | 0 |
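The budget arithmetic behind the change can be checked directly against these measurements. The snippet below only re-encodes the seven rows above (no new data) and counts how many would trip the old 5s timeout versus the new 30s one.

```typescript
// Wall times (seconds) from the 2026-05-19 prod measurements above.
const measurements: { entry: string; wallSeconds: number; warm: boolean }[] = [
  { entry: "22 Beaches / Breathing", wallSeconds: 8.0, warm: false },
  { entry: "El Keamo / Esperando", wallSeconds: 11.7, warm: false },
  { entry: "Rita Villa / Czardas", wallSeconds: 18.3, warm: false },
  { entry: "Sonido Dueñez / La Piragua", wallSeconds: 12.9, warm: false },
  { entry: "Amy Gadiaga / Imma Pick You Up", wallSeconds: 13.2, warm: false },
  { entry: "The Fly Girlz / Born 2 Be Fly", wallSeconds: 17.0, warm: false },
  { entry: "Felt / Evergreen Dazed", wallSeconds: 0.5, warm: true },
];

// Number of lookups that would be aborted under a given timeout.
function countOverBudget(budgetSeconds: number): number {
  return measurements.filter((m) => m.wallSeconds > budgetSeconds).length;
}

const slowest = Math.max(...measurements.map((m) => m.wallSeconds));

console.log(countOverBudget(5)); // 6
console.log(countOverBudget(30)); // 0
console.log((30 / slowest).toFixed(2)); // "1.64"
```

All six cold-cache rows exceed 5s, none exceed 30s, and the new budget leaves roughly 1.6x headroom over the slowest observed lookup (18.3s).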

Cold-cache compilation lookups dominate the wall time: every one exceeds the existing 5s budget, while the warm-cache direct lookup returns in 0.5s. The cascade-depth fix lives upstream in WXYC/library-metadata-lookup#337 (A2/A3 already merged on main, not yet promoted to prod). This PR is the BS-side mitigation.

Test plan

  • Unit: on LML failure, row updated with the 3 synthesized URLs only (no artwork_url/discogs_url/spotify_url/apple_music_url/artist columns).
  • Unit: on LML failure, metadata_attempt_at NOT in the SET clause.
  • Unit: on LML failure, WHERE clause narrows on metadata_attempt_at IS NULL (idempotent with backfill).
  • Existing unit: on LML success-with-match, metadata_attempt_at stamped via NOW().
  • Existing unit: on LML success-no-match (search URLs only), metadata_attempt_at stamped.
  • Local: npm run test:unit (1987/1987 pass).
  • Local: npm run lint (0 errors).
  • Local: npm run typecheck (clean).
  • Local: npm run format:check (clean).
  • Post-merge prod verification (24h after deploy): re-query the affected entries from #873 (5210314, 5210316, 5210319, 5210327, 5210328, 5210336, 5210337, 5210340) and confirm that artwork_url is populated or, at minimum, that the 3 search URLs are present.
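The post-merge check can be phrased as a small classification over the re-queried rows. This is a hypothetical helper: the `EntryRow` shape and the `youtube_url`/`bandcamp_url`/`soundcloud_url` column names are assumptions (the PR does not name the search-URL columns); only `artwork_url` and the entry IDs come from the test plan.

```typescript
// Hypothetical row shape; the search-URL column names are assumed.
interface EntryRow {
  id: number;
  artwork_url: string | null;
  youtube_url: string | null;
  bandcamp_url: string | null;
  soundcloud_url: string | null;
}

type Verdict = "enriched" | "fallback-only" | "blank";

// Entry IDs listed in the test plan for re-querying after deploy.
const AFFECTED_IDS = [5210314, 5210316, 5210319, 5210327, 5210328, 5210336, 5210337, 5210340];

function classify(row: EntryRow): Verdict {
  if (row.artwork_url) return "enriched"; // real LML match landed
  if (row.youtube_url && row.bandcamp_url && row.soundcloud_url) return "fallback-only"; // catch-arm URLs only
  return "blank";
}

// Passes when every affected entry was found and none is still blank.
function verify(rows: EntryRow[]): { passed: boolean; verdicts: Map<number, Verdict | "missing"> } {
  const byId = new Map(rows.map((r): [number, EntryRow] => [r.id, r]));
  const verdicts = new Map<number, Verdict | "missing">();
  for (const id of AFFECTED_IDS) {
    const row = byId.get(id);
    verdicts.set(id, row ? classify(row) : "missing");
  }
  const passed = Array.from(verdicts.values()).every((v) => v === "enriched" || v === "fallback-only");
  return { passed, verdicts };
}
```

Treating "fallback-only" as a pass mirrors the test plan's "or at least the 3 search URLs present" criterion; the per-ID verdicts show which rows are still waiting on a real match.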

Related

Closes #873.

Cold-cache LML lookups now consistently land at 8-18s (measured against
prod 2026-05-19), exceeding the 5s runtime timeout. Every miss aborts to
LmlClientError, the catch arm Sentry-reports but writes nothing, and
~70% of an active show's rows surface as blank artwork on iOS until the
06:00 UTC drift-repair sweep catches up.

Two changes:

1. apps/backend/services/lml/lml.client.ts: TIMEOUT_MS 5s -> 30s,
   matching jobs/flowsheet-metadata-backfill/lml-fetch.ts. The runtime
   path is fire-and-forget after the HTTP response is sent, so the 30s
   budget holds a Node promise + LML socket, not a user-visible request.

2. apps/backend/services/metadata/enrichment.service.ts: the catch arm
   now writes the three synthesized YouTube/Bandcamp/SoundCloud search
   URLs via SearchUrlProvider so the listener has something to tap on
   even when LML throws. Critically, metadata_attempt_at stays unset on
   the failure path -- the row remains eligible for the drift-repair
   sweep so the real artwork/Discogs/Spotify/Apple match can land on a
   future attempt. The WHERE clause narrows on metadata_attempt_at IS
   NULL to stay idempotent with the backfill (mirrors the success
   path's contract).
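A minimal sketch of the failure-path write from item 2, assuming a plain parameterized SQL statement. The table name `flowsheet`, the search-URL column names, and `synthesizeSearchUrls` (a stand-in for SearchUrlProvider, whose real API is not shown here) are all hypothetical. The point is the contract: only the three search URLs appear in `SET`, `metadata_attempt_at` is never written, and the `WHERE` clause requires it to still be `NULL`.

```typescript
// Hypothetical stand-in for SearchUrlProvider: builds search-page URLs from artist + track.
interface SearchUrls {
  youtube: string;
  bandcamp: string;
  soundcloud: string;
}

function synthesizeSearchUrls(artist: string, track: string): SearchUrls {
  const q = encodeURIComponent(`${artist} ${track}`);
  return {
    youtube: `https://www.youtube.com/results?search_query=${q}`,
    bandcamp: `https://bandcamp.com/search?q=${q}`,
    soundcloud: `https://soundcloud.com/search?q=${q}`,
  };
}

interface SqlStatement {
  text: string;
  values: unknown[];
}

// Failure path: write only the search URLs, leave metadata_attempt_at NULL,
// and skip rows that another writer (e.g. the backfill) has already stamped.
function buildLmlFailureUpdate(entryId: number, urls: SearchUrls): SqlStatement {
  return {
    text:
      "UPDATE flowsheet SET youtube_url = $1, bandcamp_url = $2, soundcloud_url = $3 " +
      "WHERE id = $4 AND metadata_attempt_at IS NULL",
    values: [urls.youtube, urls.bandcamp, urls.soundcloud, entryId],
  };
}

// Usage with a placeholder entry id.
const stmt = buildLmlFailureUpdate(42, synthesizeSearchUrls("22 Beaches", "Breathing"));
console.log(stmt.text);
```

Keeping the LML-dependent columns (artwork/Discogs/Spotify/Apple/artist) out of the statement entirely, rather than writing them as `NULL`, avoids clobbering values a concurrent successful writer may have set.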

Unit tests:
- On LML failure: writes the 3 search URLs, omits metadata_attempt_at,
  does not touch artwork/discogs/spotify/apple/artist columns.
- On LML failure: UPDATE narrows on metadata_attempt_at IS NULL.
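Why both tests matter can be shown with a toy in-memory model (hypothetical; it does not touch the real database, backfill job, or sweep). Both writers share the `metadata_attempt_at IS NULL` guard, so once a success has stamped the row a late failure-path write becomes a no-op, while a row that has only seen failures stays unstamped. The sweep-eligibility rule below is an assumption based on the description above, and `youtube_url` is an assumed column name.

```typescript
// Toy model of one flowsheet row; youtube_url is an assumed column name.
interface Row {
  artwork_url: string | null;
  youtube_url: string | null;
  metadata_attempt_at: Date | null;
}

// Success path (per the PR): write enrichment and stamp metadata_attempt_at.
function applySuccess(row: Row, artworkUrl: string, searchUrl: string, now: Date): boolean {
  if (row.metadata_attempt_at !== null) return false; // WHERE metadata_attempt_at IS NULL
  row.artwork_url = artworkUrl;
  row.youtube_url = searchUrl;
  row.metadata_attempt_at = now;
  return true;
}

// Failure path (this PR): write only the search URL and never stamp the row.
function applyFailureFallback(row: Row, searchUrl: string): boolean {
  if (row.metadata_attempt_at !== null) return false; // same guard keeps it idempotent
  row.youtube_url = searchUrl;
  return true;
}

// Assumption: the drift-repair sweep picks up rows that were never stamped.
function eligibleForSweep(row: Row): boolean {
  return row.metadata_attempt_at === null;
}
```

For example, a fresh row that hits an LML failure gets its search URL and stays sweep-eligible; after a later successful attempt stamps it, another failure-path write returns `false` and leaves the enriched row untouched.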

Closes #873.

The metadata-lml integration test's "LML 500 error" case was asserting
the old catch-arm behavior (no DB write). Update it to match BS#873's
new contract: synthesized YouTube/Bandcamp/SoundCloud search URLs ARE
written to the row, while LML-dependent fields (artwork/discogs/spotify/
apple/artist) remain NULL. The retryability invariant
(`metadata_attempt_at IS NULL`) stays asserted at the SQL chunk level
by the unit tests, because `flowsheet.controller.ts:40` omits the column
from the GET response shape and the integration test cannot observe it.

Development

Successfully merging this pull request may close these issues.

Live LML metadata enrichment silently fails on most cold-cache lookups (5 s timeout, no catch-arm fallback)