
Backfill library.artwork_url via LML (precondition for #643) #647

@jakebromberg

Description


Context

Precondition for #643 (and for fix surface 1 of #628). The bin-pick branch of addEntry denormalizes library.artwork_url onto the new flowsheet row at INSERT time. That eliminates the metadata race for bin-picks across all client surfaces, but only for library rows that already have artwork_url populated.
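To make the "only for library rows that have artwork_url populated" caveat concrete, here is a minimal sketch of what the bin-pick branch copies. LibraryRow, FlowsheetInsert, and binPickInsert are hypothetical names for illustration, not the actual addEntry code or the schema in shared/database/src/schema.ts.

```typescript
// Hypothetical shapes for illustration only; real column names and types
// live in shared/database/src/schema.ts and the real logic in addEntry.
interface LibraryRow {
  id: number;
  artworkUrl: string | null; // library.artwork_url
}

interface FlowsheetInsert {
  albumId: number;
  artworkUrl: string | null; // flowsheet.artwork_url as written at INSERT time
}

// Bin-pick branch (sketch): copy whatever the library row already has.
// A NULL here passes straight through, so the flowsheet row starts without
// artwork and is left to the existing metadata path, i.e. the race that
// the denormalization is meant to remove.
function binPickInsert(library: LibraryRow): FlowsheetInsert {
  return { albumId: library.id, artworkUrl: library.artworkUrl };
}
```

With roughly 99.8% of library rows still NULL, almost every bin-pick falls into the pass-through case, which is why the backfill comes first.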

Per the precondition comment on #628 (issuecomment-4338014422), the production query quoted there shows that only 155 of 64163 library rows (0.2%) currently have artwork_url populated. So although #643's denormalization SELECT is written (the PR is held in draft), it would do almost nothing in practice today. This issue tracks the backfill that gives it teeth.

Proposal

A one-shot backfill job under jobs/library-artwork-backfill/, modeled after jobs/flowsheet-dj-name-backfill/ (the canonical pattern; see Backend-Service CLAUDE.md for the rule that bulk DML is always a separate job, never inside a migration).

Approach:

  1. For each library row with artwork_url IS NULL and a non-null artist_id + album_title, call LML's /lookup endpoint (same call path as metadata.service.ts:fetchMetadata).
  2. On a single high-confidence match, set library.artwork_url = artwork.artwork_url.
  3. Apply UPDATEs in batches of BACKFILL_BATCH_SIZE (default 5000) with synchronous_commit=off, per the bulk-update playbook. The job is idempotent: the WHERE artwork_url IS NULL filter lets a rerun resume naturally.
  4. Phase A observability tags (repo, tool=library-artwork-backfill, step, run_id).
  5. Respect LML rate limits: the job's request volume dwarfs what LML receives from the live insert path, so the job needs throttling.
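The steps above can be sketched as a single loop. This is a minimal sketch, not the job's implementation: Candidate, LookupMatch, BackfillDeps, pickArtwork, runBackfill, the minConfidence threshold, and minIntervalMs are hypothetical names, and the database and LML calls are injected as plain functions so that no real driver, ORM, or HTTP client API is assumed. Observability tagging (step 4) is left out. One design choice worth noting: the sketch pages with a keyset (id > afterId) instead of re-selecting on artwork_url IS NULL alone, so rows LML cannot match are not fetched again within the same run.

```typescript
// Hypothetical types: the real LML /lookup response and the library schema
// may use different names and shapes.
export interface Candidate {
  id: number;
  artist: string; // assumed: artist name resolved from library.artist_id
  album: string;  // library.album_title
}

export interface LookupMatch {
  artworkUrl: string | null;
  confidence: number; // assumed 0..1 score
}

export interface BackfillDeps {
  // Keyset page: rows WHERE artwork_url IS NULL AND id > afterId ORDER BY id LIMIT limit.
  fetchBatch(afterId: number, limit: number): Promise<Candidate[]>;
  // Stand-in for the LML /lookup call.
  lookup(row: Candidate): Promise<LookupMatch[]>;
  // Batched UPDATE; a real job would run it with synchronous_commit=off and
  // keep "AND artwork_url IS NULL" in its WHERE clause so reruns stay safe.
  applyUpdates(updates: Array<{ id: number; artworkUrl: string }>): Promise<void>;
  sleep(ms: number): Promise<void>;
}

export interface BackfillOptions {
  batchSize: number;     // BACKFILL_BATCH_SIZE (default 5000)
  minConfidence: number; // hypothetical cutoff for "high confidence"
  minIntervalMs: number; // hypothetical spacing between LML requests
}

// Step 2: accept only a single high-confidence match that carries artwork.
export function pickArtwork(matches: LookupMatch[], minConfidence: number): string | null {
  const strong = matches.filter((m) => m.confidence >= minConfidence);
  if (strong.length !== 1) return null; // no match or ambiguous: leave NULL
  return strong[0].artworkUrl;
}

// Steps 1, 3 and 5: page through candidates, look each one up with a crude
// fixed-delay throttle, and write one batched UPDATE per page.
export async function runBackfill(deps: BackfillDeps, opts: BackfillOptions): Promise<number> {
  let afterId = 0;
  let updated = 0;
  for (;;) {
    const batch = await deps.fetchBatch(afterId, opts.batchSize);
    if (batch.length === 0) break;
    const updates: Array<{ id: number; artworkUrl: string }> = [];
    for (const row of batch) {
      const url = pickArtwork(await deps.lookup(row), opts.minConfidence);
      if (url !== null) updates.push({ id: row.id, artworkUrl: url });
      await deps.sleep(opts.minIntervalMs); // keep LML request rate bounded
    }
    if (updates.length > 0) await deps.applyUpdates(updates);
    updated += updates.length;
    afterId = batch[batch.length - 1].id; // advance past unmatched rows too
  }
  return updated;
}
```

A fixed delay is the simplest throttle; if LML publishes an explicit rate limit, a token bucket or honoring retry hints from its responses would be a natural refinement.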

Out of scope

  • Streaming-link backfill (spotify_url, apple_music_url, etc. on library). The current schema (shared/database/src/schema.ts:280-290) only has artwork_url on library; the other LML metadata fields land on flowsheet per the inline-metadata model. If we want to denormalize more fields onto library, that's a schema change worth a separate proposal.
  • Backfilling flowsheet.artwork_url from history (covered by [Epic] Historical metadata backfill for ~1.86M flowsheet rows #631).

Acceptance criteria

Cross-repo links

Metadata


Labels

lmlTouches library-metadata-lookup
