Problem
After ingesting a batch of new flowsheet entries from tubafrenzy, jobs/flowsheet-etl/job.ts runs a post-import step that backfills flowsheet.dj_name for the just-inserted rows. The query shape is:
UPDATE wxyc_schema.flowsheet AS f
SET dj_name = COALESCE(u.dj_name, s.legacy_dj_name, u.name)
FROM wxyc_schema.shows AS s
LEFT JOIN auth_user AS u ON u.id = s.primary_dj_id
WHERE f.show_id = s.id
AND f.dj_name IS NULL
AND f.legacy_entry_id = ANY(($1, $2, $3, ..., $401))
When the batch is large (observed: 401 entries during the 2026-05-17 post-wedge backfill, ingesting tubafrenzy ids 2615446 → 2615846), the query fails. Inserts before this step committed fine; only the post-import dj_name backfill crashes, so data is intact but the new rows have dj_name = NULL until the next nightly run picks them up via flowsheet-metadata-backfill.
The crash shape — postgres-js gets a Failed query error with all 401 params dumped in the message — points at the ANY((...)) syntax. That parses as ANY(row_constructor), which expects exactly one row; the correct form for "match against any of these values" is f.legacy_entry_id = ANY($1::int[]) with a single array param, or f.legacy_entry_id IN ($1, $2, ...).
Where
jobs/flowsheet-etl/job.ts:76 — the resolveDjNames-style post-import UPDATE (search for the WHERE f.dj_name IS NULL AND f.legacy_entry_id = ANY(...) shape).
jobs/flowsheet-etl/job.ts:102 — uses ${legacyEntryIds} in a tagged template; that's the spot where drizzle constructs the ANY expression.
Reproducer
The 2026-05-17 manual ETL run (post-wedge) reproduces this. Any wedge >30 min where tubafrenzy receives >~50 entries will trigger it on recovery. Smaller batches (the normal 30-min cron) don't hit it because the batch size stays under whatever PG / postgres-js threshold is degrading.
Desired end state
Post-import dj_name UPDATE handles batches up to at least the size of a full wedge-recovery backfill (~1000 entries) without crashing.
Suggested approach
Switch from drizzle's auto-expanded ANY to an explicit array param:
// before
.where(and(
isNull(flowsheet.dj_name),
inArray(flowsheet.legacy_entry_id, legacyEntryIds), // or sql`ANY(${legacyEntryIds})`
))
// after — pass as a single int[] param
.where(and(
isNull(flowsheet.dj_name),
sql`${flowsheet.legacy_entry_id} = ANY(${legacyEntryIds}::int[])`,
))
Or batch the UPDATE itself: process the 401 entries in chunks of, say, 50, so the query parameter count stays small. The flowsheet-dj-name-backfill job (jobs/flowsheet-dj-name-backfill/) already does this batching well and is the pattern reference.
Acceptance
Related
project_bs_ec2_wedge_pattern (memory) — the 2026-05-17 wedge recovery is where this surfaced. The recovery recipe in that memory mentions the crash but doesn't fix it.
project_bulk_update_playbook (memory) / BS#934 — same WHERE shape, same per-row cost concerns; the playbook covers the bulk-UPDATE side but not the ANY-param-count side.
Problem
After ingesting a batch of new flowsheet entries from tubafrenzy,
jobs/flowsheet-etl/job.tsruns a post-import step that backfillsflowsheet.dj_namefor the just-inserted rows. The query shape is:When the batch is large (observed: 401 entries during the 2026-05-17 post-wedge backfill, ingesting tubafrenzy ids 2615446 → 2615846), the query fails. Inserts before this step committed fine; only the post-import dj_name backfill crashes, so data is intact but the new rows have
dj_name = NULLuntil the next nightly run picks them up viaflowsheet-metadata-backfill.The crash shape — postgres-js gets a
Failed queryerror with all 401 params dumped in the message — points at theANY((...))syntax. That parses asANY(row_constructor), which expects exactly one row; the correct form for "match against any of these values" isf.legacy_entry_id = ANY($1::int[])with a single array param, orf.legacy_entry_id IN ($1, $2, ...).Where
jobs/flowsheet-etl/job.ts:76— theresolveDjNames-style post-import UPDATE (search for theWHERE f.dj_name IS NULL AND f.legacy_entry_id = ANY(...)shape).jobs/flowsheet-etl/job.ts:102— uses${legacyEntryIds}in a tagged template; that's the spot where drizzle constructs the ANY expression.Reproducer
The 2026-05-17 manual ETL run (post-wedge) reproduces this. Any wedge >30 min where tubafrenzy receives >~50 entries will trigger it on recovery. Smaller batches (the normal 30-min cron) don't hit it because the batch size stays under whatever PG / postgres-js threshold is degrading.
Desired end state
Post-import dj_name UPDATE handles batches up to at least the size of a full wedge-recovery backfill (~1000 entries) without crashing.
Suggested approach
Switch from drizzle's auto-expanded ANY to an explicit array param:
Or batch the UPDATE itself: process the 401 entries in chunks of, say, 50, so the query parameter count stays small. The
flowsheet-dj-name-backfilljob (jobs/flowsheet-dj-name-backfill/) already does this batching well and is the pattern reference.Acceptance
feedback_check_upstream_direction(memory): we don't add a UNIQUE on legacy_entry_id, but we DO need to handle whatever batch sizes tubafrenzy's webhook backlog produces.Related
project_bs_ec2_wedge_pattern(memory) — the 2026-05-17 wedge recovery is where this surfaced. The recovery recipe in that memory mentions the crash but doesn't fix it.project_bulk_update_playbook(memory) / BS#934 — same WHERE shape, same per-row cost concerns; the playbook covers the bulk-UPDATE side but not the ANY-param-count side.