flowsheet-etl: post-import dj_name UPDATE crashes on large legacy_entry_id batches (400+ param ANY clause)

## Problem

After ingesting a batch of new flowsheet entries from tubafrenzy, `jobs/flowsheet-etl/job.ts` runs a post-import step that backfills `flowsheet.dj_name` for the just-inserted rows. The query shape is:

```sql
UPDATE wxyc_schema.flowsheet AS f
SET dj_name = COALESCE(u.dj_name, s.legacy_dj_name, u.name)
FROM wxyc_schema.shows AS s
LEFT JOIN auth_user AS u ON u.id = s.primary_dj_id
WHERE f.show_id = s.id
  AND f.dj_name IS NULL
  AND f.legacy_entry_id = ANY(($1, $2, $3, ..., $401))
```

When the batch is large (observed: 401 entries during the 2026-05-17 post-wedge backfill, ingesting tubafrenzy ids 2615446 → 2615846), the query fails. Inserts before this step committed fine; only the post-import dj_name backfill crashes, so data is intact but the new rows have `dj_name = NULL` until the next nightly run picks them up via `flowsheet-metadata-backfill`.

The crash shape — postgres-js gets a `Failed query` error with all 401 params dumped in the message — points at the `ANY((...))` syntax. That parses as `ANY(row_constructor)`, which expects exactly one row; the correct form for "match against any of these values" is `f.legacy_entry_id = ANY($1::int[])` with a single array param, or `f.legacy_entry_id IN ($1, $2, ...)`.

## Where

- `jobs/flowsheet-etl/job.ts:76` — the `resolveDjNames`-style post-import UPDATE (search for the `WHERE f.dj_name IS NULL AND f.legacy_entry_id = ANY(...)` shape).
- `jobs/flowsheet-etl/job.ts:102` — uses `${legacyEntryIds}` in a tagged template; that's the spot where drizzle constructs the ANY expression.

## Reproducer

The 2026-05-17 manual ETL run (post-wedge) reproduces this. Any wedge >30 min where tubafrenzy receives >~50 entries will trigger it on recovery. Smaller batches (the normal 30-min cron) don't hit it because the batch size stays under whatever PG / postgres-js threshold is degrading.

## Desired end state

Post-import dj_name UPDATE handles batches up to at least the size of a full wedge-recovery backfill (~1000 entries) without crashing.

## Suggested approach

Switch from drizzle's auto-expanded ANY to an explicit array param:

```ts
// before
.where(and(
  isNull(flowsheet.dj_name),
  inArray(flowsheet.legacy_entry_id, legacyEntryIds),  // or sql`ANY(${legacyEntryIds})`
))

// after — pass as a single int[] param
.where(and(
  isNull(flowsheet.dj_name),
  sql`${flowsheet.legacy_entry_id} = ANY(${legacyEntryIds}::int[])`,
))
```

Or batch the UPDATE itself: process the 401 entries in chunks of, say, 50, so the query parameter count stays small. The `flowsheet-dj-name-backfill` job (`jobs/flowsheet-dj-name-backfill/`) already does this batching well and is the pattern reference.

## Acceptance

- [ ] Manual run of flowsheet-etl over a 500+ entry tubafrenzy backlog completes without crashing on the dj_name UPDATE.
- [ ] Unit test that exercises the ANY clause with a large legacy_entry_id list (≥500 ids).
- [ ] Inline comment cross-references `feedback_check_upstream_direction` (memory): we don't add a UNIQUE on legacy_entry_id, but we DO need to handle whatever batch sizes tubafrenzy's webhook backlog produces.

## Related

- `project_bs_ec2_wedge_pattern` (memory) — the 2026-05-17 wedge recovery is where this surfaced. The recovery recipe in that memory mentions the crash but doesn't fix it.
- `project_bulk_update_playbook` (memory) / BS#934 — same WHERE shape, same per-row cost concerns; the playbook covers the bulk-UPDATE side but not the ANY-param-count side.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

flowsheet-etl: post-import dj_name UPDATE crashes on large legacy_entry_id batches (400+ param ANY clause) #939

Problem

Where

Reproducer

Desired end state

Suggested approach

Acceptance

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

flowsheet-etl: post-import dj_name UPDATE crashes on large legacy_entry_id batches (400+ param ANY clause) #939

Description

Problem

Where

Reproducer

Desired end state

Suggested approach

Acceptance

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions