Problem
apps/backend/services/search.service.ts runs a SELECT COUNT(*)::int AS total FROM flowsheet WHERE entry_type = 'track' in parallel with the data query on every /flowsheet/search call. Over the current 2.6 M flowsheet rows that count is a 7-second parallel sequential scan when the visibility map is cold:
Limit (cost=0.43..135.36) (actual 0.034..0.348 rows=50)
-> Index Scan using flowsheet_track_add_time_idx ... ← data side: 0.4 ms
Finalize Aggregate (cost=146738.72..146738.74) (actual 6989..7055)
-> Parallel Seq Scan on flowsheet
Filter: entry_type = 'track'
Buffers: shared hit=14882 read=115231 + I/O Timings: read=19618 ms
The 5 s statement timeout always fires, the endpoint 500s. This caused a live incident on 2026-05-17 immediately after the post-wedge flowsheet-etl backfill landed 401 fresh rows — the bulk insert invalidated the visibility map, autovacuum hadn't caught up, and dj-site's empty-q search calls started timing out.
VACUUM (ANALYZE) wxyc_schema.flowsheet (~80 s) fixes it by refreshing the visibility map so the planner picks an index-only scan over flowsheet_track_add_time_idx. That's the operational fix, not a durable one — every bulk operation against flowsheet (manual scripts, post-wedge ETL backfill, mojibake recovery, etc.) re-opens the same window until autovacuum catches up.
Desired end state
The empty-q count path is constant-time, independent of visibility-map state. The same pattern already lives in apps/backend/services/flowsheet.service.ts:225 (getEntryCount for the V1 flowsheet list endpoint) — BS#911 / H4 closed that ticket as "documented, no action" with a re-evaluation trigger comment, because the path there is pg_class.reltuples. The search endpoint should adopt the same approach for the empty-q case.
Where
apps/backend/services/search.service.ts — searchFlowsheet() builds countQuery from fullWhere. When q is empty (conditions.length === 0) and there's no cursor, fullWhere is exactly FROM flowsheet WHERE entry_type = 'track' — the same shape as getEntryCount.
apps/backend/services/flowsheet.service.ts:225 — getEntryCount for the reltuples reference implementation, including the reltuples = -1 "never analyzed" sentinel handling.
apps/backend/controllers/search.controller.ts:77 — composes the response body; total field is what the refactor needs to preserve.
Suggested approach
- In
searchFlowsheet, when the query has no user-supplied conditions (empty q AND no cursor), short-circuit countQuery to a reltuples lookup instead of SELECT COUNT(*). Reuse getEntryCount if the schema/timing fits; otherwise inline the same query.
- When the query DOES have user conditions, keep
COUNT(*) — the planner is fine when WHERE narrows to a small subset (trigram-index-backed artist_name ILIKE ...).
- Add a test that pins: empty
q → reltuples path, non-empty q → COUNT path.
Acceptance
Related
- BS#911 / H4 —
getEntryCount reltuples documentation (the same incident shape, different endpoint).
- BS#934 — bulk-UPDATE ANALYZE rule (the upstream pattern that exposed this;
flowsheet-etl post-import UPDATE invalidates stats the same way).
project_bs_ec2_wedge_pattern (memory) — the 2026-05-17 incident in which this surfaced live.
Problem
apps/backend/services/search.service.tsruns aSELECT COUNT(*)::int AS total FROM flowsheet WHERE entry_type = 'track'in parallel with the data query on every/flowsheet/searchcall. Over the current 2.6 M flowsheet rows that count is a 7-second parallel sequential scan when the visibility map is cold:The 5 s statement timeout always fires, the endpoint 500s. This caused a live incident on 2026-05-17 immediately after the post-wedge
flowsheet-etlbackfill landed 401 fresh rows — the bulk insert invalidated the visibility map, autovacuum hadn't caught up, and dj-site's empty-qsearch calls started timing out.VACUUM (ANALYZE) wxyc_schema.flowsheet(~80 s) fixes it by refreshing the visibility map so the planner picks an index-only scan overflowsheet_track_add_time_idx. That's the operational fix, not a durable one — every bulk operation againstflowsheet(manual scripts, post-wedge ETL backfill, mojibake recovery, etc.) re-opens the same window until autovacuum catches up.Desired end state
The empty-
qcount path is constant-time, independent of visibility-map state. The same pattern already lives inapps/backend/services/flowsheet.service.ts:225(getEntryCountfor the V1 flowsheet list endpoint) — BS#911 / H4 closed that ticket as "documented, no action" with a re-evaluation trigger comment, because the path there ispg_class.reltuples. The search endpoint should adopt the same approach for the empty-qcase.Where
apps/backend/services/search.service.ts—searchFlowsheet()buildscountQueryfromfullWhere. Whenqis empty (conditions.length === 0) and there's no cursor,fullWhereis exactlyFROM flowsheet WHERE entry_type = 'track'— the same shape asgetEntryCount.apps/backend/services/flowsheet.service.ts:225—getEntryCountfor the reltuples reference implementation, including thereltuples = -1"never analyzed" sentinel handling.apps/backend/controllers/search.controller.ts:77— composes the response body;totalfield is what the refactor needs to preserve.Suggested approach
searchFlowsheet, when the query has no user-supplied conditions (emptyqAND no cursor), short-circuitcountQueryto areltupleslookup instead ofSELECT COUNT(*). ReusegetEntryCountif the schema/timing fits; otherwise inline the same query.COUNT(*)— the planner is fine when WHERE narrows to a small subset (trigram-index-backedartist_name ILIKE ...).q→ reltuples path, non-emptyq→ COUNT path.Acceptance
q/flowsheet/searchrequests return atotalwithin ±1 % of the true count, with< 100 msresponse time, regardless of recent bulk inserts.q/flowsheet/searchrequests still return an exact count.search.service.tscross-referencesflowsheet.service.ts:getEntryCountso future readers see the shared pattern.Related
getEntryCountreltuples documentation (the same incident shape, different endpoint).flowsheet-etlpost-import UPDATE invalidates stats the same way).project_bs_ec2_wedge_pattern(memory) — the 2026-05-17 incident in which this surfaced live.