Reconsider process-wide limiter shape (G4 follow-up)

## Context

G4 (#906) shipped a module-level `Semaphore(5)` + `TokenBucket(50/min)` in `apps/backend/services/lml/lml.client.ts` to mirror LML's `discogs_max_concurrent` + `discogs_rate_limit` ceilings, giving BS pre-emptive backpressure so we don't queue arbitrary depth inside LML. The implementation works as designed and is the right safety net to have.

This ticket is a design refinement, not a bug. The BS#955 incident surfaced four sharp edges in the chosen shape; this ticket frames the design questions for a future revisit. The fixes we shipped (#957 env-var overrides on both CI surfaces, follow-ups #958 + #959) are point patches that keep the current shape working; they don't address the shape itself.

## Sharp edges observed

### 1. Module-level state persists across spec files under `--runInBand`

`TokenBucket` and `Semaphore` are module-scoped singletons initialized at import. The integration suite (`--runInBand`) imports `lml.client.ts` once and shares the bucket across every spec. By the time `metadata-lml.spec.js` runs, earlier specs (`flowsheet.spec.js`, `search.spec.js`, etc.) may have fired enough `fireAndForgetMetadata` calls to drain the 50-token bucket; subsequent calls then wait ~1200 ms each for refill, blowing the 30 s per-test budget. This was the BS#955 cause; today the workaround is to set `LML_CLIENT_RATE_PER_MIN=60000` in CI to make the bucket effectively unlimited.

A bucket scoped to a request (or to a logical caller, e.g. the proxy controller vs the enrichment job) would not need test overrides.

### 2. Test-vs-prod env divergence has two surfaces

The override has to be wired in **both** local `ci:testmock` (docker-compose `backend` service env block) and GHA Integration-Tests (workflow `Start services` env block — host node processes). Pinned by source-grep test `tests/unit/scripts/lml-limiter-test-env.test.ts`. Any future limiter knob will need the same dual wire-up. BS#958 will mechanically guard the env-block divergence, but doesn't address the underlying coupling between limiter knob count and CI-surface count.

### 3. No integration with the existing per-request budget

PR #618 added `LIBRARY_SEARCH_ENRICHMENT_BUDGET_MS` (per-request deadline budget); LML's API contract supports an `X-Caller-Budget-Ms` header. A request that's been queueing inside the bucket for 1200 ms has burned ~25 % of a typical 5 s budget before its actual upstream call starts, but the caller can't see that — the limiter and the budget timer are independent axes. Either the limiter should consult the deadline (skip rather than queue when budget would be exceeded) or queue time should be subtracted from the budget passed to LML.

### 4. Shared bucket across call sites with different deadline tolerance

A synchronous `/proxy/metadata/album` lookup has a tight deadline (the iOS client is waiting). A `fireAndForgetMetadata` enrichment call has loose deadline tolerance (the row will get filled in eventually, the client polls). Sharing one bucket means a flood of fire-and-forget can starve sync. Splitting by call site (or giving sync priority through a small reserved sub-budget) would let us tune them independently.

## Design questions (not committing to a fix)

- Should the bucket be scoped to a request, a call-site, or the process?
- Should bucket queueing consume from the caller's budget?
- Should sync and fire-and-forget share a bucket or split?
- Is the right metric here \"depth\" (current), \"wait time per call\", or both?
- If we keep process-wide, does the existing `_resetLmlClientLimitersForTest()` hook want to wire into `beforeEach` for integration setup, so spec files don't inherit each other's bucket state?

## Out of scope

- Reverting G4. The safety net is real — when LML actually saturates Discogs, the limiter does its job.
- Replacing the limiter with a circuit breaker. That's a separate concern (failure response, not pre-emptive backpressure).

## Related

- Origin: #906 (G4 — closed via PR #948 on 2026-05-18)
- Incident: #955 (closed via PR #957) — the suite-wide drain that exposed edge 1
- Point patches still landing: #958 (env-surface guard), #959 (DB_PORT conflation)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reconsider process-wide limiter shape (G4 follow-up) #962

Context

Sharp edges observed

1. Module-level state persists across spec files under `--runInBand`

2. Test-vs-prod env divergence has two surfaces

3. No integration with the existing per-request budget

4. Shared bucket across call sites with different deadline tolerance

Design questions (not committing to a fix)

Out of scope

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Reconsider process-wide limiter shape (G4 follow-up) #962

Description

Context

Sharp edges observed

1. Module-level state persists across spec files under --runInBand

2. Test-vs-prod env divergence has two surfaces

3. No integration with the existing per-request budget

4. Shared bucket across call sites with different deadline tolerance

Design questions (not committing to a fix)

Out of scope

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. Module-level state persists across spec files under `--runInBand`