Context
G4 (#906) shipped a module-level Semaphore(5) + TokenBucket(50/min) in apps/backend/services/lml/lml.client.ts to mirror LML's discogs_max_concurrent + discogs_rate_limit ceilings, giving BS pre-emptive backpressure so we don't queue arbitrary depth inside LML. The implementation works as designed and is the right safety net to have.
This ticket is a design refinement, not a bug. The BS#955 incident surfaced four sharp edges in the chosen shape; this ticket frames the design questions for a future revisit. The fixes we shipped (#957 env-var overrides on both CI surfaces, follow-ups #958 + #959) are point patches that keep the current shape working; they don't address the shape itself.
Sharp edges observed
1. Module-level state persists across spec files under --runInBand
TokenBucket and Semaphore are module-scoped singletons initialized at import. The integration suite (--runInBand) imports lml.client.ts once and shares the bucket across every spec. By the time metadata-lml.spec.js runs, earlier specs (flowsheet.spec.js, search.spec.js, etc.) may have fired enough fireAndForgetMetadata calls to drain the 50-token bucket; subsequent calls then wait ~1200 ms each for refill, blowing the 30 s per-test budget. This was the BS#955 cause; today the workaround is to set LML_CLIENT_RATE_PER_MIN=60000 in CI to make the bucket effectively unlimited.
A bucket scoped to a request (or to a logical caller, e.g. the proxy controller vs the enrichment job) would not need test overrides.
2. Test-vs-prod env divergence has two surfaces
The override has to be wired in both local ci:testmock (docker-compose backend service env block) and GHA Integration-Tests (workflow Start services env block — host node processes). Pinned by source-grep test tests/unit/scripts/lml-limiter-test-env.test.ts. Any future limiter knob will need the same dual wire-up. BS#958 will mechanically guard the env-block divergence, but doesn't address the underlying coupling between limiter knob count and CI-surface count.
3. No integration with the existing per-request budget
PR #618 added LIBRARY_SEARCH_ENRICHMENT_BUDGET_MS (per-request deadline budget); LML's API contract supports an X-Caller-Budget-Ms header. A request that's been queueing inside the bucket for 1200 ms has burned ~25 % of a typical 5 s budget before its actual upstream call starts, but the caller can't see that — the limiter and the budget timer are independent axes. Either the limiter should consult the deadline (skip rather than queue when budget would be exceeded) or queue time should be subtracted from the budget passed to LML.
4. Shared bucket across call sites with different deadline tolerance
A synchronous /proxy/metadata/album lookup has a tight deadline (the iOS client is waiting). A fireAndForgetMetadata enrichment call has loose deadline tolerance (the row will get filled in eventually, the client polls). Sharing one bucket means a flood of fire-and-forget can starve sync. Splitting by call site (or giving sync priority through a small reserved sub-budget) would let us tune them independently.
Design questions (not committing to a fix)
- Should the bucket be scoped to a request, a call-site, or the process?
- Should bucket queueing consume from the caller's budget?
- Should sync and fire-and-forget share a bucket or split?
- Is the right metric here "depth" (current), "wait time per call", or both?
- If we keep process-wide, does the existing
_resetLmlClientLimitersForTest() hook want to wire into beforeEach for integration setup, so spec files don't inherit each other's bucket state?
Out of scope
- Reverting G4. The safety net is real — when LML actually saturates Discogs, the limiter does its job.
- Replacing the limiter with a circuit breaker. That's a separate concern (failure response, not pre-emptive backpressure).
Related
Context
G4 (#906) shipped a module-level
Semaphore(5)+TokenBucket(50/min)inapps/backend/services/lml/lml.client.tsto mirror LML'sdiscogs_max_concurrent+discogs_rate_limitceilings, giving BS pre-emptive backpressure so we don't queue arbitrary depth inside LML. The implementation works as designed and is the right safety net to have.This ticket is a design refinement, not a bug. The BS#955 incident surfaced four sharp edges in the chosen shape; this ticket frames the design questions for a future revisit. The fixes we shipped (#957 env-var overrides on both CI surfaces, follow-ups #958 + #959) are point patches that keep the current shape working; they don't address the shape itself.
Sharp edges observed
1. Module-level state persists across spec files under
--runInBandTokenBucketandSemaphoreare module-scoped singletons initialized at import. The integration suite (--runInBand) importslml.client.tsonce and shares the bucket across every spec. By the timemetadata-lml.spec.jsruns, earlier specs (flowsheet.spec.js,search.spec.js, etc.) may have fired enoughfireAndForgetMetadatacalls to drain the 50-token bucket; subsequent calls then wait ~1200 ms each for refill, blowing the 30 s per-test budget. This was the BS#955 cause; today the workaround is to setLML_CLIENT_RATE_PER_MIN=60000in CI to make the bucket effectively unlimited.A bucket scoped to a request (or to a logical caller, e.g. the proxy controller vs the enrichment job) would not need test overrides.
2. Test-vs-prod env divergence has two surfaces
The override has to be wired in both local
ci:testmock(docker-composebackendservice env block) and GHA Integration-Tests (workflowStart servicesenv block — host node processes). Pinned by source-grep testtests/unit/scripts/lml-limiter-test-env.test.ts. Any future limiter knob will need the same dual wire-up. BS#958 will mechanically guard the env-block divergence, but doesn't address the underlying coupling between limiter knob count and CI-surface count.3. No integration with the existing per-request budget
PR #618 added
LIBRARY_SEARCH_ENRICHMENT_BUDGET_MS(per-request deadline budget); LML's API contract supports anX-Caller-Budget-Msheader. A request that's been queueing inside the bucket for 1200 ms has burned ~25 % of a typical 5 s budget before its actual upstream call starts, but the caller can't see that — the limiter and the budget timer are independent axes. Either the limiter should consult the deadline (skip rather than queue when budget would be exceeded) or queue time should be subtracted from the budget passed to LML.4. Shared bucket across call sites with different deadline tolerance
A synchronous
/proxy/metadata/albumlookup has a tight deadline (the iOS client is waiting). AfireAndForgetMetadataenrichment call has loose deadline tolerance (the row will get filled in eventually, the client polls). Sharing one bucket means a flood of fire-and-forget can starve sync. Splitting by call site (or giving sync priority through a small reserved sub-budget) would let us tune them independently.Design questions (not committing to a fix)
_resetLmlClientLimitersForTest()hook want to wire intobeforeEachfor integration setup, so spec files don't inherit each other's bucket state?Out of scope
Related