Skip to content

fix(appkit): decouple cache in-flight execution from per-caller abort…#398

Open
MarioCadenas wants to merge 5 commits into
mainfrom
mario/6220b406
Open

fix(appkit): decouple cache in-flight execution from per-caller abort…#398
MarioCadenas wants to merge 5 commits into
mainfrom
mario/6220b406

Conversation

@MarioCadenas
Copy link
Copy Markdown
Collaborator

@MarioCadenas MarioCadenas commented May 21, 2026

Summary

Under React 19 <StrictMode>, navigating to /analytics in dev surfaced this error in the browser console:

[useAnalyticsQuery] Code: UPSTREAM_ERROR, Message: Statement failed: The operation was aborted.
Removing <StrictMode> made it disappear, which made it look like a UI quirk. It wasn't — it was a real cache bug that StrictMode happened to expose.

Root cause

CacheManager.getOrExecute dedupes concurrent callers with the same cacheKey by sharing a single Promise in inFlightRequests. Under StrictMode:

  1. Mount 1 of each useAnalyticsQuery fires a request → server registers an in-flight promise for the query's cacheKey.
  2. StrictMode cleanup aborts mount 1's fetch → server's res.on("close") aborts the stream's AbortController with "Client disconnected" → the SDK's executor.query(..., signal) rejects with Statement failed: The operation was aborted. (an ExecutionError carrying statusCode).
  3. Mount 2 (still alive) hits the in-flight branch and awaits the same rejected promise.
  4. stream-manager._processGeneratorInBackground catches that rejection on mount 2's stream entry (which still has a connected client), categorizes it as UPSTREAM_ERROR (because statusCode is set and error.name !== "AbortError"), and broadcasts it to the browser.
    Smoking-gun log from the catch on mount 2's stream:
    {"clientsRemaining": 1, "errorCode": "UPSTREAM_ERROR", "abortReason": }
    clientsRemaining: 1 + no abortReason proves the abort came in via the shared in-flight promise, not via this stream's own controller.
    This is impossible without StrictMode (only one mount → no inflight join) and impossible on the reconnect view (different code path, no CacheInterceptor).

Fix

Reference-count the cache's in-flight execution behind a cache-owned AbortController:

  • inFlightRequests now holds { promise, refCount, sharedController } instead of bare Promises.
  • getOrExecute(key, fn, userKey, options) accepts options.callerSignal and passes a cache-owned sharedController.signal into fn(sharedSignal). The shared controller is aborted only when refCount drops to 0 — i.e. every caller has bailed.
  • New private _waitWithRefCount(entry, callerSignal) races the entry's promise against each caller's own signal. A caller-side abort rejects that caller's promise locally and decrements refCount; other still-connected awaiters continue to share the underlying execution.
  • The catch block no longer wraps abort-shaped errors as ExecutionError.statementFailed when the shared controller has already aborted (no live awaiter would observe them anyway), and entry.promise.catch(() => {}) suppresses unhandled-rejection warnings when every caller leaves before fn() resolves.
  • CacheInterceptor forwards context.signal as callerSignal and swaps context.signal to the cache's sharedSignal for the duration of fn() so the inner interceptor chain (timeout/retry/telemetry) and the underlying I/O (e.g. analytics executor.query(..., signal)) observe the shared lifecycle instead of the per-request stream signal.
    External callers (e.g. cache.getOrExecute(key, async () => {...}, userKey)) keep working unchanged because the new sharedSignal parameter is optional.

Test plan

  • pnpm exec vitest run --project appkit src/cache src/plugin src/stream src/plugins/analytics — 715 tests, all green.
  • Manual repro in apps/dev-playground: load /analytics under <StrictMode> — no UPSTREAM_ERROR in console; tiles load with data.
  • Verified cache dedup still works post-fix (the in-flight join branch still fires per logs; only the shared-fate-on-abort behavior changed).
  • Reviewer sanity-check: confirm there's no behavior regression for non-deduped callers (e.g. telemetry-example-plugin.ts's direct cache.getOrExecute use without a signal).

@MarioCadenas MarioCadenas requested a review from pkosiec May 21, 2026 14:34
@MarioCadenas MarioCadenas marked this pull request as ready for review May 21, 2026 14:35
@MarioCadenas MarioCadenas requested a review from a team as a code owner May 21, 2026 14:35
… signal

The cache in-flight deduplication map shared a single Promise across
concurrent callers with the same cacheKey. When one caller's signal
aborted (e.g. a React StrictMode mount/cleanup pair), fn()'s rejection
cascaded through the shared promise to every other awaiter — including
still-connected SSE consumers, which broadcast the abort as
UPSTREAM_ERROR to the browser.

Replace the bare promise map with a reference-counted in-flight entry
owning its own AbortController. Callers pass their own callerSignal;
the shared controller aborts only when every caller has bailed
(refCount -> 0). Each caller's await is raced against its own signal so
local aborts reject locally without poisoning the shared result. The
catch block no longer wraps abort errors as ExecutionError.statementFailed
when the shared controller has already aborted, since no live awaiter
would observe them anyway.

CacheInterceptor forwards context.signal as callerSignal and swaps
context.signal to the cache's shared signal for the duration of fn() so
the inner interceptor chain (timeout/retry/telemetry) and the underlying
I/O (e.g. analytics SDK calls) observe the shared lifecycle rather than
the per-request stream signal.

Existing cache + plugin + stream + analytics test suites pass unchanged
(715 tests).

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…atures

- cache-manager.test.ts — add 5 abort/ref-counting tests covering:
  caller abort while another waits, all callers abort triggers shared
  abort, pre-aborted signal rejects immediately, single caller abort
  mid-flight, deduped caller abort doesn't poison first caller
- cache-manager.test.ts — regression test for fn ignoring signal param
- cache.test.ts — MockCacheManager passes shared signal to fn(),
  add signal-swap test verifying context.signal is swapped during
  execution and restored afterward
- 16 plugin test files — update stale getOrExecute mock signatures
  from fn: () => Promise<T> to fn: (signal?: AbortSignal) => Promise<T>

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
Copy link
Copy Markdown
Member

@pkosiec pkosiec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this PR and it works! 👍

Comment thread packages/appkit/src/cache/index.ts
pkosiec added 3 commits May 22, 2026 17:03
- Skip joining in-flight entries whose sharedController is already aborted
- Guard finally-block delete to only remove its own entry, not a replacement
- Remove early span.end() in dedup path (let outer finally handle it)
- Add edge-case tests: fn rejection, post-resolve abort, fresh-after-abort

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
When all callers leave an in-flight cache entry, delay the
sharedController.abort() by 100ms instead of firing immediately.
This allows React StrictMode remounts to join the existing execution
before it is cancelled, preventing unnecessary SQL query aborts and
error logs in development.

- Add abortTimer to InFlightEntry with 100ms grace period
- Clear abort timer when a new caller joins the dedup path
- Skip joining aborted entries (guard against expired grace period)
- Guard finally-block delete to only remove own entry
- Remove early span.end() in dedup path
- Add grace period tests + edge-case abort tests

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
StreamManager.abort() was passing string reasons ("All clients
disconnected", "Server shutdown") which caused _categorizeError to
misclassify client disconnects as INTERNAL_ERROR instead of
STREAM_ABORTED. Use DOMException with name "AbortError" so the
error is correctly categorized and logged at info level.

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants