
perf(webapp): cache task metadata in Redis for the trigger hotpath #3625

Merged: ericallam merged 7 commits into main from perf/trigger-task-cache on May 15, 2026
Conversation

@ericallam (Member) commented May 15, 2026

Summary

The trigger-task hotpath used to early-return without a DB query when a
caller passed both a queue override and a per-trigger TTL — the hottest
configuration on the trigger API. Adding triggerSource to the resolver
so the runs-list "Source" filter could distinguish STANDARD / SCHEDULED /
AGENT runs removed those early-returns, costing +2 DB queries per trigger
on non-locked calls and +1 on locked calls.

This change caches BackgroundWorkerTask metadata (ttl, triggerSource,
queueId, queueName) in Redis so the resolver can satisfy every caller
configuration with a single HGET on the warm path. PG fallback on miss
back-fills the cache.

Follow-up to #3542.
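The read path described above can be sketched like this. This is an illustrative model, not the real module: the function name is made up, a `Map` stands in for the Redis hash, and the compact `{ t, k, q, n }` field encoding is the one shown in the review comments below.

```typescript
// Illustrative sketch: warm path is a single HGET; cold path falls back to
// Postgres and back-fills the field. A Map stands in for the Redis hash at
// tr:task-meta:env:{envId}.
type TaskMetadataEntry = {
  slug: string;
  ttl: string | null;
  triggerSource: string;
  queueId: string | null;
  queueName: string;
};

function getTaskMetadata(
  hash: Map<string, string>, // stand-in for the Redis hash
  slug: string,
  loadFromPg: (slug: string) => TaskMetadataEntry | undefined
): TaskMetadataEntry | undefined {
  const raw = hash.get(slug); // warm path: one HGET
  if (raw !== undefined) {
    const p = JSON.parse(raw); // compact { t, k, q, n } encoding
    return { slug, ttl: p.t, triggerSource: p.k, queueId: p.q, queueName: p.n };
  }
  const entry = loadFromPg(slug); // cold path: PG fallback
  if (entry === undefined) return undefined;
  // Back-fill as a single-field upsert so concurrent back-fills
  // don't wipe each other's siblings.
  hash.set(
    slug,
    JSON.stringify({ t: entry.ttl, k: entry.triggerSource, q: entry.queueId, n: entry.queueName })
  );
  return entry;
}
```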

Design

Two key spaces:

  • task-meta:env:{envId} — the "current worker" view, refreshed at every
    deploy promotion. 24h safety TTL.
  • task-meta:by-worker:{workerId} — used for lockToVersion triggers.
    Immutable post-create. 30d sliding TTL so historical workers age out.

Cache writes use Lua scripts via defineCommand so DEL + HSET +
EXPIRE land atomically — concurrent readers never see the empty
intermediate state of a naive pipeline. Read-path back-fill uses
single-field upserts so concurrent back-fills don't wipe each other's
siblings.
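A minimal in-memory model of those replace semantics — names are illustrative, and the real write is a Lua script registered via `defineCommand`, not TypeScript. The point the model captures: the replace is observed as a single step, and an empty entry set still clears the key (which is what lets a zero-task deploy wipe stale data).

```typescript
// In-memory model of the atomic DEL + HSET + EXPIRE replace. Readers of
// `store` never observe a deleted-but-not-yet-repopulated key, unlike a
// naive pipeline.
type Hash = { fields: Map<string, string>; ttlSeconds: number };

function replaceHash(
  store: Map<string, Hash>,
  key: string,
  entries: Record<string, string>,
  ttlSeconds: number
): void {
  const fields = new Map(Object.entries(entries));
  if (fields.size === 0) {
    store.delete(key); // DEL with no HSET: zero-task deploys clear stale data
    return;
  }
  store.set(key, { fields, ttlSeconds }); // single observable step
}
```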

The cache lives behind its own TASK_META_CACHE_REDIS_* env-var prefix
that falls back to the default REDIS_* set, so operators can route the
cache to a dedicated Redis instance if they want.
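The fallback rule can be sketched with a hypothetical helper over a plain env record (the real webapp reads configuration through `env.server.ts`; the helper name is an assumption):

```typescript
// A TASK_META_CACHE_REDIS_* variable wins when set; otherwise the default
// REDIS_* value applies. `readWithFallback` is a hypothetical helper.
function readWithFallback(
  env: Record<string, string | undefined>,
  key: string, // e.g. "HOST", "PORT"
  prefix = "TASK_META_CACHE_REDIS_",
  fallbackPrefix = "REDIS_"
): string | undefined {
  return env[prefix + key] ?? env[fallbackPrefix + key];
}
```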

The service/instance file split (taskMetadataCache.server.ts for the
pure class, taskMetadataCacheInstance.server.ts for the env-wired
singleton) mirrors the existing runsReplicationService /
runsReplicationInstance pattern.

Test plan

  • pnpm run typecheck --filter webapp
  • pnpm run test ./test/engine/triggerTask.test.ts --run — 8
    existing tests untouched + 5 new tests covering warm cache, cold
    miss with back-fill, queue + ttl path, by-worker vs env keyspace,
    and the promotion cache write
  • End-to-end against a dev worker: registering writes both keyspaces
    with the expected TTLs, and redis-cli HGETALL "tr:task-meta:env:<envId>"
    returns the cached entries

Benchmark

Measured DefaultQueueManager.resolveQueueProperties against a real Postgres + Redis (vitest containerTest, single-host docker). 500 sequential calls and 2,000 parallel calls (concurrency=50) per scenario, request shaped as { taskId, queue: "bench-queue", ttl: "5m" } — the hot path this PR restores.

```
sequential (one in flight at a time):
[noop cache (baseline)]  n=500   mean=1.423ms  p50=1.394ms  p95=1.735ms  p99=2.629ms  max=11.100ms
[redis cache, cold   ]  n=500   mean=1.346ms  p50=1.283ms  p95=1.688ms  p99=2.463ms  max=5.058ms
[redis cache, warm   ]  n=500   mean=0.084ms  p50=0.078ms  p95=0.105ms  p99=0.156ms  max=1.129ms
speedup (warm vs baseline, sequential): 16.95x

parallel (concurrency=50):
[noop cache (baseline)]  n=2000  mean=10.069ms  p50=8.850ms  p95=14.718ms  p99=31.887ms  total=405ms  ops/s=4,940
[redis cache, warm   ]  n=2000  mean=0.614ms   p50=0.568ms  p95=1.189ms   p99=1.432ms   total=25ms   ops/s=80,389
throughput speedup (warm vs baseline, parallel): 16.27x
```

Read:

  • Warm cache cuts resolver latency 17× at p50 — from ~1.4 ms to ~78 µs per call.
  • Cold cache is on par with baseline — the extra HGET miss adds <50 µs against the two Postgres queries that follow, so the worst case is not worse than today.
  • Under burst load (50 concurrent triggers), the baseline's p99 jumps to ~32 ms as Postgres connections queue up; warm stays at ~1.4 ms. The cache moves the saturation point from ~5k ops/s (PG pool) to ~80k ops/s (single-client Redis pipelining).

Caveats: single-host docker, local Postgres + Redis, resolver-only measurement (excludes the rest of the trigger transaction). Prod adds region-local Redis RTT (~0.3–0.8 ms) which shifts warm absolute numbers up but keeps the ratio intact.

changeset-bot (Bot) commented May 15, 2026

⚠️ No Changeset found

Latest commit: 710df37

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


coderabbitai (Bot) commented May 15, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This pull request implements a Redis-backed task metadata cache to accelerate task trigger operations by avoiding per-request database lookups for task defaults. The cache stores task metadata (TTL, trigger source/task kind, queue identifiers) in two Redis hash keyspaces: one scoped by environment (for non-locked triggers) and one scoped by worker ID (for locked-to-version triggers). Environment configuration includes Redis connection settings and separate TTL values for each keyspace. The queue resolution layer (DefaultQueueManager) now reads cached metadata with Prisma fallback and automatic backfill on cache miss. Worker creation, deployment creation, and deployment promotion flows capture task metadata and synchronously populate the cache. Comprehensive integration tests verify warm cache behavior, cache-miss fallback and backfill, queue override + cached TTL interaction, by-worker keyspace routing for locked triggers, and deployment promotion cache synchronization.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — docstring coverage is 10.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4)

  • Title check ✅ — the PR title accurately describes the main change: adding Redis caching for task metadata to improve trigger hotpath performance.
  • Linked Issues check ✅ — skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ — skipped because no linked issues were found for this pull request.
  • Description check ✅ — the PR description is comprehensive, with clear objectives, design rationale, test coverage, and benchmark results demonstrating the performance improvement.


coderabbitai (Bot) left a review comment
🧹 Nitpick comments (1)
apps/webapp/app/services/taskMetadataCache.server.ts (1)

50-64: ⚡ Quick win

Add zod validation to the decode() function for schema-drift protection.

JSON.parse(raw) as EncodedEntry silently accepts any payload. If cached data drifts (field removed, TaskTriggerSource value retired, or malformed entry), decode() will produce a TaskMetadataEntry with undefined fields, surfacing bugs far downstream in the trigger hotpath. A zod schema hardens this and aligns with the repo's validation conventions.

♻️ Proposed refactor

```diff
-import type { Redis, Result, Callback } from "ioredis";
-import type { TaskTriggerSource } from "@trigger.dev/database";
-import { logger } from "./logger.server";
+import type { Redis, Result, Callback } from "ioredis";
+import { TaskTriggerSource } from "@trigger.dev/database";
+import { z } from "zod";
+import { logger } from "./logger.server";
@@
-type EncodedEntry = {
-  t: string | null;
-  k: TaskTriggerSource;
-  q: string | null;
-  n: string;
-};
+const EncodedEntrySchema = z.object({
+  t: z.string().nullable(),
+  k: z.nativeEnum(TaskTriggerSource),
+  q: z.string().nullable(),
+  n: z.string(),
+});
+type EncodedEntry = z.infer<typeof EncodedEntrySchema>;
@@
 function decode(slug: string, raw: string): TaskMetadataEntry | null {
   try {
-    const parsed = JSON.parse(raw) as EncodedEntry;
+    const parsed = EncodedEntrySchema.parse(JSON.parse(raw));
     return {
       slug,
       ttl: parsed.t,
       triggerSource: parsed.k,
       queueId: parsed.q,
       queueName: parsed.n,
     };
   } catch (error) {
     logger.error("Failed to decode task metadata cache entry", { slug, error });
     return null;
   }
 }
```

Per coding guideline: "Use zod for validation in packages/core and apps/webapp".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/webapp/app/services/taskMetadataCache.server.ts` around lines 50 - 64,
The decode function currently trusts JSON.parse and casts to EncodedEntry which
can produce undefined fields; add a zod schema (e.g., EncodedEntrySchema) that
validates the shape { t: number, k: TaskTriggerSource, q?: string | null, n?:
string | null } (use the existing TaskTriggerSource enum/union for k), import
zod and run safeParse on the parsed JSON inside decode(slug, raw); if validation
fails, logger.error with the parse errors and return null, otherwise map the
validated fields to TaskMetadataEntry (ttl <- t, triggerSource <- k, queueId <-
q, queueName <- n). Ensure you reference and validate the exact field names
t/k/q/n and use EncodedEntrySchema.safeParse instead of a raw type cast.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2d23dad3-c348-4cf6-8399-d667fe3f4f4b

📥 Commits

Reviewing files that changed from the base of the PR and between 3f6d8eb and 1b08acb.

📒 Files selected for processing (5)
  • apps/webapp/app/services/taskMetadataCache.server.ts
  • apps/webapp/app/v3/services/createBackgroundWorker.server.ts
  • apps/webapp/app/v3/services/createDeploymentBackgroundWorkerV3.server.ts
  • apps/webapp/app/v3/services/createDeploymentBackgroundWorkerV4.server.ts
  • apps/webapp/test/engine/triggerTask.test.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • apps/webapp/app/v3/services/createDeploymentBackgroundWorkerV4.server.ts
  • apps/webapp/app/v3/services/createBackgroundWorker.server.ts
  • apps/webapp/test/engine/triggerTask.test.ts
  • apps/webapp/app/v3/services/createDeploymentBackgroundWorkerV3.server.ts

ericallam added a commit that referenced this pull request May 15, 2026
…builds (#3626)

## Summary

`LocalsKey<T>` (the type returned by `locals.create()`) was branded with a
module-level `declare const __local: unique symbol`. Each such declaration
is its own nominal type, and `tshy` emits separate `.d.ts` files for the
ESM and CJS outputs — each gets its own `__local` symbol. Under certain
pnpm hoisting layouts a single TypeScript compilation can resolve
`LocalsKey` from both the ESM source path and the CJS dist path within
the same call site, producing two structurally-incompatible variants of
the same type. TS surfaces this as the misleading error:

```
Argument of type 'LocalsKey<X>' is not assignable to parameter of type
'LocalsKey<X>'. Property '[__local]' is missing in type 'LocalsKey<X>'
but required in type 'BrandLocal<X>'.
```

The error has been hitting CI on PRs opened since the chat.agent stack
landed (e.g. #3625 typecheck job), but doesn't reproduce on developer
machines where the pnpm node_modules layout was built up incrementally.

## Fix

Replace the `unique symbol` brand with an optional phantom field that
carries `T` at the type level:

```ts
// before
declare const __local: unique symbol;
type BrandLocal<T> = { [__local]: T };
export type LocalsKey<T> = BrandLocal<T> & {
  readonly id: string;
  readonly __type: unique symbol;
};

// after
export type LocalsKey<T> = {
  readonly id: string;
  readonly __type: symbol;
  /** Phantom carrier for the value type — never read at runtime. */
  readonly __valueType?: T;
};
```

The ESM and CJS `.d.ts` outputs now produce structurally identical
types,
so cross-output resolution no longer produces a mismatch. `T` is still
carried at the type level via the optional phantom field. The runtime
shape is unchanged — `manager.ts` was already casting via `as unknown`,
which is no longer needed.

## Test plan

- [ ] `pnpm run typecheck --filter @trigger.dev/core --filter
@trigger.dev/sdk`
- [ ] `pnpm run build --filter @trigger.dev/core --filter
@trigger.dev/sdk`
      (clean rebuild) — confirms the ESM and CJS dist `.d.ts` outputs
      no longer carry distinct `unique symbol` declarations
- [ ] `pnpm --filter @trigger.dev/core test test/mockTaskContext.test.ts
--run`
- [ ] `pnpm --filter @trigger.dev/sdk test test/mockChatAgent.test.ts
--run`
ericallam added 2 commits May 15, 2026 08:30
The trigger-task hotpath used to early-return without a DB query when a
caller passed both a queue override and a per-trigger TTL — the hottest
configuration on the trigger API. Adding triggerSource to the resolver
so the runs-list Source filter could distinguish STANDARD / SCHEDULED /
AGENT runs removed those early-returns, costing +2 DB queries per
trigger on non-locked calls and +1 on locked calls.

The resolver now reads BackgroundWorkerTask metadata (ttl, triggerSource,
queueId, queueName) from a Redis HASH keyed by env or by worker, with
PG fallback on miss that back-fills the cache.

Two key spaces:
- task-meta:env:{envId}        refreshed at every deploy promotion;
                               24h safety TTL.
- task-meta:by-worker:{workerId} used for lockToVersion triggers;
                               30d sliding TTL.

Cache writes use Lua scripts via defineCommand so DEL + HSET + EXPIRE
land atomically (concurrent readers never see the empty intermediate
state of a naive pipeline). Read-path back-fill uses single-field
upserts so concurrent back-fills don't wipe each other's siblings.

Configurable via a new TASK_META_CACHE_REDIS_* env-var prefix that
falls back to the default REDIS_* set, so operators can route the
cache to a dedicated Redis instance if they want.
- TaskMetadataCache: interface → type alias, matching the repo's
  'prefer type unless extending' standard.
- syncTaskMetadataCache calls in createBackgroundWorker,
  createDeploymentBackgroundWorkerV3, and createDeploymentBackgroundWorkerV4
  are now wrapped in tryCatch and log on failure. Matches the pattern in
  changeCurrentDeployment so a Redis blip can't strand a successful worker
  build or deploy promotion.
- Back-fill assertion in 'cache miss falls through to PG' now polls with a
  bounded timeout instead of a fixed sleep, removing the CI flake surface.
- Promotion assertion in 'ChangeCurrentDeploymentService promotes the env
  cache' now pre-clears the by-worker + env keys and asserts they're null
  before the call, so the test proves the service did the write.
@ericallam ericallam force-pushed the perf/trigger-task-cache branch from 1b08acb to 1a06ef6 Compare May 15, 2026 07:30
devin-ai-integration (Bot) left a comment
✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional findings.


The cache now exposes populateByCurrentWorker(envId, workerId, entries) and
setByCurrentWorker(envId, workerId, entry) which write both task-meta:env
and task-meta:by-worker keyspaces in a single Lua transaction. Drops the
syncTaskMetadataCache helper and its isCurrent boolean — each caller picks
the right method based on context (DEV worker create + deploy promotion go
through populateByCurrentWorker; V4 deploy build goes through
populateByWorker).

Two new Lua scripts via defineCommand:
- taskMetaReplaceTwoHashes: DEL + HSET + EXPIRE both keys atomically
- taskMetaSetTwoFields: HSET both keys + env-preserve-TTL +
  worker-refresh-TTL atomically

The read-path back-fill for non-locked triggers now also writes both
keyspaces atomically via setByCurrentWorker, so concurrent back-fills
can't leave one keyspace populated and the other empty.

Cache methods log+swallow Redis errors internally, so the outer tryCatch
wrappers at the four call sites are gone — a Redis blip can't break any
post-cache side effect.
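The env-preserve-TTL / worker-refresh-TTL behavior of `taskMetaSetTwoFields` can be modeled in a few lines. This is an in-memory stand-in, not the Lua script itself; the `FakeKey` shape and function signature are illustrative.

```typescript
// Model of the dual-keyspace single-field write: the env key keeps whatever
// TTL it already has, while the by-worker key's sliding TTL is refreshed.
type FakeKey = { fields: Map<string, string>; ttlSeconds: number };

const ENV_TTL = 24 * 60 * 60; // 24h safety TTL for task-meta:env:{envId}
const WORKER_TTL = 30 * 24 * 60 * 60; // 30d sliding TTL for task-meta:by-worker:{workerId}

function setTwoFields(
  store: Map<string, FakeKey>,
  envKey: string,
  workerKey: string,
  field: string,
  value: string
): void {
  const env = store.get(envKey) ?? { fields: new Map(), ttlSeconds: ENV_TTL };
  env.fields.set(field, value); // env TTL deliberately left untouched
  store.set(envKey, env);

  const worker = store.get(workerKey) ?? { fields: new Map(), ttlSeconds: 0 };
  worker.fields.set(field, value);
  worker.ttlSeconds = WORKER_TTL; // sliding TTL: refreshed on every write
  store.set(workerKey, worker);
}
```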

…worker, clear stale cache on zero-task deploys

Two CodeRabbit findings:

1. The locked + queue-override branch silently accepted a missing task slug
   on the locked worker version, leaving `taskKind` / `ttl` as undefined
   instead of erroring. Now matches the no-override branch and throws a
   ServiceValidationError if the task isn't on the locked worker.

2. The cache populate methods short-circuited on empty entries, leaving
   stale hashes in Redis when a worker registered or got promoted with
   zero tasks. The Lua scripts already handle the empty-entries case
   (DEL + no HSET), so just drop the gates at the cache class and at the
   four call sites so promotion correctly clears prior data.
@ericallam ericallam marked this pull request as ready for review May 15, 2026 08:23
Two CI failures on the prior push:

1. Typecheck TS2556 in taskMetadataCache.server.ts: the populate methods
   were spreading a mixed string[] into the Lua `defineCommand` rest
   parameter. Pass the fixed positional args (key, ttl) directly and
   spread only the variable field/value pairs.

2. Webapp shard 6: the cache test file pulled in
   ChangeCurrentDeploymentService, whose transitive import chain reaches
   triggerTaskV1.server.ts → autoIncrementCounter.server.ts which throws
   at module load when REDIS_HOST/REDIS_PORT aren't in the shard's env.
   Drop test 5 (deploy-promotion side effect) — it doesn't belong in
   triggerTask.test.ts anyway.

ericallam added 2 commits May 15, 2026 10:47
…writes

Stamp the env hash with an `__owner_worker_id` field at promotion time;
the back-fill Lua script reads it and CAS-skips the env write if it
doesn't match the worker the back-filler resolved to.

Scenario the guard closes:
1. Resolver reads env cache, misses
2. Resolver reads PG: findCurrentWorker -> A, BWT for A -> A's data
3. [promotion to B fires, atomically replaces env hash + by-worker:B]
4. Resolver fires back-fill with workerId=A
   -> env hash now stamped owner=B; A != B -> back-fill skips env write
   -> by-worker:A still written (harmless; key contains workerId, dead worker)

The by-worker keyspace doesn't need a CAS check — the key itself
contains the workerId, so stale writes land in a dead worker's keyspace
and are never read.

Devin Review flagged this as a sub-millisecond race bounded by the 24h
env TTL. Cost to close it is one extra HGET inside the Lua call (atomic
with the HSET, no extra round trip) and one extra HSET on the promotion
write.
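The CAS guard described above can be modeled like this — a `Map` stands in for the env hash, `__owner_worker_id` is the field name from the commit message, and the function name is illustrative (the real check runs inside the Lua back-fill script, atomically with the HSET):

```typescript
// Back-fill with the owner CAS guard: skip the env write when the hash is
// stamped with a different worker than the one the back-filler resolved to.
function backfillEnv(
  envHash: Map<string, string>, // stand-in for the env hash
  resolvedWorkerId: string,
  slug: string,
  encoded: string
): boolean {
  const owner = envHash.get("__owner_worker_id");
  if (owner !== undefined && owner !== resolvedWorkerId) {
    return false; // a promotion won the race; leave the env hash alone
  }
  envHash.set(slug, encoded);
  return true;
}
```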
Match main's pre-PR behavior — when a caller passes `lockToVersion` and a
queue override and the task slug isn't registered on that worker version,
fall through with `taskKind = undefined` (coalesced to "STANDARD"
downstream) and `taskTtl = undefined` instead of throwing.

This reverts the throw added earlier in this PR. The trade-off review
landed on "strict consistency check is better" twice (CodeRabbit
recommended it, Devin pushed back, we kept the throw) — but the net is
a behavioral change in the trigger API, and customers running
`lockToVersion` + a queue override against a slug that doesn't exist on
that version would start getting 4xx errors where they got silent
defaults before. Default-to-silent matches main and avoids the surprise.

The no-override branch keeps its throw because there's no queue to route
to in that case — the failure mode there is unrecoverable.
@ericallam ericallam merged commit 454f0c9 into main May 15, 2026
26 checks passed
@ericallam ericallam deleted the perf/trigger-task-cache branch May 15, 2026 10:52