Fix error handling gaps in recent releases #3640
Conversation
Lets us ship a patch (e.g. 4.4.6) from a release/4.4.x branch without
including unreleased work merged into main, and without the patch
incorrectly clobbering floating tags.
The release-pipeline pieces this touches and how each behaves now:
- npm dist-tag: `latest` if the new version > current latest, else `release-<M.m>`
- Docker `:v4-beta`: same gate (highest version only)
- Docker `:release-X.Y`: new per-line floating tag, always set on a semver build
- GitHub release: `--latest=true|false` set explicitly (no auto-detect)
How the gate is computed:
release.yml's "Compare new version to current latest" step queries
`npm view @trigger.dev/sdk dist-tags.latest`, compares the versions via `sort -V`,
and sets `is_latest=true|false`. That flag drives every floating tag.
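For illustration only, here is that gate sketched as a small Node script. The real step is a shell command using `sort -V`; the `semver` package and the `NEW_VERSION` input below are assumptions for the sketch, not what release.yml actually uses:

```ts
import { execSync } from "node:child_process";
import semver from "semver";

// Sketch of the is_latest gate: look up the current `latest` dist-tag and
// compare it with the version about to be published.
const currentLatest = execSync("npm view @trigger.dev/sdk dist-tags.latest", {
  encoding: "utf8",
}).trim();

const newVersion = process.env.NEW_VERSION ?? "0.0.0"; // illustrative input, not the real step's variable
const isLatest = semver.valid(newVersion) !== null && semver.gt(newVersion, currentLatest);

console.log(`is_latest=${isLatest}`); // this flag gates every floating tag above
```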
Triggers / refs:
- `pull_request`: `branches: [main, release/**]`
- if-conditions allow `head.ref` starting with `changeset-release/`
- `workflow_dispatch` ref must be reachable from main OR a release/* branch (see the sketch after this list)
- changesets-pr.yml fires on push to release/** too; PR-enhance step
discovers source branch dynamically (no more hardcoded changeset-release/main)
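The `workflow_dispatch` reachability guard above, sketched as a hypothetical helper. The actual check lives in the workflow; the branch names and the use of `git merge-base --is-ancestor` here are assumptions about one way to implement it:

```ts
import { execSync } from "node:child_process";

// Returns true if `sha` is an ancestor of (i.e. reachable from) the given branch.
function isReachableFrom(sha: string, branch: string): boolean {
  try {
    // `git merge-base --is-ancestor A B` exits 0 when A is an ancestor of B.
    execSync(`git merge-base --is-ancestor ${sha} origin/${branch}`, { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

// A dispatched ref is allowed if it is reachable from main or any release/* branch.
function dispatchRefAllowed(sha: string, releaseBranches: string[]): boolean {
  return isReachableFrom(sha, "main") || releaseBranches.some((b) => isReachableFrom(sha, b));
}
```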
Other changes:
- gh release create: drop --target main (tag carries right commit)
- dispatch-changelog payload includes is_latest so the marketing site
can render lagged-line releases differently
- enhance-release-pr.mjs prepends a Release prep header on release/*
branches showing version, current latest, and whether the PR will
take the latest dist-tag
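As a rough illustration of the header described in the last bullet (the wording and function name are hypothetical, not what enhance-release-pr.mjs actually emits):

```ts
// Hypothetical shape of the prepended "Release prep" header.
function releasePrepHeader(version: string, currentLatest: string, takesLatest: boolean): string {
  return [
    "## Release prep",
    `- Releasing: ${version}`,
    `- Current latest dist-tag: ${currentLatest}`,
    `- Will take the latest dist-tag: ${takesLatest ? "yes" : "no (publishes to release-<M.m>)"}`,
    "",
  ].join("\n");
}
```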
release-helm.yml unchanged — already creates as draft+prerelease so it
can't claim Latest. publish-worker.yml (coordinator/provider) unchanged
since those don't have a :v4-beta-equivalent floating tag.
Validated end-to-end in ericallam/pkgring-sandbox across both scenarios:
- Scenario A (lagged hotfix): `latest` stays put, only `release-X.Y` moves
- Scenario B (main has unreleased work, hotfix is highest): `latest` moves
- enhance-release-pr.mjs: fix dead try/catch, use a proper semver comparison
- release.yml: add a retry loop around `npm view` to prevent silent failures from incorrectly promoting a lagged hotfix to `:latest`
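A rough sketch of the retry behavior in the second bullet, written as a Node helper for illustration. The real retry loop is a shell step in release.yml; the function name, attempt count, and backoff here are assumptions:

```ts
import { execSync } from "node:child_process";

// Retry `npm view` so a transient registry failure can't return an empty value
// that the gate would misread and wrongly promote a lagged hotfix to latest.
async function fetchLatestDistTag(pkg: string, attempts = 3): Promise<string> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const out = execSync(`npm view ${pkg} dist-tags.latest`, { encoding: "utf8" }).trim();
      if (out) return out;
    } catch {
      // swallow and retry below
    }
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, attempt * 2000)); // simple linear backoff
    }
  }
  throw new Error(`Could not resolve dist-tags.latest for ${pkg}`); // fail loudly, never silently
}
```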
Hi @deepshekhardas, thanks for your interest in contributing! This project requires that pull request authors are vouched, and you are not in the list of vouched users. This PR will be closed automatically. See https://github.com/triggerdotdev/trigger.dev/blob/main/CONTRIBUTING.md for more details.
Review failed: the pull request is closed.

Walkthrough: This PR refactors the release and publishing workflow to support multiple release branches and conditionally publish Docker images. Estimated code review effort: 4 (Complex), ~60 minutes.
```ts
const sortedAiKeys = dropPriority.filter((k) => aiKeys.has(k));
```
🟡 applyTotalSizeBackstop fails to drop prefix-matched sub-key attributes
The backstop function collects attribute keys that match AI prefixes into `aiKeys` (lines 87-94) using `matchPrefix`, which handles both exact matches and `key.startsWith(prefix + ".")`. However, `sortedAiKeys` on line 96 filters `dropPriority` entries by checking `aiKeys.has(k)`, which only finds entries from `dropPriority` that are exact attribute keys in the map.
For example, if attributes has a key `"ai.prompt.messages.0.content"`, it is matched by prefix `"ai.prompt.messages"` and added to `aiKeys`. But `dropPriority.filter((k) => aiKeys.has(k))` checks whether `"ai.prompt.messages"` is in `aiKeys`; it isn't (only `"ai.prompt.messages.0.content"` is). So the sub-key is identified as AI content but never deleted, defeating the backstop.
Concrete trace showing the bug
Given attributes `{ "ai.prompt.messages.0.content": "huge...", "other": "x" }`:

- `aiKeys = {"ai.prompt.messages.0.content"}` (matched via prefix)
- `sortedAiKeys = dropPriority.filter(k => aiKeys.has(k)) = []` (no `dropPriority` entry is in `aiKeys`)
- Nothing is deleted; the oversized result is returned as-is.
Prompt for agents
In `applyTotalSizeBackstop` (otlpAttributeLimits.ts), the `sortedAiKeys` line at line 96 uses `dropPriority.filter((k) => aiKeys.has(k))`, which only finds exact matches between `dropPriority` entries and actual attribute keys. But `aiKeys` was populated with actual attribute keys matched by prefix, not the prefix strings themselves.
The fix should iterate `dropPriority` prefixes in order and, for each prefix, find and delete all matching attribute keys from the result (using `matchPrefix`). Something like:
```ts
for (const prefix of dropPriority) {
  // Delete every attribute key matched by this prefix (exact match or sub-key).
  for (const key of Object.keys(result)) {
    if (matchPrefix(key, prefix)) {
      delete result[key];
    }
  }
  // Stop dropping once the serialized attributes fit under the total-size limit.
  if (JSON.stringify(result).length <= limits.totalAttributesLengthLimit) break;
}
```
This preserves the drop-priority ordering while correctly handling sub-keys (e.g. ai.prompt.messages.0.content matched by prefix ai.prompt.messages).
```ts
switch (strategy.type) {
  case "log":
    return;
```
🚩 replicationErrorRecovery 'log' strategy silently swallows errors
In replicationErrorRecovery.server.ts:99-101, when strategy.type === 'log', the handle() method immediately returns without doing anything — including not logging the error. This is counterintuitive given the strategy name "log". By contrast, both scheduleReconnect (line 58) and scheduleExit (line 88) log the error before taking action. This may be intentional if callers are expected to log before calling handle(), but it's worth verifying the design intent since a strategy named "log" that doesn't log is surprising.
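If the intent is for handle() itself to log, here is a minimal sketch of a 'log' branch that does so. The types and logger shape are illustrative, not the actual replicationErrorRecovery.server.ts code; it simply mirrors the logging the comment says scheduleReconnect and scheduleExit already do:

```ts
type RecoveryStrategy = { type: "log" } | { type: "reconnect" } | { type: "exit" };

function handleReplicationError(
  strategy: RecoveryStrategy,
  error: unknown,
  logger: { error: (msg: string, meta?: Record<string, unknown>) => void }
): void {
  switch (strategy.type) {
    case "log":
      // Surface the error before returning instead of silently swallowing it.
      logger.error("Replication error (log-only recovery strategy)", { error });
      return;
    // "reconnect" and "exit" branches omitted; per the comment they already log.
  }
}
```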
```ts
import { initMollifierDrainerWorker } from "~/v3/mollifierDrainerWorker.server";

initMollifierDrainerWorker().catch((error) => {
  logger.error("Mollifier drainer initialization failed", { error });
});
```
🚩 entry.server.tsx imports non-existent mollifierDrainerWorker module
Lines 250-253 import initMollifierDrainerWorker from ~/v3/mollifierDrainerWorker.server, but this file does not exist in the repository at this commit. This will fail at build time / typecheck. Not reported as a bug per review guidelines (would be caught by the type checker / CI), but worth noting this PR cannot be merged as-is without the missing file being added in a follow-up commit or concurrent PR.
Picked up a few error handling gaps while reviewing recent releases:
Mollifier drainer init was missing a catch block: if Redis is down at startup, the whole app crashes. It now logs the error and continues.
Replication error recovery checked the shutdown flag before scheduling the reconnect timeout, but not inside the callback, so it could reconnect during a graceful shutdown. The check is now also applied inside the callback.
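A minimal sketch of that pattern, with illustrative names rather than the actual replication error recovery code:

```ts
// Hypothetical reconnect scheduler: the shutdown flag is checked both before
// scheduling the timer and again inside the callback, since a graceful shutdown
// can begin while the timer is pending.
class ReconnectScheduler {
  private isShuttingDown = false;

  shutdown(): void {
    this.isShuttingDown = true;
  }

  scheduleReconnect(reconnect: () => void, delayMs: number): void {
    if (this.isShuttingDown) return; // guard before scheduling

    setTimeout(() => {
      if (this.isShuttingDown) return; // guard again inside the callback
      reconnect();
    }, delayMs);
  }
}
```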
DynamicFlushScheduler had a closure bug: itemCount was captured from the outer map loop instead of inside the flush function, so metrics reported the wrong counts. It is now computed inside the flush function. The OTel attribute truncation logic was also extracted into its own module so it can be unit tested.
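A simplified, hypothetical illustration of that closure bug and the fix; the real DynamicFlushScheduler differs, this only shows the shape of the mistake:

```ts
declare function flush(items: unknown[]): Promise<void>;
declare function recordFlushMetric(count: number): void;

async function flushBatches(batches: unknown[][]): Promise<void> {
  // Buggy shape: itemCount lives outside the per-batch flush closure, so every
  // async completion reads whatever the loop wrote last and metrics are wrong.
  let itemCount = 0;
  await Promise.all(
    batches.map((batch) => {
      itemCount = batch.length;
      return flush(batch).then(() => recordFlushMetric(itemCount)); // stale value
    })
  );

  // Fixed shape: the count is captured inside the flush callback itself, so each
  // flush reports the size of the batch it actually flushed.
  await Promise.all(
    batches.map((batch) => flush(batch).then(() => recordFlushMetric(batch.length)))
  );
}
```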