Skip to content

Fix error handling gaps in recent releases#3639

Closed
deepshekhardas wants to merge 7 commits into
triggerdotdev:mainfrom
deepshekhardas:fix/pr-3530-release-branch
Closed

Fix error handling gaps in recent releases#3639
deepshekhardas wants to merge 7 commits into
triggerdotdev:mainfrom
deepshekhardas:fix/pr-3530-release-branch

Conversation

@deepshekhardas
Copy link
Copy Markdown

Three bug fixes:

  1. Mollifier drainer init now has error handling - app won't crash if Redis is down
  2. Replication error recovery checks shutdown flag before reconnecting in the timeout callback
  3. OTel attribute truncation - fixed closure bug in DynamicFlushScheduler, added missing attribute limits module

ericallam and others added 7 commits May 11, 2026 10:08
Lets us ship a patch (e.g. 4.4.6) from a release/4.4.x branch without
including unreleased work merged into main, and without the patch
clobbering floating tags incorrectly.

The release-pipeline pieces this touches and how each behaves now:

  npm dist-tag        latest if version > current latest, else release-<M.m>
  Docker :v4-beta     same gate (highest version only)
  Docker :release-X.Y new per-line floating tag, always set on a semver build
  GitHub release      --latest=true|false set explicitly (no auto-detect)

How the gate is computed:
  release.yml's 'Compare new version to current latest' step queries
  npm view @trigger.dev/sdk dist-tags.latest, compares via sort -V,
  sets is_latest=true|false. Drives every floating tag.

Triggers / refs:
  - pull_request:branches[main, release/**]
  - if-conditions allow head.ref starting with 'changeset-release/'
  - workflow_dispatch ref must be reachable from main OR a release/* branch
  - changesets-pr.yml fires on push to release/** too; PR-enhance step
    discovers source branch dynamically (no more hardcoded changeset-release/main)

Other changes:
  - gh release create: drop --target main (tag carries right commit)
  - dispatch-changelog payload includes is_latest so the marketing site
    can render lagged-line releases differently
  - enhance-release-pr.mjs prepends a Release prep header on release/*
    branches showing version, current latest, and whether the PR will
    take the latest dist-tag

release-helm.yml unchanged — already creates as draft+prerelease so it
can't claim Latest. publish-worker.yml (coordinator/provider) unchanged
since those don't have a :v4-beta-equivalent floating tag.

Validated end-to-end in ericallam/pkgring-sandbox across both scenarios:
  Scenario A (lagged hotfix): latest stays put, only release-X.Y moves
  Scenario B (main has unreleased work, hotfix is highest): latest moves
- enhance-release-pr.mjs: fix dead try/catch, use proper semver comparison
- release.yml: add retry loop for npm view to prevent silent failures
  incorrectly promoting lagged hotfix to :latest
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 16, 2026

⚠️ No Changeset found

Latest commit: fccc2fc

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions
Copy link
Copy Markdown
Contributor

Hi @deepshekhardas, thanks for your interest in contributing!

This project requires that pull request authors are vouched, and you are not in the list of vouched users.

This PR will be closed automatically. See https://github.com/triggerdotdev/trigger.dev/blob/main/CONTRIBUTING.md for more details.

@github-actions github-actions Bot closed this May 16, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 16, 2026

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f09ce1ee-93b8-4caf-91dd-e692c756c7f7

📥 Commits

Reviewing files that changed from the base of the PR and between 05d3ab1 and fccc2fc.

📒 Files selected for processing (10)
  • .github/workflows/changesets-pr.yml
  • .github/workflows/publish-webapp.yml
  • .github/workflows/publish-worker-v4.yml
  • .github/workflows/publish.yml
  • .github/workflows/release.yml
  • apps/webapp/app/entry.server.tsx
  • apps/webapp/app/services/replicationErrorRecovery.server.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • scripts/enhance-release-pr.mjs

Walkthrough

This PR extends release workflow infrastructure to support releases from any release/* branch (not just main) with version-based latest tagging, while adding server-side error recovery and telemetry attribute size management. Release triggers now include release/** branches, and the changesets PR lookup is parameterized from arbitrary source branches. The release job determines whether a new version should receive latest tagging by comparing its semver to npm's current dist-tags.latest, then conditionally applies v4-beta and per-line floating tags in Docker workflows. A new replication error recovery module provides configurable strategies (reconnect, exit, or log) with exponential backoff and attempt tracking, while a mollifier drainer worker initializes at server startup. Telemetry attribute overflow is handled via per-key truncation with AI-content overrides and total-size backstop drops.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch fix/pr-3530-release-branch

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 potential issues.

View 4 additional findings in Devin Review.

Open in Devin Review

}
}

const sortedAiKeys = dropPriority.filter((k) => aiKeys.has(k));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 applyTotalSizeBackstop fails to drop sub-keyed attributes, rendering the backstop ineffective

The applyTotalSizeBackstop function collects actual attribute keys into aiKeys via matchPrefix (lines 87-94), so sub-keyed attributes like "ai.prompt.messages.0" are correctly identified. However, sortedAiKeys on line 96 filters dropPriority entries by checking exact membership in aiKeys (dropPriority.filter((k) => aiKeys.has(k))). Since aiKeys contains actual attribute keys (e.g., "ai.prompt.messages.0") and NOT the prefix strings (e.g., "ai.prompt.messages"), the filter yields an empty array whenever attribute keys are sub-keyed. The subsequent delete loop on lines 98-101 then deletes nothing, and the backstop silently returns an oversized attributes map.

Concrete example showing the failure

Given attributes {"ai.prompt.messages.0": "<large>", "ai.prompt.messages.1": "<large>"} exceeding the total size limit:

  1. aiKeys = {"ai.prompt.messages.0", "ai.prompt.messages.1"} (matched via prefix "ai.prompt.messages")
  2. sortedAiKeys = dropPriority.filter(k => aiKeys.has(k)) = [] — because "ai.prompt.messages" is NOT in aiKeys
  3. Nothing is deleted, backstop returns oversized data

The truncateAttributes function (otlpAttributeLimits.ts:55-74) handles sub-keys correctly via matchPrefix, but applyTotalSizeBackstop does not.

Prompt for agents
The bug is in applyTotalSizeBackstop in apps/webapp/app/v3/otlpAttributeLimits.ts at line 96. The line `const sortedAiKeys = dropPriority.filter((k) => aiKeys.has(k))` filters drop priority prefix strings by checking exact membership in aiKeys, but aiKeys contains actual attribute keys (e.g. 'ai.prompt.messages.0'), not prefix strings (e.g. 'ai.prompt.messages'). When attributes are sub-keyed, this produces an empty array and nothing gets deleted.

The fix should iterate over the actual attribute keys in aiKeys (not the prefix strings in dropPriority), but still respect the priority ordering from dropPriority. One approach: sort the aiKeys based on which dropPriority prefix they match (using matchPrefix), then delete them in that priority order. For example:

  const sortedAiKeys = [...aiKeys].sort((a, b) => {
    const aIdx = dropPriority.findIndex(p => matchPrefix(a, p));
    const bIdx = dropPriority.findIndex(p => matchPrefix(b, p));
    return aIdx - bIdx;
  });

Then the for-of loop on the next line would iterate over actual attribute keys and delete them from result.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +100 to +101
case "log":
return;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 "log" error recovery strategy silently swallows errors without logging

The handle method in createReplicationErrorRecovery does nothing when the strategy is "log" — it just returns without logging the error (apps/webapp/app/services/replicationErrorRecovery.server.ts:100-101). Both the "reconnect" strategy (line 58) and the "exit" strategy (line 88) call logger.error(...) before taking action. The "log" strategy should at minimum log the error, otherwise replication stream errors are silently swallowed when this strategy is selected, making it impossible to diagnose issues.

Suggested change
case "log":
return;
case "log":
logger.error("Replication stream error (log-only strategy)", { error });
return;
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +250 to +253
import { initMollifierDrainerWorker } from "~/v3/mollifierDrainerWorker.server";
initMollifierDrainerWorker().catch((error) => {
logger.error("Mollifier drainer initialization failed", { error });
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Missing mollifierDrainerWorker.server module will break the webapp build

The new import at apps/webapp/app/entry.server.tsx:250 references ~/v3/mollifierDrainerWorker.server, but no file matching this path exists anywhere in the repository (verified via find and grep). This will cause a build failure. Not reported as a bug per review guidelines since it would be caught by the type checker / bundler, but the reviewer should be aware that this PR cannot build the webapp as-is.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants