fix(webapp): cap AI SDK OTel attribute size so ClickHouse JSON parse doesn't drop spans #3620

Draft
0ski wants to merge 1 commit into main from oskar/fix-otel-attribute-truncation
Conversation

0ski commented May 14, 2026

Tighten OTel span attribute truncation for Vercel AI SDK content keys
(ai.prompt*, ai.response.text/object/toolCalls/reasoning*,
gen_ai.prompt, gen_ai.completion, gen_ai.request.messages,
gen_ai.response.text) to a 1KB per-attribute cap, plus a 32KB per-span
backstop that drops these content keys in priority order if the assembled
attributes JSON still exceeds it. Cost/token metadata (ai.usage.*,
ai.model.*, gen_ai.usage.*, gen_ai.response.model, etc.) keeps the
default 8KB cap so LLM enrichment continues to work.
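
As a rough illustration of the per-attribute caps described above, here is a minimal sketch. It assumes content keys are matched by prefix and that limits are counted in UTF-16 code units rather than bytes. The `AttributeLimits` type, its field names, and the helper names other than `truncateAttributes` are hypothetical, and the real module's signatures may differ.

```typescript
// Hypothetical shape; the PR's actual limits object may use different field names.
type AttributeLimits = {
  defaultValueLimit: number; // cap for ordinary string attributes
  aiContentValueLimit: number; // tighter cap for AI SDK content keys
};

// Keys taken from the PR description; matching them by prefix is an assumption.
const AI_CONTENT_PREFIXES = [
  "ai.prompt",
  "ai.response.text",
  "ai.response.object",
  "ai.response.toolCalls",
  "ai.response.reasoning",
  "gen_ai.prompt",
  "gen_ai.completion",
  "gen_ai.request.messages",
  "gen_ai.response.text",
];

function isAiContentKey(key: string): boolean {
  return AI_CONTENT_PREFIXES.some((prefix) => key.startsWith(prefix));
}

// Cut a string to at most `limit` UTF-16 code units without splitting a surrogate pair.
function truncateSurrogateSafe(value: string, limit: number): string {
  if (value.length <= limit) return value;
  let end = limit;
  const last = value.charCodeAt(end - 1);
  // A high surrogate at the cut point would be left dangling, so drop it as well.
  if (last >= 0xd800 && last <= 0xdbff) end -= 1;
  return value.slice(0, end);
}

function truncateAttributes(
  attrs: Record<string, unknown>,
  limits: AttributeLimits
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(attrs)) {
    const limit = isAiContentKey(key) ? limits.aiContentValueLimit : limits.defaultValueLimit;
    // Only string values are truncated; numbers, booleans, etc. pass through unchanged.
    out[key] = typeof value === "string" ? truncateSurrogateSafe(value, limit) : value;
  }
  return out;
}
```

With this shape, metadata keys such as `ai.usage.*` never match a content prefix, so they fall through to the default cap.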

Adds a parse-error-aware safety net in DynamicFlushScheduler: when
ClickHouse rejects a batch with "Cannot parse JSON object here", the
batch is split in half and retried (up to 4 split levels / 16-way
isolation) instead of failing all 5–10k rows at once. Singleton rows
that still fail are logged with a 1KB sample and dropped so the rest of
the queue keeps flowing.
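
The split-and-retry idea can be sketched roughly as follows. This is not the scheduler's actual code: `flushWithSplit`, `isJsonParseError`, and the `maxSplitDepth` parameter are hypothetical names, and recognizing the error by its message text is an assumption based on the error string quoted above.

```typescript
type FlushFn<T> = (rows: T[]) => Promise<void>;

// Assumption: the parse failure is recognized by the message text quoted in the PR description.
function isJsonParseError(error: unknown): boolean {
  return error instanceof Error && error.message.includes("Cannot parse JSON object here");
}

// Tries to flush `rows`; on a JSON parse error, halves the batch and retries each half.
// Resolves with the rows that had to be dropped because they could not be isolated further.
async function flushWithSplit<T>(
  rows: T[],
  flush: FlushFn<T>,
  maxSplitDepth: number,
  depth = 0
): Promise<T[]> {
  try {
    await flush(rows);
    return [];
  } catch (error) {
    // Other failures are rethrown so they keep whatever retry path they already have.
    if (!isJsonParseError(error)) throw error;
    // A single row, or a chunk at the depth limit, cannot be narrowed further: drop it.
    if (rows.length <= 1 || depth >= maxSplitDepth) return rows;
    const mid = Math.ceil(rows.length / 2);
    const left = await flushWithSplit(rows.slice(0, mid), flush, maxSplitDepth, depth + 1);
    const right = await flushWithSplit(rows.slice(mid), flush, maxSplitDepth, depth + 1);
    return [...left, ...right];
  }
}
```

A caller would log a sample of the returned rows and subtract them from its queue counters; the halves that flushed successfully are unaffected by the bad row.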

changeset-bot Bot commented May 14, 2026

⚠️ No Changeset found

Latest commit: adcff4d

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

coderabbitai Bot commented May 14, 2026

Walkthrough

This PR implements multi-level OpenTelemetry attribute controls and batch resilience: three env vars (default per-attribute 8KB, AI-content per-attribute 1KB, and total-per-span 32KB) are added and used to build a SpanAttributeLimits object. A new otlpAttributeLimits module provides surrogate-safe truncateAttributes and capAssembledAttributesSize (which drops AI-content keys by priority until the assembled JSON fits). OTLP exporter is refactored to accept and apply these limits across traces, logs, and metrics. DynamicFlushScheduler detects ClickHouse JSON parse errors and uses bounded recursive split-and-retry (preserving split depth) to isolate bad rows; irrecoverable rows are logged with a sampled 1KB snippet and counted in droppedRows. Tests validate truncation, overrides, and priority-based dropping.
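
A minimal sketch of the priority-based backstop described in the walkthrough, under stated assumptions: the assembled size is measured as the length of `JSON.stringify` output (characters, not bytes), the drop order is supplied by the caller, and keys are matched by prefix. The return shape and the `dropPriority` parameter are hypothetical; only the function name comes from the walkthrough.

```typescript
type CapResult = {
  attributes: Record<string, unknown>;
  droppedKeys: string[];
};

// Drops AI-content keys, in the given priority order, until the assembled JSON fits.
// Keys that match no entry in `dropPriority` (e.g. usage/cost metadata) are never removed.
function capAssembledAttributesSize(
  attrs: Record<string, unknown>,
  totalLimit: number,
  dropPriority: string[]
): CapResult {
  const attributes: Record<string, unknown> = { ...attrs };
  const droppedKeys: string[] = [];
  const assembledSize = (): number => JSON.stringify(attributes).length;

  if (assembledSize() <= totalLimit) return { attributes, droppedKeys };

  for (const prefix of dropPriority) {
    for (const key of Object.keys(attributes)) {
      if (!key.startsWith(prefix)) continue;
      delete attributes[key];
      droppedKeys.push(key);
      // Stop as soon as the span fits, so lower-priority content survives when possible.
      if (assembledSize() <= totalLimit) return { attributes, droppedKeys };
    }
  }
  // Still over the limit: everything droppable is gone, so return what remains.
  return { attributes, droppedKeys };
}
```

Re-serializing after every drop is simple but quadratic in the worst case; with only a handful of content keys per span that is unlikely to matter, though a real implementation might track sizes incrementally.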

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 7.69% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: capping AI SDK OTel attribute size to prevent ClickHouse JSON parse failures that drop spans. |
| Description check | ✅ Passed | The description addresses most required template sections: testing checklist is partially filled, detailed changelog describes the attribute truncation and batch-splitting changes. However, testing section lacks concrete steps and screenshots are missing. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/webapp/app/v3/dynamicFlushScheduler.server.ts`:
- Around line 21-23: At split exhaustion the code currently falls through and
retries unparseable payloads without decrementing totalQueuedItems; update the
batching logic around MAX_SPLIT_DEPTH/splitDepth in
dynamicFlushScheduler.server.ts so that when splitDepth === MAX_SPLIT_DEPTH and
subBatchSize > 1 you treat the leaf like the singleton-drop branch: decrement
totalQueuedItems by subBatchSize, increment failedBatches, log the dropped
sub-batch, and return (i.e., mirror the behavior in the subBatchSize === 1
branch) to avoid leaking queue count and wasted retries; also correct the
MAX_SPLIT_DEPTH comment to state it is the maximum split depth (yielding up to
2^MAX_SPLIT_DEPTH-way splits) rather than claiming it isolates single rows.
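
The accounting this comment asks for can be sketched as two small pure functions. This is only an illustration: the `SchedulerCounters` shape and both function names are hypothetical, and the real scheduler keeps these values as instance state rather than passing them around.

```typescript
// Hypothetical counter shape mirroring the names used in the review comment.
type SchedulerCounters = {
  totalQueuedItems: number;
  failedBatches: number;
  droppedRows: number;
};

type ParseErrorDecision = "split" | "drop";

// A sub-batch is split only while it has more than one row and depth budget remains.
// Both a singleton and a split-exhausted chunk end up on the same "drop" path.
function decideOnParseError(
  subBatchSize: number,
  splitDepth: number,
  maxSplitDepth: number
): ParseErrorDecision {
  return subBatchSize > 1 && splitDepth < maxSplitDepth ? "split" : "drop";
}

// Dropping a leaf must release its rows from the queue count so the total does not leak.
function recordDroppedLeaf(counters: SchedulerCounters, subBatchSize: number): SchedulerCounters {
  return {
    totalQueuedItems: counters.totalQueuedItems - subBatchSize,
    failedBatches: counters.failedBatches + 1,
    droppedRows: counters.droppedRows + subBatchSize,
  };
}

// Upper bound on the number of leaves a batch can be split into at a given depth limit.
function maxLeafCount(maxSplitDepth: number): number {
  return 2 ** maxSplitDepth;
}
```

Note that a depth limit only bounds the fan-out (`maxLeafCount`); it isolates single rows only when the original batch has at most that many rows.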
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 21d321ea-89c2-45ba-869b-3a1788c27904

📥 Commits

Reviewing files that changed from the base of the PR and between a97365d and e08d939.

📒 Files selected for processing (6)
  • .server-changes/otel-ai-sdk-attribute-truncation.md
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
  • apps/webapp/test/otlpAttributeLimits.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e-webapp / 🧪 E2E Tests: Webapp
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (13)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: Import from @trigger.dev/core subpaths only, never from the root. Subpath imports must be used to maintain proper module boundaries.
When writing Trigger.dev tasks, always import from @trigger.dev/sdk. Never use @trigger.dev/sdk/v3 or deprecated client.defineJob.
Prisma is version 6.14.0. Use the Prisma client from internal-packages/database for all database operations.
For ClickHouse client, schema migrations, and analytics queries, use internal-packages/clickhouse.

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/test/otlpAttributeLimits.test.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/test/otlpAttributeLimits.test.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Add crumbs as you write code, not just when debugging. Mark lines with `// @crumbs` or wrap blocks in `// #region @crumbs`. They stay on the branch throughout development and are stripped by agentcrumbs strip before merge.

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/test/otlpAttributeLimits.test.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/test/otlpAttributeLimits.test.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: Access environment variables through the env export of env.server.ts instead of directly accessing process.env
Use subpath exports from @trigger.dev/core package instead of importing from the root @trigger.dev/core path

Use named constants for sentinel/placeholder values (e.g. const UNSET_VALUE = '__unset__') instead of raw string literals scattered across comparisons

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/test/otlpAttributeLimits.test.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
apps/webapp/**/*.server.ts

📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)

apps/webapp/**/*.server.ts: Never use request.signal for detecting client disconnects. Use getRequestAbortSignal() from app/services/httpAsyncStorage.server.ts instead, which is wired directly to Express res.on('close') and fires reliably
Access environment variables via env export from app/env.server.ts. Never use process.env directly
Always use findFirst instead of findUnique in Prisma queries. findUnique has an implicit DataLoader that batches concurrent calls and has active bugs even in Prisma 6.x (uppercase UUIDs returning null, composite key SQL correctness issues, 5-10x worse performance). findFirst is never batched and avoids this entire class of issues

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
**/*.{ts,tsx,js,jsx,json,md,css,scss}

📄 CodeRabbit inference engine (AGENTS.md)

Code formatting is enforced using Prettier. Run pnpm run format before committing

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/test/otlpAttributeLimits.test.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
apps/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

When modifying only server components (apps/webapp/, apps/supervisor/, etc.) with no package changes, add a .server-changes/ file instead of a changeset. See .server-changes/README.md for format and documentation.

Files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/test/otlpAttributeLimits.test.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use vitest for all tests in the Trigger.dev repository

Files:

  • apps/webapp/test/otlpAttributeLimits.test.ts
apps/webapp/**/*.test.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Do not import env.server.ts directly or indirectly into test files; instead pass environment-dependent values through options/parameters to make code testable

For testable code, never import env.server.ts in test files. Pass configuration as options instead (e.g., realtimeClient.server.ts takes config as constructor arg, realtimeClientGlobal.server.ts creates singleton with env config)

Files:

  • apps/webapp/test/otlpAttributeLimits.test.ts
**/*.test.{ts,tsx,js}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.test.{ts,tsx,js}: Use vitest for unit testing and run tests with pnpm run test
Test files should live beside the files under test with descriptive describe and it blocks
Tests should avoid mocks or stubs and use helpers from @internal/testcontainers when Redis or Postgres are needed

Files:

  • apps/webapp/test/otlpAttributeLimits.test.ts
**/*.test.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.test.{ts,tsx,js,jsx}: Use vitest exclusively for testing and never mock anything - use testcontainers instead.
Place test files next to source files (e.g., MyService.ts -> MyService.test.ts).

Files:

  • apps/webapp/test/otlpAttributeLimits.test.ts
apps/webapp/app/v3/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

In the apps/webapp/app/v3/ directory, only modify V2 code paths. All new work uses Run Engine 2.0 (@internal/run-engine) and redis-worker. The directory name is misleading - most code is actively used by V2, not legacy V1. Refer to apps/webapp/CLAUDE.md for the exact list of V1-only legacy files.

Files:

  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
🧠 Learnings (7)
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/test/otlpAttributeLimits.test.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/test/otlpAttributeLimits.test.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
📚 Learning: 2026-05-05T09:38:02.512Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3523
File: apps/webapp/app/routes/api.v3.batches.ts:178-181
Timestamp: 2026-05-05T09:38:02.512Z
Learning: When reviewing code that catches `ServiceValidationError` in `*.server.ts` files, do not blindly forward `error.status` to HTTP responses, because SVEs may be thrown with non-default statuses (e.g., 400/500) and forwarding them can cause client-visible behavioral regressions (e.g., surfacing 500s to clients). Prefer a safe default response status of `error.status ?? 422`, but only after confirming via the reachable call graph that the caught `ServiceValidationError` instances are expected to carry those non-default statuses; otherwise, normalize to `422` to avoid unexpected client-visible 5xx behavior.

Applied to files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
📚 Learning: 2026-05-12T21:04:05.815Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3542
File: apps/webapp/app/components/sessions/v1/SessionStatus.tsx:1-3
Timestamp: 2026-05-12T21:04:05.815Z
Learning: In this Remix + TypeScript codebase, do not flag a server/client boundary violation when a file imports only types from a module matching `*.server`.

Specifically, it’s safe to import types using `import type { Foo } from "*.server"` or `import { type Foo } from "*.server"` because TypeScript erases type-only imports at compile time and they emit no JavaScript, so they won’t cross the Remix server/client bundle boundary.

Only raise the boundary concern for value imports (e.g., `import { Foo }` without `type`, or `import Foo`), since those produce JavaScript output.

Applied to files:

  • apps/webapp/app/env.server.ts
  • apps/webapp/test/otlpAttributeLimits.test.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
📚 Learning: 2026-05-07T12:25:18.271Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3531
File: apps/webapp/test/sentryTraceContext.server.test.ts:9-47
Timestamp: 2026-05-07T12:25:18.271Z
Learning: In the triggerdotdev/trigger.dev webapp test suite, it is acceptable to leave `createInMemoryTracing()` calls that register a global `NodeTracerProvider` without `afterEach`/`afterAll` teardown. Do not flag this as a test-ordering risk when the code follows the established pattern used across webapp tests (e.g., replication service/benchmark/backfiller tests). This is considered safe because `trace.getActiveSpan()` when called outside a `context.with(...)` block reads `AsyncLocalStorage.getStore()` (undefined when no `run()` scope exists), so it falls back to `ROOT_CONTEXT` with no attached span—regardless of which provider is registered.

Applied to files:

  • apps/webapp/test/otlpAttributeLimits.test.ts
📚 Learning: 2026-03-29T19:16:28.864Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3291
File: apps/webapp/app/v3/featureFlags.ts:53-65
Timestamp: 2026-03-29T19:16:28.864Z
Learning: When reviewing TypeScript code that uses Zod v3, treat `z.coerce.*()` schemas as their direct Zod type (e.g., `z.coerce.boolean()` returns a `ZodBoolean` with `_def.typeName === "ZodBoolean"`) rather than a `ZodEffects`. Only `.preprocess()`, `.refine()`/`.superRefine()`, and `.transform()` are expected to wrap schemas in `ZodEffects`. Therefore, in reviewers’ logic like `getFlagControlType`, do not flag/unblock failures that require unwrapping `ZodEffects` when the input schema is a `z.coerce.*` schema.

Applied to files:

  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
📚 Learning: 2026-05-14T08:21:07.614Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3614
File: apps/webapp/app/v3/mollifier/mollifierGate.server.ts:48-52
Timestamp: 2026-05-14T08:21:07.614Z
Learning: When using Trigger.dev v3 feature flags in the webapp, prefer the existing per-org gating mechanism supported by `flag()` via the `overrides` argument. Pass `Organization.featureFlags` (from `environment.organization.featureFlags`) as the `overrides` value; overrides must take precedence over the global `featureFlag` row. Do not require schema changes or add an `orgId` field to `FlagsOptions` for per-org gating—use the overrides pattern consistently (e.g., in gate flows like `resolveOrgFlag` and any server code that threads `environment.organization.featureFlags` into the gate call).

Applied to files:

  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
🔇 Additional comments (9)
.server-changes/otel-ai-sdk-attribute-truncation.md (1)

1-20: LGTM!

apps/webapp/app/env.server.ts (1)

501-510: LGTM!

apps/webapp/app/v3/otlpAttributeLimits.ts (1)

1-207: LGTM!

apps/webapp/app/v3/otlpExporter.server.ts (5)

44-49: LGTM!

Also applies to: 60-60, 71-71, 91-91, 112-112


259-259: LGTM!

Also applies to: 265-281, 302-316


364-364: LGTM!

Also applies to: 370-386, 409-423


486-486: LGTM!


1131-1143: LGTM!

apps/webapp/test/otlpAttributeLimits.test.ts (1)

1-194: LGTM!

Comment thread apps/webapp/app/v3/dynamicFlushScheduler.server.ts Outdated
@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional findings.

…doesn't drop spans

Vercel AI SDK spans carry tens of KB of prompt/response content per attribute,
producing an assembled attributes JSON that ClickHouse rejects with
"Cannot parse JSON object" and silently drops the whole batch.

- Add otlpAttributeLimits module with per-key overrides for ai.* / gen_ai.*
  content keys (tighter 1KB cap) plus a 32KB total-attributes backstop that
  drops AI content keys in priority order when exceeded.
- Wire SERVER_OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT /
  SERVER_OTEL_AI_CONTENT_ATTRIBUTE_VALUE_LENGTH_LIMIT /
  SERVER_OTEL_SPAN_TOTAL_ATTRIBUTES_LENGTH_LIMIT env vars through the OTLP
  exporter for spans and logs.
- DynamicFlushScheduler now recognises ClickHouse JSON parse errors and
  recursively splits the failing batch (up to depth 8, 256-way isolation)
  to narrow the bad row instead of poisoning the whole 5-10k-row batch.
  Leaves that can't be salvaged — single rows ClickHouse still rejects, or
  split-exhausted chunks — are counted in a new droppedRows metric and
  removed from the queue so totalQueuedItems doesn't leak.
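
To show how the three env vars might feed a limits object, here is a dependency-free sketch. The env var names come from the commit message above and `SpanAttributeLimits` is named in the walkthrough, but the field names, the exact numeric defaults (8192 / 1024 / 32768 for "8KB / 1KB / 32KB"), and the fallback behavior are assumptions. The webapp itself validates env vars in `env.server.ts`, so treat this plain parser as a stand-in for illustration only.

```typescript
// Field names are hypothetical; only the type name appears in the PR walkthrough.
type SpanAttributeLimits = {
  defaultValueLimit: number;
  aiContentValueLimit: number;
  totalAttributesLimit: number;
};

// Parses a positive integer, falling back when the value is missing or invalid.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const parsed = Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// `env` is passed in explicitly so the function can be tested without process.env.
function buildSpanAttributeLimits(env: Record<string, string | undefined>): SpanAttributeLimits {
  return {
    // Assumed default: 8KB per ordinary attribute value.
    defaultValueLimit: parsePositiveInt(
      env["SERVER_OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT"],
      8192
    ),
    // Assumed default: 1KB per AI content attribute value.
    aiContentValueLimit: parsePositiveInt(
      env["SERVER_OTEL_AI_CONTENT_ATTRIBUTE_VALUE_LENGTH_LIMIT"],
      1024
    ),
    // Assumed default: 32KB for the assembled attributes of one span.
    totalAttributesLimit: parsePositiveInt(
      env["SERVER_OTEL_SPAN_TOTAL_ATTRIBUTES_LENGTH_LIMIT"],
      32768
    ),
  };
}
```

Passing the env object as a parameter also keeps the limits testable without importing `env.server.ts`, which the repository's test guidelines ask for.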
0ski force-pushed the oskar/fix-otel-attribute-truncation branch from e08d939 to adcff4d on May 14, 2026 15:34
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.server-changes/otel-ai-sdk-attribute-truncation.md:
- Around line 19-20: Update the changelog sentence that currently reads "Leaves
that still fail — whether a single row or a split-exhausted chunk — are
logged..." by replacing "Leaves" with "Rows" so it reads "Rows that still fail —
whether a single row or a split-exhausted chunk — are logged..."; locate the
exact phrase "Leaves that still fail" in the
.server-changes/otel-ai-sdk-attribute-truncation.md diff and perform the
single-word correction to fix the wording typo.
- Around line 17-18: Update the changelog text that currently reads "up to 8
split levels / 256-way isolation" to the correct behavior "up to 4 split levels
/ 16-way isolation" so the wording matches the scheduler limits; locate and
replace that exact phrase in the .md content (search for the string "up to 8
split levels / 256-way isolation") and ensure the surrounding sentence remains
grammatically correct.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ecf3c36d-a7a1-47e0-94d0-f723a7bd5162

📥 Commits

Reviewing files that changed from the base of the PR and between e08d939 and adcff4d.

📒 Files selected for processing (6)
  • .server-changes/otel-ai-sdk-attribute-truncation.md
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
  • apps/webapp/test/otlpAttributeLimits.test.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • apps/webapp/test/otlpAttributeLimits.test.ts
  • apps/webapp/app/v3/dynamicFlushScheduler.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/v3/otlpExporter.server.ts
  • apps/webapp/app/v3/otlpAttributeLimits.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: typecheck / typecheck
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-05-14T14:54:39.095Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3545
File: .server-changes/agent-view-sessions.md:10-10
Timestamp: 2026-05-14T14:54:39.095Z
Learning: In the `trigger.dev` repository, do not flag inconsistent dot vs slash notation in route/path strings inside `.server-changes/*.md` files. These markdown files are consumed verbatim into the changelog, so the mixed notation (e.g., `resources.orgs.../runs.$runParam/...`) is intentional and should be preserved as-is.

Applied to files:

  • .server-changes/otel-ai-sdk-attribute-truncation.md

Comment on lines +17 to +18
batch is split in half and retried (up to 8 split levels / 256-way
isolation) instead of failing all 5–10k rows at once. Leaves that
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct split-depth numbers to match implemented behavior.

Line 17 says up to 8 split levels / 256-way isolation, but this PR’s described behavior is up to 4 split levels / 16-way isolation. Please align this changelog text with the actual scheduler limits to avoid operational confusion.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.server-changes/otel-ai-sdk-attribute-truncation.md around lines 17 - 18,
Update the changelog text that currently reads "up to 8 split levels / 256-way
isolation" to the correct behavior "up to 4 split levels / 16-way isolation" so
the wording matches the scheduler limits; locate and replace that exact phrase
in the .md content (search for the string "up to 8 split levels / 256-way
isolation") and ensure the surrounding sentence remains grammatically correct.

Comment on lines +19 to +20
still fail — whether a single row or a split-exhausted chunk — are
logged with a 1KB sample, counted in `droppedRows`, and removed from
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix wording typo in failure-row sentence.

Line 19 should read Rows that still fail ... (not Leaves that still fail ...) for clear changelog wording.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.server-changes/otel-ai-sdk-attribute-truncation.md around lines 19 - 20,
Update the changelog sentence that currently reads "Leaves that still fail —
whether a single row or a split-exhausted chunk — are logged..." by replacing
"Leaves" with "Rows" so it reads "Rows that still fail — whether a single row or
a split-exhausted chunk — are logged..."; locate the exact phrase "Leaves that
still fail" in the .server-changes/otel-ai-sdk-attribute-truncation.md diff and
perform the single-word correction to fix the wording typo.
