Add simple nextjs request logger plus fix OTEL resource attributes #3247
ChristopherChudzicki merged 10 commits into main
Pull request overview
Adds server-side OpenTelemetry helpers to (1) override OTEL resource attributes based on runtime env vars and (2) optionally emit one structured JSON log line per completed Next.js request span, improving trace metadata consistency and enabling lightweight request timing logs.
Changes:
- Added `otel-utils` helpers to parse OTEL env overrides, apply them to span resources, and format request span logs.
- Wired new span processors into `instrumentation-node.ts` for resource overrides and optional request logging.
- Documented OTEL exporter endpoint options and request logging env var in `env/frontend.env`, and added unit tests for the new utilities.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| frontends/main/src/otel-utils.ts | New helpers for parsing OTEL env overrides, applying resource attributes, and building request log entries from server spans. |
| frontends/main/src/otel-utils.test.ts | Unit tests covering override parsing/applying and request log entry creation. |
| frontends/main/src/instrumentation-node.ts | Registers new span processors (resource override + optional request logging) alongside existing OTLP/console exporters. |
| env/frontend.env | Updates local/prod OTEL endpoint documentation and adds NEXT_SERVER_REQUEST_LOGGING documentation. |

     * ignores OTEL_RESOURCE_ATTRIBUTES on its internal resource, so without this
     * the spans we ship to Alloy/Tempo would be misattributed.
     *
     * See https://github.com/getsentry/sentry-javascript/issues/20502
This seems likely to be fixed soon, see
though I'd still like to have the data before their fix is merged/released.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the custom OTEL_RESOURCE_ATTRIBUTES parser with @opentelemetry/resources' envDetector, which is spec-compliant (handles percent-decoding, length checks, OTEL_SERVICE_NAME merging). Injects service.version into OTEL_RESOURCE_ATTRIBUTES at startup so NEXT_PUBLIC_VERSION flows through the SDK rather than being applied by custom override code. The override SpanProcessor remains because Sentry hardcodes service.name to "node" — see getsentry/sentry-javascript#20502. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
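The startup injection described in this commit could look roughly like the sketch below. The helper names `withServiceVersion` and `injectServiceVersion` are hypothetical (the PR's actual code is not shown here), and the choice to leave an explicitly configured `service.version` untouched is an assumption.

```typescript
// Hypothetical helper: append service.version to an OTEL_RESOURCE_ATTRIBUTES
// string ("key1=value1,key2=value2") unless it is already present.
export function withServiceVersion(
  existing: string | undefined,
  version: string | undefined,
): string {
  const pairs = (existing ?? "")
    .split(",")
    .map((pair) => pair.trim())
    .filter((pair) => pair.length > 0)

  // Assumption: an operator-provided service.version wins over the build version.
  const alreadySet = pairs.some((pair) => pair.startsWith("service.version="))
  if (version && !alreadySet) {
    // Percent-encode the value so "," and "=" cannot break the list format.
    pairs.push(`service.version=${encodeURIComponent(version)}`)
  }
  return pairs.join(",")
}

// Hypothetical wrapper: mutate an env-like object in place so that whatever
// reads OTEL_RESOURCE_ATTRIBUTES afterwards sees the injected version.
export function injectServiceVersion(
  env: Record<string, string | undefined>,
  version: string | undefined,
): void {
  const merged = withServiceVersion(env.OTEL_RESOURCE_ATTRIBUTES, version)
  if (merged.length > 0) {
    env.OTEL_RESOURCE_ATTRIBUTES = merged
  }
}
```

Under these assumptions, calling `injectServiceVersion(process.env, process.env.NEXT_PUBLIC_VERSION)` before the SDK initializes would let the environment detector pick up the version without any custom override code.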
Logs and traces often live in different stores; inline version is more durable than relying on log shipper labels (which can drop during rolling deploys or label collisions). Read NEXT_PUBLIC_VERSION once at module load. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
So tests that depend on the value behave consistently across machines and CI rather than inheriting whatever happens to be in the shell. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move per-request structured logging off the OTEL span pipeline so that log emission is independent of OTEL_TRACES_SAMPLER_ARG. This lets the logs serve as ground truth for OTEL trace coverage — a null traceId in the log now means "OTEL never created a span for this request" rather than "the sampler dropped it." Subscribes to Node's built-in http.server.request.start / http.server.response.finish diagnostics channels (Experimental in Node 24/25, but the same surface Sentry/OTEL/Datadog use internally). traceId/spanId are read best-effort from the active OTEL context. Enabled by default; set NEXT_SERVER_REQUEST_LOGGING=false to disable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
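A minimal sketch of the diagnostics-channel approach described in this commit follows. It assumes the channel messages expose `request` and `response` objects, as documented for Node's built-in HTTP channels. The names `createRequestLogger`, `installRequestLogger`, and `isRequestLoggingEnabled` are hypothetical, and the trace lookup is injected as a callback rather than read from the OTEL API so the sketch stays standalone.

```typescript
import { subscribe } from "node:diagnostics_channel"

// Structural views of http.IncomingMessage / http.ServerResponse, limited to
// the fields this sketch reads.
type RequestLike = { method?: string; url?: string }
type ResponseLike = { statusCode: number }
type HttpServerMessage = { request: RequestLike; response: ResponseLike }
type TraceIds = { traceId: string | null; spanId: string | null }

type LoggerOptions = {
  getTraceIds?: () => TraceIds // hypothetical hook for the active OTEL context
  write?: (line: string) => void
  now?: () => number
}

// Assumption: only the literal string "false" disables logging.
export function isRequestLoggingEnabled(value: string | undefined): boolean {
  return value !== "false"
}

export function createRequestLogger(options: LoggerOptions = {}) {
  const getTraceIds =
    options.getTraceIds ?? ((): TraceIds => ({ traceId: null, spanId: null }))
  const write = options.write ?? ((line: string) => { console.log(line) })
  const now = options.now ?? (() => Date.now())
  // Start times keyed by request object; entries are collected with the request.
  const startTimes = new WeakMap<object, number>()

  return {
    onStart(message: unknown): void {
      const { request } = message as HttpServerMessage
      startTimes.set(request, now())
    },
    onFinish(message: unknown): void {
      const { request, response } = message as HttpServerMessage
      const startedAt = startTimes.get(request)
      if (startedAt === undefined) return // never saw the start event
      startTimes.delete(request)
      write(JSON.stringify({
        message: "next_request",
        method: request.method,
        url: request.url,
        statusCode: response.statusCode,
        durationMs: Math.round(now() - startedAt),
        ...getTraceIds(), // null IDs mean no span was active for this request
      }))
    },
  }
}

// Wire the handlers to Node's built-in HTTP server channels.
export function installRequestLogger(logger = createRequestLogger()): void {
  subscribe("http.server.request.start", logger.onStart)
  subscribe("http.server.response.finish", logger.onFinish)
}
```

Because the handlers run for every request that reaches the Node HTTP server, a log line is produced whether or not the sampler keeps the trace, which is the property the commit relies on.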
Defensive flag on globalThis prevents stacked subscriptions if the instrumentation hook is ever re-evaluated (dev reloads, worker restarts). In normal operation the module body runs once per process, but the guard is cheap insurance against doubled log lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
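The guard described in this commit might be sketched as follows. The flag name and the `installOnce` helper are hypothetical, since the actual key is not shown in this PR conversation. The flag lives on `globalThis` rather than in a module-level variable because module state is reset when the module body is re-evaluated, while `globalThis` persists for the life of the process.

```typescript
// Hypothetical flag name; the real key used by the PR is not shown.
const INSTALLED_FLAG = "__exampleRequestLoggerInstalled"

// Runs `install` at most once per process, even if this module is re-evaluated.
// Returns true when `install` ran, false when it was skipped.
export function installOnce(install: () => void): boolean {
  const registry = globalThis as unknown as Record<string, unknown>
  if (registry[INSTALLED_FLAG] === true) {
    return false // a previous evaluation already subscribed
  }
  registry[INSTALLED_FLAG] = true
  install()
  return true
}
```

A caller would pass the subscription step as the callback, so a second evaluation of the instrumentation hook becomes a no-op instead of doubling every log line.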
Skip /_next/*, /__nextjs_*, and /favicon.ico in the structured request log. These never get OTEL traces (Sentry's HttpInstrumentation already filters them) so they're noise for the OTEL-coverage diagnostic, and in prod they mostly hit the CDN. RSC fetches go to real route paths (e.g. /courses?_rsc=...) and remain logged. Also split the URL into separate route and query fields. The route groups cleanly while the query stays available for filtering RSC vs non-RSC requests via the _rsc parameter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
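The filtering and URL splitting described in this commit could be approximated as below. The function names are hypothetical, and details such as representing a missing query as an empty string and omitting the leading `?` are assumptions rather than the PR's confirmed behavior.

```typescript
// Paths excluded from the structured request log, per the commit message.
const SKIPPED_PREFIXES = ["/_next/", "/__nextjs_"]
const SKIPPED_EXACT = new Set(["/favicon.ico"])

// Split a raw request URL into a groupable route and a filterable query.
export function splitUrl(rawUrl: string): { route: string; query: string } {
  const queryStart = rawUrl.indexOf("?")
  if (queryStart === -1) {
    return { route: rawUrl, query: "" }
  }
  return {
    route: rawUrl.slice(0, queryStart),
    query: rawUrl.slice(queryStart + 1),
  }
}

// Decide whether a route should appear in the request log.
export function shouldLog(route: string): boolean {
  if (SKIPPED_EXACT.has(route)) return false
  return !SKIPPED_PREFIXES.some((prefix) => route.startsWith(prefix))
}

// RSC fetches hit real routes and carry an _rsc query parameter.
export function isRscRequest(query: string): boolean {
  return query
    .split("&")
    .some((param) => param === "_rsc" || param.startsWith("_rsc="))
}
```

With this split, `/courses?_rsc=...` and a plain `/courses` page load group under the same route, while the query field still distinguishes them.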

     * checks, and merges OTEL_SERVICE_NAME into service.name.
     */
    export function detectResourceOverrides(): DetectedResourceAttributes {
      return envDetector.detect().attributes ?? {}
Bug: The async function detectResourceOverrides may be called without await, which could lead to a race condition.
Severity: MEDIUM
Suggested Fix
Review all call sites of the detectResourceOverrides function and ensure that the returned Promise is properly handled, typically by adding the await keyword before the function call.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: frontends/main/src/otel-utils.ts#L41
Potential issue: The function `detectResourceOverrides` is declared as `async`, meaning
it returns a Promise. If a caller does not use the `await` keyword when invoking this
function, the program will not wait for the promise to resolve. This can lead to a race
condition where code that depends on the completion of `detectResourceOverrides`
executes prematurely. Given the function's name, this could result in improperly
configured OpenTelemetry resources, affecting observability and monitoring.
The function is not async. It returns a plain object, `Record<string, DetectedResourceAttributeValue>`.
What are the relevant tickets?
For https://github.com/mitodl/hq/issues/10971
Description (What does it do?)
This PR:
{ "message": "next_request", "method": "GET", "route": "/search", "statusCode": 200, "durationMs": 382, "traceId": "0e3948306c4b73bd45b1cc4f9aef5366", "spanId": "aae162babca4e255", "name": "GET /search", "version": "test-version" }
`OTEL_SERVICE_NAME`, `OTEL_RESOURCE_ATTRIBUTES`, and `service.version` from `NEXT_PUBLIC_VERSION`. `service.name` is hardcoded to `'node'` by Sentry, which makes NextJS OTEL data in Grafana hard to find.
How can this be tested?
`NEXT_SERVER_REQUEST_LOGGING`):
{"message":"next_request","method":"GET","route":"/","statusCode":200,"durationMs":4094,"traceId":"9f7fe6b4d389911acf30630799898d0a","spanId":"04d8e441c1d98b9c","name":"GET /","version":"test-version"}
Additional Context
`OTEL_SERVICE_NAME` or `OTEL_RESOURCE_ATTRIBUTES` by default.