[BOT ISSUE] OpenAI Responses API streaming spans lose output and metrics #62

@braintrust-bot

Summary

The OpenAI instrumentation's SSE stream reassembly path hardcodes ChatCompletionAccumulator / ChatCompletionChunk, which is specific to the Chat Completions API. When a user calls client.responses().createStreaming(...), the Responses API emits a different set of SSE event types (response.created, response.output_text.delta, response.completed, etc.) that cannot be deserialized as ChatCompletionChunk. As a result, streaming Responses API spans end up with no output data and no usage metrics.

Non-streaming Responses API calls (client.responses().create(...)) work correctly — the InstrumentationSemConv class already handles input/output/input_tokens/output_tokens fields.

What is missing

TracingHttpClient.tagSpanFromSseBytes() (lines 205–231) needs a parallel code path that:

  1. Detects Responses API SSE events (which may start with event: lines like event: response.created before the data: line, unlike Chat Completions which only has data: lines).
  2. Uses the OpenAI Java SDK's ResponseAccumulator (analogous to ChatCompletionAccumulator) to reassemble ResponseStreamEvent chunks into a complete Response object.
  3. Passes the assembled response JSON through the existing InstrumentationSemConv.tagOpenAIResponse() which already knows how to extract output, input_tokens, output_tokens, and reasoning_tokens from Responses API payloads.
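To make step 1 concrete, here is a minimal sketch of splitting a captured SSE byte stream into (event type, data) pairs and checking whether it looks like a Responses API stream. It uses only the JDK. The names SseEventParser, SseEvent, and isResponsesApiStream are hypothetical and do not exist in the Braintrust SDK or the OpenAI Java SDK. Feeding each data payload to the OpenAI SDK's accumulator (step 2) is intentionally left out, since that API is not shown in this issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of any SDK): splits SSE bytes into events. */
final class SseEventParser {

    /** One SSE event: the optional "event:" name and the joined "data:" payload. */
    static final class SseEvent {
        final String eventType; // null when the event had no "event:" line
        final String data;

        SseEvent(String eventType, String data) {
            this.eventType = eventType;
            this.data = data;
        }
    }

    private SseEventParser() {}

    static List<SseEvent> parse(byte[] sseBytes) {
        String text = new String(sseBytes, StandardCharsets.UTF_8);
        List<SseEvent> events = new ArrayList<>();
        String eventType = null;
        StringBuilder data = new StringBuilder();
        boolean hasData = false;

        // Limit -1 keeps trailing empty strings, so the final blank line is seen.
        for (String line : text.split("\r\n|\n|\r", -1)) {
            if (line.isEmpty()) {
                // A blank line terminates the current event.
                if (hasData) {
                    events.add(new SseEvent(eventType, data.toString()));
                }
                eventType = null;
                data.setLength(0);
                hasData = false;
            } else if (line.startsWith("event:")) {
                eventType = line.substring("event:".length()).trim();
            } else if (line.startsWith("data:")) {
                String value = line.substring("data:".length());
                if (value.startsWith(" ")) {
                    value = value.substring(1); // drop the single optional space
                }
                if (hasData) {
                    data.append('\n'); // multiple data lines are joined by newlines
                }
                data.append(value);
                hasData = true;
            }
            // Comment lines (":...") and other fields (id, retry) are ignored.
        }
        // Be lenient with captured bytes that lack a trailing blank line.
        if (hasData) {
            events.add(new SseEvent(eventType, data.toString()));
        }
        return events;
    }

    /** Assumption: Responses API events are named "response.*", as listed in the Summary. */
    static boolean isResponsesApiStream(List<SseEvent> events) {
        for (SseEvent e : events) {
            if (e.eventType != null && e.eventType.startsWith("response.")) {
                return true;
            }
        }
        return false;
    }
}
```

With something like this in place, tagSpanFromSseBytes() could branch on isResponsesApiStream(...) and keep the existing ChatCompletionAccumulator path for streams that only carry data: lines.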

Failure mode

Today when Responses API streaming is used:

  • If the first non-empty SSE line is event: response.created (not data:), the check at line 176 does not recognize the body as SSE and falls through to the plain-JSON branch, which tries to parse the entire SSE byte stream as JSON → parse error → span has no output/metrics.
  • Even if a data: line happened to come first, lines 218–219 would attempt BraintrustJsonMapper.get().readValue(data, ChatCompletionChunk.class) on a Responses API event object → deserialization error → span has no output/metrics.

In both cases the error is caught and logged, but the span is silently incomplete.
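The first failure mode comes down to how the body is classified as SSE. The check at line 176 is not reproduced in this issue, so the following is only a sketch of a more tolerant classifier, assuming the current one looks for a leading data: line. SseDetector and looksLikeSse are hypothetical names, not existing SDK code.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decides whether a response body is an SSE stream. */
final class SseDetector {

    private SseDetector() {}

    /**
     * Returns true when the first non-blank line starts with "data:" or "event:".
     * Chat Completions streams start with "data:"; Responses API streams may
     * start with "event:" (for example "event: response.created").
     */
    static boolean looksLikeSse(byte[] body) {
        String text = new String(body, StandardCharsets.UTF_8);
        for (String line : text.split("\r\n|\n|\r")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip leading blank lines
            }
            // Only the first non-blank line decides the outcome.
            return trimmed.startsWith("data:") || trimmed.startsWith("event:");
        }
        return false; // empty body
    }
}
```

A plain JSON body begins with { or [, so it still takes the non-streaming branch under this check.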

Braintrust docs status

  • The Java SDK README and the Braintrust docs do not document Responses API streaming support (status: not_found).
  • Non-streaming Responses API support is handled in code but is not documented either.

Upstream sources

Local files inspected

  • braintrust-sdk/instrumentation/openai_2_8_0/src/main/java/dev/braintrust/instrumentation/openai/v2_8_0/TracingHttpClient.java — lines 205–231 (tagSpanFromSseBytes hardcodes ChatCompletionAccumulator)
  • braintrust-sdk/src/main/java/dev/braintrust/instrumentation/InstrumentationSemConv.java — lines 99–104, 116–150 (correctly handles Responses API fields for non-streaming)
  • braintrust-sdk/instrumentation/openai_2_8_0/src/test/java/.../BraintrustOpenAITest.java — has testWrapOpenAiResponses (non-streaming only, no streaming Responses API test)
