[BOT ISSUE] OpenAI Responses API streaming spans lose output and metrics #62
Description
Summary
The OpenAI instrumentation's SSE stream reassembly path hardcodes ChatCompletionAccumulator / ChatCompletionChunk, which is specific to the Chat Completions API. When a user calls client.responses().createStreaming(...), the Responses API emits a different set of SSE event types (response.created, response.output_text.delta, response.completed, etc.) that cannot be deserialized as ChatCompletionChunk. As a result, streaming Responses API spans end up with no output data and no usage metrics.
Non-streaming Responses API calls (client.responses().create(...)) work correctly — the InstrumentationSemConv class already handles input/output/input_tokens/output_tokens fields.
What is missing
TracingHttpClient.tagSpanFromSseBytes() (lines 205–231) needs a parallel code path that:
- Detects Responses API SSE events (which may start with event: lines like event: response.created before the data: line, unlike Chat Completions, which only has data: lines).
- Uses the OpenAI Java SDK's ResponseAccumulator (analogous to ChatCompletionAccumulator) to reassemble ResponseStreamEvent chunks into a complete Response object.
- Passes the assembled response JSON through the existing InstrumentationSemConv.tagOpenAIResponse(), which already knows how to extract output, input_tokens, output_tokens, and reasoning_tokens from Responses API payloads.
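The detection step in the list above could look roughly like the following. This is a minimal sketch using only the JDK, not the SDK's actual code: SseEventSketch, SseEvent, parse, and isResponsesApiStream are hypothetical names introduced for illustration. It splits a buffered SSE body into events, keeps the optional event: name next to the data: payload, and treats a stream as Responses API when any event name starts with "response.". The data payloads it returns are what would then be deserialized and fed to the accumulator.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the Braintrust SDK): splits a buffered SSE body into events. */
final class SseEventSketch {

    /** One SSE event: the optional event name (null when absent) and its data payload. */
    static final class SseEvent {
        final String name;
        final String data;

        SseEvent(String name, String data) {
            this.name = name;
            this.data = data;
        }
    }

    static List<SseEvent> parse(byte[] body) {
        List<SseEvent> events = new ArrayList<>();
        String name = null;
        StringBuilder data = new StringBuilder();
        boolean hasData = false;
        // A blank line terminates an event; limit -1 keeps trailing empty strings.
        for (String line : new String(body, StandardCharsets.UTF_8).split("\r?\n", -1)) {
            if (line.isEmpty()) {
                if (hasData) {
                    events.add(new SseEvent(name, data.toString()));
                }
                name = null;
                data.setLength(0);
                hasData = false;
            } else if (line.startsWith("event:")) {
                name = line.substring("event:".length()).trim();
            } else if (line.startsWith("data:")) {
                if (hasData) {
                    data.append('\n'); // multiple data: lines are joined with newlines
                }
                // trim() is a simplification; the SSE format strips one leading space.
                data.append(line.substring("data:".length()).trim());
                hasData = true;
            }
            // Comment lines (":...") and other fields are ignored in this sketch.
        }
        if (hasData) {
            events.add(new SseEvent(name, data.toString())); // body ended without a blank line
        }
        return events;
    }

    /** Assumption: Responses API streams are recognized by "response."-prefixed event names. */
    static boolean isResponsesApiStream(List<SseEvent> events) {
        for (SseEvent e : events) {
            if (e.name != null && e.name.startsWith("response.")) {
                return true;
            }
        }
        return false;
    }
}
```

A Chat Completions stream, which carries only data: lines, produces events with a null name and is therefore classified as non-Responses, so the existing ChatCompletionAccumulator path could stay as the default branch.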
Failure mode
Today when Responses API streaming is used:
- If the first non-empty SSE line is event: response.created (not data:), the code at line 176 falls through to the plain-JSON branch, which tries to parse the entire SSE byte stream as JSON → parse error → span has no output/metrics.
- Even if a data: line happened to come first, lines 218–219 would attempt BraintrustJsonMapper.get().readValue(data, ChatCompletionChunk.class) on a Responses API event object → deserialization error → span has no output/metrics.
In both cases the error is caught and logged, but the span is silently incomplete.
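To make the first failure mode concrete, here is a small sketch of the first-line check. It does not reproduce the code at line 176, which is not shown here; SseSniffSketch and its methods are hypothetical, and looksLikeSseDataOnly is only assumed to mirror the behavior described above. The broadened variant also accepts a leading event: line, so a Responses API stream would no longer be routed to the plain-JSON branch.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of the Braintrust SDK): decides whether a body is SSE or plain JSON. */
final class SseSniffSketch {

    /** Returns the first non-blank line of the body, trimmed, or "" if there is none. */
    static String firstNonBlankLine(byte[] body) {
        for (String line : new String(body, StandardCharsets.UTF_8).split("\r?\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                return trimmed;
            }
        }
        return "";
    }

    /** Assumed to mirror the behavior described in the issue: only data: counts as SSE. */
    static boolean looksLikeSseDataOnly(byte[] body) {
        return firstNonBlankLine(body).startsWith("data:");
    }

    /** Broadened check: a leading event: line is also treated as SSE. */
    static boolean looksLikeSse(byte[] body) {
        String first = firstNonBlankLine(body);
        return first.startsWith("data:") || first.startsWith("event:");
    }
}
```

With a body that begins with event: response.created, the data-only check returns false while the broadened check returns true; both agree on Chat Completions streams and on plain JSON bodies.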
Braintrust docs status
- The Java SDK README and Braintrust docs do not explicitly document Responses API streaming support (status: not_found).
- Non-streaming Responses API is handled in code but not documented either.
Upstream sources
- OpenAI Responses API streaming docs: https://platform.openai.com/docs/guides/streaming-responses
- OpenAI Java SDK ResponseAccumulator: available in com.openai:openai-java (the SDK already depended on at [2.8.0,)), analogous to ChatCompletionAccumulator
- OpenAI Java SDK streaming example: https://github.com/openai/openai-java (ResponsesStructuredOutputsStreamingExample.java)
Local files inspected
- braintrust-sdk/instrumentation/openai_2_8_0/src/main/java/dev/braintrust/instrumentation/openai/v2_8_0/TracingHttpClient.java: lines 205–231 (tagSpanFromSseBytes hardcodes ChatCompletionAccumulator)
- braintrust-sdk/src/main/java/dev/braintrust/instrumentation/InstrumentationSemConv.java: lines 99–104, 116–150 (correctly handles Responses API fields for non-streaming)
- braintrust-sdk/instrumentation/openai_2_8_0/src/test/java/.../BraintrustOpenAITest.java: has testWrapOpenAiResponses (non-streaming only, no streaming Responses API test)