[BOT ISSUE] OpenAI: chat completions streaming drops audio delta chunks (GPT-4o audio modality) #201
Summary
When using GPT-4o's audio output modality (`modalities=["text", "audio"]`), the streaming aggregation in `ChatCompletionWrapper._postprocess_streaming_results` silently drops all `delta.audio` data from the span output. Non-streaming calls capture the full response, including audio, which leaves the two paths inconsistent.
What is missing
The `_postprocess_streaming_results` method (`py/src/braintrust/oai.py`, lines 288–357) only processes these delta fields:
| Delta field | Captured? |
|---|---|
| `delta.role` | Yes |
| `delta.content` | Yes |
| `delta.tool_calls` | Yes |
| `delta.finish_reason` | Yes |
| `delta.audio` | No |
When OpenAI streams a chat completion with audio output, chunks include `delta.audio` with:

- `delta.audio.id` — audio clip identifier
- `delta.audio.transcript` — text transcript of the generated speech
- `delta.audio.data` — base64-encoded audio bytes
- `delta.audio.expires_at` — expiration timestamp
None of these fields are aggregated. The final span output contains no trace of the audio response.
**Non-streaming is fine:** the non-streaming path (lines 199–205) logs `output=log_response["choices"]`, which includes the full `audio` field from the response. Only streaming is affected.
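As a minimal sketch of what aggregating the four `delta.audio` fields above could look like — the helper name `accumulate_audio` and the dict-based state are illustrative only, not Braintrust's actual code:

```python
def accumulate_audio(state: dict, audio_delta) -> dict:
    """Merge one streamed chunk's delta.audio into a running audio state."""
    if audio_delta is None:
        return state
    if getattr(audio_delta, "id", None) is not None:
        state["id"] = audio_delta.id
    if getattr(audio_delta, "expires_at", None) is not None:
        state["expires_at"] = audio_delta.expires_at
    if getattr(audio_delta, "transcript", None):
        # Transcript text arrives incrementally, like delta.content
        state["transcript"] = state.get("transcript", "") + audio_delta.transcript
    if getattr(audio_delta, "data", None):
        # Assumption: base64 fragments are collected and joined after the
        # stream ends; direct string concatenation is not verified here
        state.setdefault("data_chunks", []).append(audio_delta.data)
    return state
```

Such a helper would slot into the same per-chunk loop that already merges `content` and `tool_calls` deltas.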
Relationship to existing issues
- [BOT ISSUE] OpenAI: chat completions streaming discards `logprobs` from chunks #180 covers `logprobs` being dropped in streaming (same method, same root-cause pattern)
- [BOT ISSUE] OpenAI: chat completions streaming drops `refusal` delta text from span output #181 covers `refusal` being dropped in streaming (same method, same root-cause pattern)
- OpenAI: Audio API (`client.audio.speech`, `transcriptions`, `translations`) not instrumented #174 covers the direct `client.audio.*` APIs (separate gap — this issue is about audio output in chat completions)
The `audio` field is distinct from the other missing fields because it carries substantial binary data (audio bytes) plus a text transcript that users would want captured for observability.
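Because the raw bytes can be large, one illustrative option (`summarize_audio_for_span` is a hypothetical name, not a Braintrust API) is to log the transcript and metadata while replacing the base64 payload with a byte count:

```python
import base64

def summarize_audio_for_span(audio: dict) -> dict:
    """Keep transcript and metadata; replace raw audio bytes with a size."""
    decoded_len = len(base64.b64decode(audio["data"])) if audio.get("data") else 0
    return {
        "id": audio.get("id"),
        "transcript": audio.get("transcript"),
        "expires_at": audio.get("expires_at"),
        "data_bytes": decoded_len,  # size only, to keep span payloads small
    }
```

Whether the full audio bytes or only a summary belongs in the span is a product decision; the transcript alone would already close most of the observability gap.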
Braintrust docs status
not_found — The OpenAI integration page does not mention audio modality output in chat completions.
Upstream sources
- OpenAI audio output guide: https://platform.openai.com/docs/guides/audio
- OpenAI chat completions streaming format: `choices[0].delta.audio` object
- GPT-4o audio is GA — supports `gpt-4o-audio-preview` and `gpt-4o-mini-audio-preview` models
- OpenAI Python SDK `ChatCompletionChunk.Choice.Delta.audio` field
Local files inspected
- `py/src/braintrust/oai.py`: `ChatCompletionWrapper._postprocess_streaming_results` (lines 288–357) — only handles `role`, `content`, `tool_calls`, `finish_reason`; line 353 hardcodes `"logprobs": None` but doesn't even mention `audio`
- Non-streaming path (lines 199–205, 255–261) — logs full `choices` including the `audio` field
- `py/src/braintrust/wrappers/test_openai.py` — no test cases for audio modality streaming
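A pytest-style sketch of the missing coverage, using `SimpleNamespace` stand-ins for chunk objects rather than real SDK types or the Braintrust wrapper (a real test would feed these chunks through the wrapper and assert on its span output):

```python
from types import SimpleNamespace

def make_chunk(content=None, audio=None):
    # Synthetic stand-in for a ChatCompletionChunk with a single choice
    delta = SimpleNamespace(role="assistant", content=content,
                            tool_calls=None, audio=audio)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta,
                                                    finish_reason=None)])

def test_streaming_preserves_audio_transcript():
    chunks = [
        make_chunk(audio=SimpleNamespace(id="clip_1", transcript="Hi ",
                                         data=None, expires_at=None)),
        make_chunk(audio=SimpleNamespace(id=None, transcript="there.",
                                         data=None, expires_at=None)),
    ]
    # The aggregated span output should retain the full audio transcript
    transcript = "".join(
        c.choices[0].delta.audio.transcript
        for c in chunks
        if c.choices[0].delta.audio and c.choices[0].delta.audio.transcript
    )
    assert transcript == "Hi there."
```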