[BOT ISSUE] OpenAI embeddings spans missing output capture and incomplete input extraction #66
Open
Description
Summary
The OpenAI instrumentation recognizes embeddings calls (span is named "Embeddings") but the span data is largely empty because the response and request parsing logic in InstrumentationSemConv only handles Chat Completions and Responses API field shapes.
Specifically:
- No `output_json`: the response parser checks for `choices` (Chat Completions) or `output` (Responses API), but embeddings responses have `data[].embedding` — neither branch matches
- Incomplete `input_json`: the request parser checks for `messages` or `input` (as an array), but embeddings `input` can be a single string, which fails the `isArray()` check on line 102
- Partial metrics: `prompt_tokens` and `total_tokens` are captured correctly from `usage`, but `completion_tokens` is absent (embeddings don't produce completion tokens) — this is expected but worth noting
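For reference, the OpenAI API documentation linked below describes an embeddings response with roughly this shape (values here are illustrative), which matches none of the branches the parser currently checks:

```json
{
  "object": "list",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0023, -0.0091] }
  ],
  "model": "text-embedding-3-small",
  "usage": { "prompt_tokens": 2, "total_tokens": 2 }
}
```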
What is missing
In `InstrumentationSemConv.tagOpenAIRequest()` (lines 78–108):

```java
if (requestJson.has("messages")) {
    span.setAttribute("braintrust.input_json", toJson(requestJson.get("messages")));
} else if (requestJson.has("input") && requestJson.get("input").isArray()) {
    span.setAttribute("braintrust.input_json", toJson(requestJson.get("input")));
}
```

- Embeddings requests use `input`, which can be a string (`"Hello world"`), an array of strings, or an array of token arrays. Single-string inputs fail the `isArray()` guard and are silently dropped.
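A minimal sketch of the corrected branching, using plain `Map`/`List` stand-ins for the SDK's JSON tree (the real code uses a `JsonNode`-style API; the class and method names here are illustrative, not the actual fix):

```java
import java.util.List;
import java.util.Map;

public class InputExtractionSketch {
    // Hypothetical stand-in for tagOpenAIRequest's input handling:
    // accept "messages", an "input" array, OR a single "input" string.
    public static Object extractInput(Map<String, Object> requestJson) {
        if (requestJson.containsKey("messages")) {
            return requestJson.get("messages");
        }
        Object input = requestJson.get("input");
        // Embeddings: input may be a string, an array of strings,
        // or an array of token arrays -- capture all of these shapes.
        if (input instanceof String || input instanceof List) {
            return input;
        }
        return null;
    }

    public static void main(String[] args) {
        // Single-string embeddings input, previously dropped by the isArray() guard
        System.out.println(extractInput(
                Map.of("model", "text-embedding-3-small", "input", "Hello world")));
        // Array input, already handled today
        System.out.println(extractInput(Map.of("input", List.of("a", "b"))));
    }
}
```

The equivalent `JsonNode` change would add an `input.isTextual()` branch alongside the existing `isArray()` check.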
In `InstrumentationSemConv.tagOpenAIResponse()` (lines 111–156):

```java
if (responseJson.has("choices")) {
    span.setAttribute("braintrust.output_json", toJson(responseJson.get("choices")));
} else if (responseJson.has("output")) {
    span.setAttribute("braintrust.output_json", toJson(responseJson.get("output")));
}
```

- Embeddings responses have `data` (an array of embedding objects) and `model` — neither `choices` nor `output` matches, so no `output_json` is set.
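The corresponding fix would add a fall-through branch for `data`. A sketch with the same `Map`-based stand-ins as above (illustrative names, not the actual implementation):

```java
import java.util.List;
import java.util.Map;

public class OutputExtractionSketch {
    // Hypothetical stand-in for tagOpenAIResponse's output handling:
    // fall through to the embeddings "data" array when neither
    // "choices" (Chat Completions) nor "output" (Responses API) is present.
    public static Object extractOutput(Map<String, Object> responseJson) {
        if (responseJson.containsKey("choices")) {
            return responseJson.get("choices");
        }
        if (responseJson.containsKey("output")) {
            return responseJson.get("output");
        }
        if (responseJson.containsKey("data")) { // embeddings responses
            return responseJson.get("data");
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> embeddingsResponse = Map.of(
                "model", "text-embedding-3-small",
                "data", List.of(Map.of("index", 0, "embedding", List.of(0.1, 0.2))));
        System.out.println(extractOutput(embeddingsResponse));
    }
}
```

One design question for the real fix: whether to store full embedding vectors (which can be thousands of floats per input) or only their indices and dimensions, to keep span payloads small.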
Additionally, the model field from the embeddings request (e.g., `text-embedding-3-small`) is correctly extracted into metadata, and usage metrics partially work — so this is a partial gap rather than a total failure.
Braintrust docs status
- The Braintrust OpenAI integration docs at braintrust.dev/docs/integrations/ai-providers/openai do not mention embeddings: not_found
- No embeddings instrumentation is documented for any provider
Upstream sources
- OpenAI Embeddings API: https://platform.openai.com/docs/api-reference/embeddings/create — documents `input` (string or array), the response's `data[].embedding`, and `usage` with `prompt_tokens`/`total_tokens`
- OpenAI Java SDK: `client.embeddings().create()` is a stable, documented API surface
Local files inspected
- `braintrust-sdk/src/main/java/dev/braintrust/instrumentation/InstrumentationSemConv.java` — lines 78–108 (`tagOpenAIRequest`: `input` string case not handled), lines 111–156 (`tagOpenAIResponse`: no `data` field extraction), line 244 (span name correctly maps to `"Embeddings"`)
- `braintrust-sdk/instrumentation/openai_2_8_0/src/main/java/dev/braintrust/instrumentation/openai/v2_8_0/TracingHttpClient.java` — HTTP-level wrapping captures the call, delegates to `InstrumentationSemConv`
- `braintrust-sdk/instrumentation/openai_2_8_0/src/test/java/dev/braintrust/instrumentation/openai/v2_8_0/BraintrustOpenAITest.java` — no embeddings test exists