[BOT ISSUE] OpenAI embeddings spans missing output capture and incomplete input extraction #66

@braintrust-bot


Summary

The OpenAI instrumentation recognizes embeddings calls (the span is correctly named "Embeddings"), but the span data is largely empty because the request- and response-parsing logic in InstrumentationSemConv only handles the field shapes of the Chat Completions and Responses APIs.

Specifically:

  • No output_json: the response parser checks for choices (Chat Completions) or output (Responses API), but embeddings responses have data[].embedding — neither branch matches
  • Incomplete input_json: the request parser checks for messages or input (as array), but embeddings input can be a single string, which fails the isArray() check on line 102
  • Partial metrics: prompt_tokens and total_tokens are captured correctly from usage, but completion_tokens is absent (embeddings don't produce completion tokens) — this is fine but worth noting

What is missing

In InstrumentationSemConv.tagOpenAIRequest() (lines 78–108):

```java
if (requestJson.has("messages")) {
    span.setAttribute("braintrust.input_json", toJson(requestJson.get("messages")));
} else if (requestJson.has("input") && requestJson.get("input").isArray()) {
    span.setAttribute("braintrust.input_json", toJson(requestJson.get("input")));
}
```
  • Embeddings requests use input which can be a string ("Hello world"), an array of strings, or an array of token arrays. Single-string inputs fail the isArray() guard and are silently dropped.
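A possible shape for the fix, sketched below. This mirrors the branch structure of the snippet above but accepts every embeddings input shape; `extractEmbeddingsInput` is a hypothetical helper (not an existing method in InstrumentationSemConv), and Jackson's `JsonNode` is assumed to be the JSON type in use:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EmbeddingsInputSketch {
    // Hypothetical helper: returns the raw JSON of the request input, whether
    // "input" is a single string, an array of strings, or an array of token
    // arrays -- instead of silently dropping non-array inputs.
    static String extractEmbeddingsInput(JsonNode requestJson) {
        if (requestJson.has("messages")) {
            return requestJson.get("messages").toString(); // Chat Completions
        }
        if (requestJson.has("input")) {
            // No isArray() guard: string and array shapes are both valid JSON
            // and both serialize cleanly for braintrust.input_json.
            return requestJson.get("input").toString();
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonNode stringInput = mapper.readTree(
            "{\"model\":\"text-embedding-3-small\",\"input\":\"Hello world\"}");
        // Prints the JSON-encoded string (with quotes), not null.
        System.out.println(extractEmbeddingsInput(stringInput));
    }
}
```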

In InstrumentationSemConv.tagOpenAIResponse() (lines 111–156):

```java
if (responseJson.has("choices")) {
    span.setAttribute("braintrust.output_json", toJson(responseJson.get("choices")));
} else if (responseJson.has("output")) {
    span.setAttribute("braintrust.output_json", toJson(responseJson.get("output")));
}
```
  • Embeddings responses have data (array of embedding objects) and model — neither choices nor output. No output_json is set.
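One way the fix could look is a third branch that falls through to the embeddings data array. This is a sketch only; `extractOutput` is a hypothetical helper, and Jackson's `JsonNode` is again assumed:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EmbeddingsOutputSketch {
    // Hypothetical extraction mirroring tagOpenAIResponse, with an added
    // branch for the embeddings response shape ("data" array of objects
    // each carrying an "embedding" vector).
    static String extractOutput(JsonNode responseJson) {
        if (responseJson.has("choices")) {
            return responseJson.get("choices").toString(); // Chat Completions
        } else if (responseJson.has("output")) {
            return responseJson.get("output").toString();  // Responses API
        } else if (responseJson.has("data")) {
            return responseJson.get("data").toString();    // Embeddings
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        JsonNode resp = new ObjectMapper().readTree(
            "{\"data\":[{\"index\":0,\"embedding\":[0.1,0.2]}],"
            + "\"model\":\"text-embedding-3-small\"}");
        System.out.println(extractOutput(resp));
    }
}
```

One design question for the real fix: embedding vectors can run to thousands of floats per input, so it may be preferable to truncate or summarize the vectors (e.g., record only index and dimension count) rather than store the full data array in output_json.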

Additionally, the model field from the embeddings request (e.g., text-embedding-3-small) is correctly extracted into metadata, and the usage metrics partially work, so this is a gap in detail extraction rather than a total failure.

Braintrust docs status

  • The Braintrust OpenAI integration docs at braintrust.dev/docs/integrations/ai-providers/openai do not mention embeddings
  • No embeddings instrumentation is documented for any provider

Local files inspected

  • braintrust-sdk/src/main/java/dev/braintrust/instrumentation/InstrumentationSemConv.java — lines 78–108 (tagOpenAIRequest: input string case not handled), lines 111–156 (tagOpenAIResponse: no data field extraction), line 244 (span name correctly maps to "Embeddings")
  • braintrust-sdk/instrumentation/openai_2_8_0/src/main/java/dev/braintrust/instrumentation/openai/v2_8_0/TracingHttpClient.java — HTTP-level wrapping captures the call, delegates to InstrumentationSemConv
  • braintrust-sdk/instrumentation/openai_2_8_0/src/test/java/dev/braintrust/instrumentation/openai/v2_8_0/BraintrustOpenAITest.java — no embeddings test exists
