[BOT ISSUE] Google GenAI embedContent spans lack embedding-specific input, metrics, and span type #65

@braintrust-bot

Summary

When client.models.embedContent() is called through the instrumented Google GenAI client, the call is captured at the HTTP level but the span contains almost no useful embedding-specific detail. The tagSpan() method in BraintrustApiClient only extracts fields relevant to generateContent (like contents, generationConfig, usageMetadata), which are absent from embedding requests and responses.

The span is created with the correct operation name (embed_content) but has:

  • Near-empty input_json: only {"model": "..."} is recorded; the content being embedded, taskType, title, and outputDimensionality are not extracted
  • No metrics: embedding responses use metadata.billableCharacterCount instead of usageMetadata, so no token/character counts are captured
  • Incorrect span type: marked as type: "llm" rather than type: "embedding" (or equivalent)

The full response is stored in output_json as a raw dump, so the embedding vector data is technically present but not meaningfully structured.

What is missing

In BraintrustApiClient.tagSpan() (lines 50–161):

Request parsing (lines 97–112):

  • Checks for contents (generateContent field) but embedContent uses content (singular)
  • Checks for generationConfig but embedContent uses taskType, title, outputDimensionality
  • Result: input_json only contains {"model": "..."} for embedding calls
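
The request-side gap above could be closed roughly as follows. This is a minimal sketch, not the actual tagSpan() code: it assumes the request body has already been parsed into a java.util.Map, and the class and method names (EmbedRequestInput, extractInput) are hypothetical. Only the field names come from this issue; where they sit in the real payload (top level or nested under a config object) is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the real tagSpan() may use a different JSON representation.
final class EmbedRequestInput {
    // Field names as listed in this issue; top-level placement is an assumption.
    private static final String[] EMBED_FIELDS = {
        "content", "taskType", "title", "outputDimensionality"
    };

    /** Builds the span input: the model plus any embedding fields present in the request. */
    static Map<String, Object> extractInput(String model, Map<String, Object> requestBody) {
        Map<String, Object> input = new LinkedHashMap<>();
        input.put("model", model);
        for (String field : EMBED_FIELDS) {
            Object value = requestBody.get(field);
            if (value != null) {
                // Copy only fields that are actually set, so absent options stay absent.
                input.put(field, value);
            }
        }
        return input;
    }
}
```

Fields that belong to generateContent (contents, generationConfig) are simply ignored here, so the same span input stays limited to what an embedding request can carry.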

Response parsing (lines 116–149):

  • Checks for usageMetadata with promptTokenCount/candidatesTokenCount but embedContent responses have metadata with billableCharacterCount
  • Result: no metrics are captured
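
A response-side counterpart might look like the sketch below. It again assumes the response body is available as a parsed Map, and it reads metadata.billableCharacterCount as described in this issue. The class name, method name, and the metric key billable_characters are all hypothetical; the key Braintrust would actually expect is not stated here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; names and the metric key are illustrative assumptions.
final class EmbedResponseMetrics {
    /** Returns embedding metrics found in the response, or an empty map if there are none. */
    static Map<String, Long> extractMetrics(Map<String, Object> responseBody) {
        Map<String, Long> metrics = new LinkedHashMap<>();
        Object metadata = responseBody.get("metadata");
        if (metadata instanceof Map) {
            Object count = ((Map<?, ?>) metadata).get("billableCharacterCount");
            if (count instanceof Number) {
                // "billable_characters" is an assumed metric name, not a confirmed one.
                metrics.put("billable_characters", ((Number) count).longValue());
            }
        }
        return metrics;
    }
}
```

Returning an empty map when metadata is missing keeps the generateContent path unaffected, since those responses carry usageMetadata instead.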

Span attributes (line 156):

  • Hardcodes type: "llm" for all calls including embeddings
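
One way to avoid the hardcoded type is to derive it from the operation name that getOperation already produces. The sketch below is an assumption-laden illustration: SpanTypeResolver and spanTypeFor are hypothetical names, and "embedding" is the type value suggested in this issue ("or equivalent"), not a confirmed Braintrust span type.

```java
// Hypothetical helper; "embedding" is the value proposed in this issue, not a confirmed type.
final class SpanTypeResolver {
    /** Maps an operation name such as "embed_content" to a span type. */
    static String spanTypeFor(String operation) {
        // Constant-first equals() keeps this safe when operation is null.
        return "embed_content".equals(operation) ? "embedding" : "llm";
    }
}
```

Every operation other than embed_content keeps the current "llm" type, so existing generateContent spans would be unchanged.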

Braintrust docs status

  • The Braintrust Gemini integration docs at braintrust.dev/docs/integrations/ai-providers/gemini do not mention embeddings (status: not_found)
  • No embeddings instrumentation is documented for any provider in Java

Upstream sources

  • Google GenAI embeddings docs: https://ai.google.dev/gemini-api/docs/embeddings — documents embedContent as a stable, first-class API with models like gemini-embedding-001
  • Google GenAI Java SDK: client.models.embedContent() is available with EmbedContentConfig (taskType, title, outputDimensionality) and returns EmbedContentResponse with embeddings and metadata
  • embedContent request format: uses content (singular), taskType, title, outputDimensionality — none of which match the generateContent fields currently extracted
  • embedContent response format: returns embedding.values array and metadata.billableCharacterCount — not usageMetadata

Local files inspected

  • braintrust-sdk/instrumentation/genai_1_18_0/src/main/java/com/google/genai/BraintrustApiClient.java — lines 50–161 (tagSpan only extracts generateContent-relevant fields), lines 325–333 (getOperation correctly parses embedContent to embed_content)
  • braintrust-sdk/instrumentation/genai_1_18_0/src/test/java/dev/braintrust/instrumentation/genai/v1_18_0/BraintrustGenAITest.java — no embedContent test exists
  • braintrust-sdk/instrumentation/genai_1_18_0/src/main/java/com/google/genai/BraintrustInstrumentation.java — wraps ApiClient generically, no embedding-specific logic
