
[BOT ISSUE] OpenAI Responses API input_tokens_details.cached_tokens not captured in metrics #70

@braintrust-bot

Summary

The OpenAI Responses API response handler in InstrumentationSemConv.tagOpenAIResponse() extracts output_tokens_details.reasoning_tokens as completion_reasoning_tokens (lines 144–150), but does not extract the sibling field input_tokens_details.cached_tokens. This means prompt caching usage on Responses API calls is invisible in Braintrust metrics, even though the code already demonstrates the pattern for extracting nested token details at this level.

This is distinct from #58 (which covers Chat Completions prompt_tokens_details.cached_tokens and completion_tokens_details.reasoning_tokens) — the Responses API uses different field names (input_tokens_details/output_tokens_details instead of prompt_tokens_details/completion_tokens_details) and a different code path.
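
For contrast, a Chat Completions usage object nests the same counters under the older field names (illustrative shape only; the values here are made up):

{
  "prompt_tokens": 9708,
  "completion_tokens": 167,
  "total_tokens": 9875,
  "prompt_tokens_details": {
    "cached_tokens": 5578
  },
  "completion_tokens_details": {
    "reasoning_tokens": 0
  }
}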

What is missing

In InstrumentationSemConv.tagOpenAIResponse() (lines 144–150), only output_tokens_details is checked:

// Reasoning tokens (Responses API)
if (usage.has("output_tokens_details")) {
    JsonNode details = usage.get("output_tokens_details");
    if (details.has("reasoning_tokens")) {
        metrics.put("completion_reasoning_tokens", details.get("reasoning_tokens"));
    }
}

The missing extraction:

if (usage.has("input_tokens_details")) {
    JsonNode details = usage.get("input_tokens_details");
    if (details.has("cached_tokens")) {
        metrics.put("prompt_cached_tokens", details.get("cached_tokens"));
    }
}

A real Responses API usage object with prompt caching looks like:

{
  "input_tokens": 9708,
  "output_tokens": 167,
  "total_tokens": 9875,
  "input_tokens_details": {
    "cached_tokens": 5578
  },
  "output_tokens_details": {
    "reasoning_tokens": 0
  }
}

Today, reasoning_tokens is captured but cached_tokens is silently dropped.
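
To make the gap concrete, here is a minimal, self-contained sketch (plain Java with Jackson; the class name and the Map-based metrics container are hypothetical stand-ins for the real InstrumentationSemConv internals) that runs the payload above through both the existing and the proposed extraction:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.HashMap;
import java.util.Map;

public class CachedTokensExtractionSketch {
    public static void main(String[] args) throws Exception {
        // The sample Responses API usage payload from above.
        String payload = """
            {
              "input_tokens": 9708,
              "output_tokens": 167,
              "total_tokens": 9875,
              "input_tokens_details": { "cached_tokens": 5578 },
              "output_tokens_details": { "reasoning_tokens": 0 }
            }""";
        JsonNode usage = new ObjectMapper().readTree(payload);
        Map<String, JsonNode> metrics = new HashMap<>();

        // Existing extraction: reasoning tokens (Responses API).
        if (usage.has("output_tokens_details")) {
            JsonNode details = usage.get("output_tokens_details");
            if (details.has("reasoning_tokens")) {
                metrics.put("completion_reasoning_tokens", details.get("reasoning_tokens"));
            }
        }

        // Proposed extraction: cached tokens (Responses API).
        if (usage.has("input_tokens_details")) {
            JsonNode details = usage.get("input_tokens_details");
            if (details.has("cached_tokens")) {
                metrics.put("prompt_cached_tokens", details.get("cached_tokens"));
            }
        }

        // Without the proposed block, prompt_cached_tokens would be absent here.
        System.out.println("completion_reasoning_tokens = "
                + metrics.get("completion_reasoning_tokens").asLong()); // 0
        System.out.println("prompt_cached_tokens = "
                + metrics.get("prompt_cached_tokens").asLong());        // 5578
    }
}

A corresponding assertion in testWrapOpenAiResponses would lock this behavior in once the fix lands.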

Braintrust docs status

  • The Braintrust OpenAI integration docs at https://www.braintrust.dev/docs/integrations/ai-providers/openai do not mention cached token metrics for the Responses API
  • The Gemini integration docs capture prompt_cached_tokens as a named metric, suggesting this is a recognized metric name in the Braintrust ecosystem

Upstream sources

  • OpenAI Responses API: The usage object includes input_tokens_details.cached_tokens — confirmed in community discussion and OpenAI prompt caching docs
  • OpenAI Java SDK: The Response object's Usage class exposes inputTokensDetails() with cachedTokens()

Local files inspected

  • braintrust-sdk/src/main/java/dev/braintrust/instrumentation/InstrumentationSemConv.java — lines 135–150 (tagOpenAIResponse Responses API usage handling; output_tokens_details.reasoning_tokens extracted at line 148, no input_tokens_details check)
  • braintrust-sdk/instrumentation/openai_2_8_0/src/test/java/dev/braintrust/instrumentation/openai/v2_8_0/BraintrustOpenAITest.java — testWrapOpenAiResponses does not assert cached token metrics
  • braintrust-sdk/instrumentation/genai_1_18_0/src/main/java/com/google/genai/BraintrustApiClient.java — line 145 shows prompt_cached_tokens is an established metric name (used for Gemini's cachedContentTokenCount)
