What Happened?
When Portkey normalizes Anthropic model responses to the OpenAI schema, `prompt_tokens` has different semantics depending on which provider is used to access the same Anthropic model.
| Provider | `prompt_tokens` includes cache tokens? | `total_tokens` includes cache tokens? |
|---|---|---|
| `anthropic` (direct API) | No | Yes |
| `vertex-ai` (Anthropic models) | No | Yes |
| `bedrock` (Anthropic models) | Yes | Yes |
For the same Anthropic model (e.g. `claude-sonnet-4-20250514`), sending the same prompt with cache:
- Anthropic direct / Vertex AI: `prompt_tokens` = 100 (non-cached input only), `cache_read_input_tokens` = 50, `completion_tokens` = 100, `total_tokens` = 250
- Bedrock: `prompt_tokens` = 150 (includes cached input), `cache_read_input_tokens` = 50, `completion_tokens` = 100, `total_tokens` = 250
Anthropic direct and Vertex AI set `prompt_tokens = input_tokens` (excludes cache). Bedrock sets `prompt_tokens = inputTokens + cacheReadInputTokens + cacheWriteInputTokens` (includes cache).
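The divergence can be reproduced with a few lines of arithmetic; the numbers mirror the example above (the variable names are illustrative, not from the Portkey codebase):

```typescript
// Hypothetical raw usage from the same cached Anthropic response.
const raw = {
  input_tokens: 100, // non-cached input only
  output_tokens: 100,
  cache_read_input_tokens: 50,
  cache_creation_input_tokens: 0,
};

// Anthropic direct / Vertex AI path: cache tokens excluded from prompt_tokens.
const directPromptTokens = raw.input_tokens; // 100

// Bedrock path: cache tokens folded into prompt_tokens.
const bedrockPromptTokens =
  raw.input_tokens +
  raw.cache_read_input_tokens +
  raw.cache_creation_input_tokens; // 150

// All three paths agree on total_tokens.
const totalTokens =
  raw.input_tokens +
  raw.output_tokens +
  raw.cache_read_input_tokens +
  raw.cache_creation_input_tokens; // 250
```

So a consumer reading `usage.prompt_tokens` sees 100 or 150 for the identical request, depending only on which gateway path served it.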
What Should Have Happened?
All three Anthropic access paths should normalize `prompt_tokens` consistently. The OpenAI convention (which Portkey normalizes to) is that `prompt_tokens` includes cached tokens, with the breakdown available in `prompt_tokens_details.cached_tokens`. Anthropic direct and Vertex AI should match Bedrock's behavior.
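As a sketch of what consistent normalization could look like, a hypothetical shared helper (`toOpenAIUsage` is not an existing Portkey function) could fold cache tokens into `prompt_tokens` the way Bedrock already does:

```typescript
// Sketch only: shape of Anthropic's reported usage fields.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

// Normalize to the OpenAI convention: prompt_tokens includes cache tokens,
// with the cached breakdown in prompt_tokens_details.cached_tokens.
function toOpenAIUsage(u: AnthropicUsage) {
  const cacheRead = u.cache_read_input_tokens ?? 0;
  const cacheWrite = u.cache_creation_input_tokens ?? 0;
  const promptTokens = u.input_tokens + cacheRead + cacheWrite;
  return {
    prompt_tokens: promptTokens,
    completion_tokens: u.output_tokens,
    total_tokens: promptTokens + u.output_tokens,
    prompt_tokens_details: { cached_tokens: cacheRead },
  };
}
```

With this shape, `total_tokens` is always `prompt_tokens + completion_tokens`, which also removes the need for each transform to add cache tokens into the total separately.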
Relevant Code Snippet
`anthropic/chatComplete.ts#L612C9-L627C11`

```typescript
usage: {
  prompt_tokens: input_tokens,
  completion_tokens: output_tokens,
  total_tokens:
    input_tokens +
    output_tokens +
    (cache_creation_input_tokens ?? 0) +
    (cache_read_input_tokens ?? 0),
  prompt_tokens_details: {
    cached_tokens: cache_read_input_tokens ?? 0,
  },
  ...(shouldSendCacheUsage && {
    cache_read_input_tokens: cache_read_input_tokens,
    cache_creation_input_tokens: cache_creation_input_tokens,
  }),
},
```
`google-vertex-ai/chatComplete.ts#L898C6-L909C9`

```typescript
usage: {
  prompt_tokens: input_tokens,
  completion_tokens: output_tokens,
  total_tokens: totalTokens,
  prompt_tokens_details: {
    cached_tokens: cache_read_input_tokens,
  },
  ...(shouldSendCacheUsage && {
    cache_read_input_tokens: cache_read_input_tokens,
    cache_creation_input_tokens: cache_creation_input_tokens,
  }),
},
```
`bedrock/chatComplete.ts#L550C4-L565C9`

```typescript
usage: {
  prompt_tokens:
    response.usage.inputTokens +
    cacheReadInputTokens +
    cacheWriteInputTokens,
  completion_tokens: response.usage.outputTokens,
  total_tokens: response.usage.totalTokens, // contains the cache usage as well
  prompt_tokens_details: {
    cached_tokens: cacheReadInputTokens,
  },
  // we only want to be sending this for anthropic models and this is not openai compliant
  ...((cacheReadInputTokens > 0 || cacheWriteInputTokens > 0) && {
    cache_read_input_tokens: cacheReadInputTokens,
    cache_creation_input_tokens: cacheWriteInputTokens,
  }),
},
```
Your Twitter/LinkedIn
No response