docs(claude-code): add context management guide #220
Open
Sameerlite wants to merge 6 commits into
Co-authored-by: Cursor <cursoragent@cursor.com>
Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Unused Tabs and TabItem imports in documentation
- Removed the unused `import Tabs from '@theme/Tabs'` and `import TabItem from '@theme/TabItem'` lines since neither component is referenced anywhere in the document.
Preview (5a8e65d630)
diff --git a/docs/claude_code_context_management.md b/docs/claude_code_context_management.md
new file mode 100644
--- /dev/null
+++ b/docs/claude_code_context_management.md
@@ -1,0 +1,278 @@
+---
+title: Claude Code - Context Management
+sidebar_label: Claude Code - Context Management
+---
+
+# Claude Code - Context Management
+
+LiteLLM supports Anthropic's `context_management` beta across **all providers**, not just Anthropic: natively where the upstream API handles it, and through an in-gateway polyfill everywhere else.
+
+When you send a request to `/v1/messages` (or via `litellm.anthropic.messages.*`) with a `context_management` spec, LiteLLM handles it in one of two ways depending on where the request is routed:
+
+| Routing path | How context_management is applied |
+|---|---|
+| **Anthropic API** | Passed through to the Anthropic server, which applies edits natively |
+| **OpenAI Responses API** (e.g. `gpt-5.x-*`) | Passed through; handled by the Responses API |
+| **Any other provider** (OpenAI, xAI, Gemini, Azure, Bedrock non-Anthropic, …) | **In-gateway polyfill** - LiteLLM applies the edits to the message array before forwarding |
+
+The polyfill means you write your Claude Code tool-loop once, pass `context_management` as you normally would, and it works regardless of which model is behind the proxy.
+
+---
+
+## Supported Edit Types
+
+| Edit type | Status | What it does |
+|---|---|---|
+| `clear_tool_uses_20250919` | ✅ **Supported** | Clears old `tool_result` content from conversation history when a trigger threshold is met, keeping only the most recent `N` tool results intact |
+| `clear_thinking_20251015` | ❌ Coming soon | Clears extended-thinking blocks from history |
+| `compact_20260112` | ❌ Native pass-through only | Summarisation edit - supported on Anthropic / Bedrock Anthropic forwarding paths; not polyfilled |
+
+---
+
+## How It Works
+
+```
+Claude Code client
+ │
+ │ POST /v1/messages { context_management: { edits: [...] } }
+ ▼
+┌─────────────────────────────────────────────────────────┐
+│ LiteLLM Proxy │
+│ │
+│ 1. Detect routing target │
+│ │
+│ ┌──────────────────────┐ ┌────────────────────────┐ │
+│ │ Anthropic / Bedrock │ │ Any other provider │ │
+│ │ Anthropic / OpenAI │ │ (OpenAI, xAI, Gemini, │ │
+│ │ Responses API │ │ Azure, …) │ │
+│ │ │ │ │ │
+│ │ Pass context_mgmt │ │ In-gateway polyfill: │ │
+│ │ spec through as-is │ │ • Count input tokens │ │
+│ │ (server applies it) │ │ • Check trigger │ │
+│ └──────────┬───────────┘ │ • Clear old results │ │
+│ │ │ • Keep N most recent │ │
+│ │ │ • Never clear latest │ │
+│ │ └──────────┬─────────────┘ │
+│ │ │ │
+│ └────────────┬─────────────┘ │
+│ │ │
+│ 2. Forward to provider │ │
+│ (without context_ │ │
+│ management key) │ │
+└──────────────────────────┼──────────────────────────────┘
+ ▼
+ Upstream model
+ │
+ Response + usage
+ │
+ ▼
+┌─────────────────────────────────────────────────────────┐
+│ LiteLLM attaches applied_edits to response │
+│ { context_management: { applied_edits: [...] } } │
+└─────────────────────────────────────────────────────────┘
+ │
+ ▼
+ Claude Code client
+```
+
+---
+
+## Usage
+
+### Basic request
+
+```python
+import litellm
+
+response = await litellm.anthropic.messages.acreate(
+ model="xai/grok-4", # any provider
+ max_tokens=1024,
+ messages=[...], # your multi-turn tool history
+ tools=[{"name": "get_weather", "description": "...", "input_schema": {...}}],
+ context_management={
+ "edits": [
+ {
+ "type": "clear_tool_uses_20250919",
+ "trigger": {
+ "type": "input_tokens",
+ "value": 80000 # activate when history exceeds 80k tokens
+ },
+ "keep": {
+ "type": "tool_uses",
+ "value": 3 # keep the 3 most-recent tool results
+ }
+ }
+ ]
+ }
+)
+```
+
+You can also trigger on tool-use count instead of tokens:
+
+```python
+"trigger": {"type": "tool_uses", "value": 10} # activate once the history holds more than 10 tool calls
+```
+
+### Via the proxy (curl)
+
+```bash
+curl -X POST http://localhost:4000/v1/messages \
+ -H "Content-Type: application/json" \
+ -H "Authorization: Bearer $LITELLM_API_KEY" \
+ -d '{
+ "model": "gpt-5.4-mini",
+ "max_tokens": 1024,
+ "messages": [...],
+ "tools": [...],
+ "context_management": {
+ "edits": [
+ {
+ "type": "clear_tool_uses_20250919",
+ "trigger": {"type": "input_tokens", "value": 80000},
+ "keep": {"type": "tool_uses", "value": 3}
+ }
+ ]
+ }
+ }'
+```
+
+---
+
+## `clear_tool_uses_20250919` - Knobs
+
+| Field | Required | Default | Description |
+|---|---|---|---|
+| `trigger.type` | No | `"input_tokens"` | `"input_tokens"` or `"tool_uses"` |
+| `trigger.value` | No | `100000` | Threshold; edits fire when current value **exceeds** this |
+| `keep.type` | No | `"tool_uses"` | Must be `"tool_uses"` |
+| `keep.value` | No | `3` | Number of most-recent tool results to preserve |
+| `clear_at_least` | Accepted | - | Accepted in request but ignored by polyfill (v0) |
+| `exclude_tools` | Accepted | - | Accepted in request but ignored by polyfill (v0) |
+| `clear_tool_inputs` | Accepted | - | Accepted in request but ignored by polyfill (v0) |
+
+> **Hard floor:** regardless of `keep`, LiteLLM's polyfill never clears the most recently completed `tool_result` - the one the model is about to reply to.
+
+---
+
+## Responses
+
+### Non-streaming
+
+When at least one edit fires, the response includes a `context_management` field:
+
+```json
+{
+ "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
+ "type": "message",
+ "role": "assistant",
+ "content": [{"type": "text", "text": "Based on the latest weather data..."}],
+ "model": "gpt-5.4-mini",
+ "stop_reason": "end_turn",
+ "usage": {
+ "input_tokens": 620,
+ "output_tokens": 45
+ },
+ "context_management": {
+ "applied_edits": [
+ {
+ "type": "clear_tool_uses_20250919",
+ "cleared_tool_uses": 3,
+ "cleared_input_tokens": 8240
+ }
+ ]
+ }
+}
+```
+
+If the trigger was not met (context is still small), `context_management` is **absent** from the response.
+
+### Streaming
+
+The `context_management.applied_edits` field is included in the final `message_delta` SSE event:
+
+```
+event: message_start
+data: {"type":"message_start","message":{"id":"msg_01...","type":"message","role":"assistant","content":[],"model":"gpt-5.4-mini","stop_reason":null,"usage":{"input_tokens":620,"output_tokens":0}}}
+
+event: content_block_start
+data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
+
+event: content_block_delta
+data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Based on"}}
+
+event: content_block_delta
+data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" the latest weather data..."}}
+
+event: content_block_stop
+data: {"type":"content_block_stop","index":0}
+
+event: message_delta
+data: {
+ "type": "message_delta",
+ "delta": {"stop_reason": "end_turn", "stop_sequence": null},
+ "usage": {"output_tokens": 45},
+ "context_management": {
+ "applied_edits": [
+ {
+ "type": "clear_tool_uses_20250919",
+ "cleared_tool_uses": 3,
+ "cleared_input_tokens": 8240
+ }
+ ]
+ }
+}
+
+event: message_stop
+data: {"type":"message_stop"}
+```
+
+---
+
+## Disabling Context Management
+
+### Per-request - omit the field
+
+Simply don't include `context_management` in the request body.
+
+### Proxy-wide - `drop_params: true`
+
+When `drop_params: true` is set in your proxy config (or passed as a litellm setting), LiteLLM will silently strip `context_management` from any request instead of running the polyfill:
+
+```yaml
+# proxy_server_config.yaml
+litellm_settings:
+ drop_params: true
+```
+
+Or set it globally in the SDK:
+
+```python
+import litellm
+litellm.drop_params = True
+```
+
+This is useful when you have a global `drop_params` policy to suppress unsupported parameters - context management is treated like any other unsupported parameter and dropped rather than polyfilled.
+
+---
+
+## Provider Support Matrix
+
+| Provider | Native | Polyfill |
+|---|---|---|
+| `anthropic/*` | Yes | - |
+| `bedrock/anthropic.*` | `compact_20260112` only | - |
+| `openai/*` (Responses API) | Yes | - |
+| `openai/*` (chat completions) | - | Yes |
+| `azure/*` | - | Yes |
+| `xai/*` | - | Yes |
+| `gemini/*` | - | Yes |
+| `vertex_ai/*` | - | Yes |
+| All other providers | - | Yes |
+
+---
+
+## Notes
+
+- The polyfill only processes the `clear_tool_uses_20250919` edit type. `compact_20260112` requires Anthropic's summarisation capability and is forwarded as-is on native paths only.
+- Token counting for the polyfill uses `litellm.token_counter` (tiktoken `cl100k_base` fallback for unknown models).
+- The message array structure is preserved: same number of messages, same role order. Only `tool_result.content` inside matching messages is replaced with `"[Cleared by context management]"`.
diff --git a/sidebars.js b/sidebars.js
--- a/sidebars.js
+++ b/sidebars.js
@@ -154,6 +154,7 @@
"tutorials/claude_non_anthropic_models",
"tutorials/claude_code_plugin_marketplace",
"tutorials/claude_code_beta_headers",
+ "claude_code_context_management",
]
},
"tutorials/claude_desktop_cowork",
Reviewed by Cursor Bugbot for commit 346f1c7.
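The guide in the diff above describes the `clear_tool_uses_20250919` polyfill only in prose: check the trigger, keep the `N` most recent tool results, never clear the latest one, and replace cleared `tool_result.content` with `"[Cleared by context management]"`. The sketch below illustrates those rules under stated assumptions; it is not LiteLLM's implementation. `apply_clear_tool_uses` is a hypothetical helper, the token count is supplied by the caller rather than computed, and the `tool_uses` trigger is assumed to count `tool_result` blocks.

```python
from copy import deepcopy

CLEARED_PLACEHOLDER = "[Cleared by context management]"


def apply_clear_tool_uses(messages, edit, input_tokens):
    """Return (new_messages, cleared_count) for one clear_tool_uses edit.

    Hypothetical helper for illustration; not part of LiteLLM's API.
    """
    # Defaults taken from the knobs table in the guide.
    trigger = edit.get("trigger", {"type": "input_tokens", "value": 100000})
    keep = edit.get("keep", {"type": "tool_uses", "value": 3})

    # (message index, block index) of every tool_result block, oldest first.
    positions = [
        (m, b)
        for m, msg in enumerate(messages)
        if isinstance(msg.get("content"), list)
        for b, block in enumerate(msg["content"])
        if isinstance(block, dict) and block.get("type") == "tool_result"
    ]

    # Assumption: the "tool_uses" trigger counts tool_result blocks.
    current = input_tokens if trigger["type"] == "input_tokens" else len(positions)
    if current <= trigger["value"]:
        return messages, 0  # threshold not exceeded: history is left untouched

    # Hard floor: the newest tool_result survives even when keep.value is 0.
    keep_n = max(keep["value"], 1)
    to_clear = positions[:-keep_n]

    new_messages = deepcopy(messages)  # never mutate the caller's history
    cleared = 0
    for m, b in to_clear:
        block = new_messages[m]["content"][b]
        if block.get("content") != CLEARED_PLACEHOLDER:
            block["content"] = CLEARED_PLACEHOLDER
            cleared += 1
    return new_messages, cleared
```

The message list keeps its length and role order; only the `content` of older `tool_result` blocks changes. Token accounting for `cleared_input_tokens` is left out of this sketch.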
Updated context management documentation with new features and clarifications.
…e, structured outputs Co-authored-by: Cursor <cursoragent@cursor.com>
- Mark compact_20260112 as fully supported (was "native pass-through only")
- Add setup instructions for context_management_summary_model config key
- Document 3-phase algorithm (slice, threshold check, summarize)
- Add knobs table, response shape, error handling, and curl examples
- Document client-side compaction block forwarding (no edit required)
- Update provider support matrix to per-edit-type columns
- Update architecture diagram to show both polyfill paths

Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
- `docs/claude_code_context_management.md` - full guide for the `context_management` feature in Claude Code via LiteLLM
- … `sidebars.js` under the Claude Code category

What's in the doc
- … (`clear_tool_uses_20250919` live, two coming soon)
- … `clear_tool_uses_20250919`
- … `context_management.applied_edits`
- … `drop_params: true`

Companion code PR: BerriAI/litellm#28779
Made with Cursor
Note
Low Risk
Documentation and published test-matrix data only; no runtime or auth changes in this diff.
Overview
Adds `docs/claude_code_context_management.md`, a Claude Code guide for Anthropic-style `context_management` on `/v1/messages` and `litellm.anthropic.messages.*`: native pass-through vs in-gateway polyfill by provider, supported edits (`clear_tool_uses_20250919`, `compact_20260112`), SDK/curl examples, `context_management_summary_model` setup, response/streaming shapes, `drop_params`, and a provider matrix.

`sidebars.js` links the page under the Claude Code category.

`src/data/compatibility-matrix.json` flips `bedrock_converse` from fail to pass for basic messaging (streaming/non-streaming), tool use, and structured outputs (prior “content block is not a text block” / empty-text errors removed).

Reviewed by Cursor Bugbot for commit 8097c40. Bugbot is set up for automated code reviews on this repo.
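The streaming shape in the guide places `context_management.applied_edits` on the final `message_delta` event. As a rough illustration of how a client might read it, the sketch below parses raw SSE text with only the standard library. `applied_edits_from_sse` is a hypothetical helper, and the code assumes `\n` line endings and events separated by a blank line.

```python
import json


def applied_edits_from_sse(raw):
    """Collect context_management.applied_edits from message_delta events.

    Hypothetical helper for illustration; not part of LiteLLM's API.
    """
    edits = []
    for chunk in raw.strip().split("\n\n"):  # a blank line ends each SSE event
        # An event may carry several data: lines; they are joined with newlines.
        data_lines = [
            line[len("data:"):].strip()
            for line in chunk.splitlines()
            if line.startswith("data:")
        ]
        if not data_lines:
            continue
        payload = json.loads("\n".join(data_lines))
        if payload.get("type") == "message_delta":
            edits += payload.get("context_management", {}).get("applied_edits", [])
    return edits
```

An empty list means no edit fired, which matches the guide's statement that `context_management` is absent when the trigger was not met.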