docs(claude-code): add context management guide by Sameerlite · Pull Request #220 · BerriAI/litellm-docs

Sameerlite · 2026-05-25T12:40:33Z

Summary

Adds docs/claude_code_context_management.md - full guide for the context_management feature in Claude Code via LiteLLM
Registers the new page in sidebars.js under the Claude Code category

What's in the doc

Supported edit types table (clear_tool_uses_20250919 live, two coming soon)
How it works diagram (native pass-through vs in-gateway polyfill per provider)
Usage examples - Python SDK + proxy curl
Knobs table for clear_tool_uses_20250919
Non-streaming and streaming response examples with context_management.applied_edits
How to disable via drop_params: true
Provider support matrix
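Clients that read the response examples described above need to pull `applied_edits` out of either the full message (non-streaming) or the final `message_delta` event (streaming). Below is a minimal client-side sketch. It assumes standard SSE framing, where every payload line carries a `data:` prefix and a blank line ends an event. The helper names `applied_edits_from_sse` and `applied_edits_from_message` are hypothetical and are not LiteLLM APIs.

```python
import json
from itertools import chain
from typing import Any, Iterable, Optional


def applied_edits_from_sse(lines: Iterable[str]) -> Optional[list[dict[str, Any]]]:
    """Return context_management.applied_edits from a Messages SSE stream, or None."""
    data_lines: list[str] = []
    applied = None
    # A trailing blank line flushes the last event if the stream lacks one.
    for raw in chain(lines, [""]):
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, drop a single leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_lines:
            event = json.loads("\n".join(data_lines))
            data_lines = []
            if event.get("type") == "message_delta" and event.get("context_management"):
                applied = event["context_management"].get("applied_edits")
        # "event:" lines and ":" comments carry no payload here and are skipped.
    return applied


def applied_edits_from_message(message: dict[str, Any]) -> Optional[list[dict[str, Any]]]:
    """Non-streaming: the context_management field is absent when no edit fired."""
    return (message.get("context_management") or {}).get("applied_edits")
```

Returning `None` instead of an empty list keeps the two helpers consistent with the documented behavior, where the field is simply missing when the trigger was not met.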

Companion code PR: BerriAI/litellm#28779

Made with Cursor

Note

Low Risk
Documentation and published test-matrix data only; no runtime or auth changes in this diff.

Overview
Adds docs/claude_code_context_management.md, a Claude Code guide for Anthropic-style context_management on /v1/messages and litellm.anthropic.messages.*: native pass-through vs in-gateway polyfill by provider, supported edits (clear_tool_uses_20250919, compact_20260112), SDK/curl examples, context_management_summary_model setup, response/streaming shapes, drop_params, and a provider matrix. sidebars.js links the page under the Claude Code category.

src/data/compatibility-matrix.json flips bedrock_converse from fail to pass for basic messaging (streaming/non-streaming), tool use, and structured outputs (prior “content block is not a text block” / empty-text errors removed).

Reviewed by Cursor Bugbot for commit 8097c40. Bugbot is set up for automated code reviews on this repo.

Co-authored-by: Cursor <cursoragent@cursor.com>

vercel · 2026-05-25T12:40:38Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
litellm	Ready	Preview, Comment	May 27, 2026 12:10pm

cursor

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Unused Tabs and TabItem imports in documentation
- Removed the unused import Tabs from '@theme/Tabs' and import TabItem from '@theme/TabItem' lines since neither component is referenced anywhere in the document.

Preview (5a8e65d630)

diff --git a/docs/claude_code_context_management.md b/docs/claude_code_context_management.md
new file mode 100644
--- /dev/null
+++ b/docs/claude_code_context_management.md
@@ -1,0 +1,278 @@
+---
+title: Claude Code - Context Management
+sidebar_label: Claude Code - Context Management
+---
+
+# Claude Code - Context Management
+
+LiteLLM supports Anthropic's `context_management` beta across **all providers**, not just Anthropic: natively where the upstream API supports it, and through an in-gateway polyfill everywhere else.
+
+When you send a request to `/v1/messages` (or via `litellm.anthropic.messages.*`) with a `context_management` spec, LiteLLM handles it in one of two ways depending on where the request is routed:
+
+| Routing path | How context_management is applied |
+|---|---|
+| **Anthropic API** | Passed through to the Anthropic server, which applies edits natively |
+| **OpenAI Responses API** (e.g. `gpt-5.x-*`) | Passed through; handled by the Responses API |
+| **Any other provider** (OpenAI, xAI, Gemini, Azure, Bedrock non-Anthropic, …) | **In-gateway polyfill** - LiteLLM applies the edits to the message array before forwarding |
+
+The polyfill means you write your Claude Code tool-loop once, pass `context_management` as you normally would, and it works regardless of which model is behind the proxy.
+
+---
+
+## Supported Edit Types
+
+| Edit type | Status | What it does |
+|---|---|---|
+| `clear_tool_uses_20250919` | ✅ **Supported** | Clears old `tool_result` content from conversation history when a trigger threshold is met, keeping only the most recent `N` tool results intact |
+| `clear_thinking_20251015` | ❌ Coming soon | Clears extended-thinking blocks from history |
+| `compact_20260112` | ❌ Native pass-through only | Summarisation edit - supported on Anthropic / Bedrock Anthropic forwarding paths; not polyfilled |
+
+---
+
+## How It Works
+
+```
+Claude Code client
+        │
+        │  POST /v1/messages  { context_management: { edits: [...] } }
+        ▼
+┌─────────────────────────────────────────────────────────┐
+│                    LiteLLM Proxy                        │
+│                                                         │
+│  1. Detect routing target                               │
+│                                                         │
+│  ┌──────────────────────┐   ┌────────────────────────┐  │
+│  │  Anthropic / Bedrock │   │  Any other provider    │  │
+│  │  Anthropic / OpenAI  │   │  (OpenAI, xAI, Gemini, │  │
+│  │  Responses API       │   │   Azure, …)            │  │
+│  │                      │   │                        │  │
+│  │  Pass context_mgmt   │   │  In-gateway polyfill:  │  │
+│  │  spec through as-is  │   │  • Count input tokens  │  │
+│  │  (server applies it) │   │  • Check trigger       │  │
+│  └──────────┬───────────┘   │  • Clear old results   │  │
+│             │               │  • Keep N most recent  │  │
+│             │               │  • Never clear latest  │  │
+│             │               └──────────┬─────────────┘  │
+│             │                          │                 │
+│             └────────────┬─────────────┘                 │
+│                          │                               │
+│  2. Forward to provider  │                               │
+│     (without context_    │                               │
+│      management key)     │                               │
+└──────────────────────────┼──────────────────────────────┘
+                           ▼
+                    Upstream model
+                           │
+                    Response + usage
+                           │
+                           ▼
+┌─────────────────────────────────────────────────────────┐
+│  LiteLLM attaches applied_edits to response             │
+│  { context_management: { applied_edits: [...] } }       │
+└─────────────────────────────────────────────────────────┘
+                           │
+                           ▼
+                    Claude Code client
+```
+
+---
+
+## Usage
+
+### Basic request
+
+```python
+import litellm  # acreate is async: run this inside an async function (e.g. via asyncio.run)
+
+response = await litellm.anthropic.messages.acreate(
+    model="xai/grok-4",          # any provider
+    max_tokens=1024,
+    messages=[...],              # your multi-turn tool history
+    tools=[{"name": "get_weather", "description": "...", "input_schema": {...}}],
+    context_management={
+        "edits": [
+            {
+                "type": "clear_tool_uses_20250919",
+                "trigger": {
+                    "type": "input_tokens",
+                    "value": 80000          # activate when history exceeds 80k tokens
+                },
+                "keep": {
+                    "type": "tool_uses",
+                    "value": 3              # keep the 3 most-recent tool results
+                }
+            }
+        ]
+    }
+)
+```
+
+You can also trigger on tool-use count instead of tokens:
+
+```python
+"trigger": {"type": "tool_uses", "value": 10}   # fires once there are more than 10 tool uses
+```
+
+### Via the proxy (curl)
+
+```bash
+curl -X POST http://localhost:4000/v1/messages \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $LITELLM_API_KEY" \
+  -d '{
+    "model": "gpt-5.4-mini",
+    "max_tokens": 1024,
+    "messages": [...],
+    "tools": [...],
+    "context_management": {
+      "edits": [
+        {
+          "type": "clear_tool_uses_20250919",
+          "trigger": {"type": "input_tokens", "value": 80000},
+          "keep":    {"type": "tool_uses",    "value": 3}
+        }
+      ]
+    }
+  }'
+```
+
+---
+
+## `clear_tool_uses_20250919` - Knobs
+
+| Field | Required | Default | Description |
+|---|---|---|---|
+| `trigger.type` | No | `"input_tokens"` | `"input_tokens"` or `"tool_uses"` |
+| `trigger.value` | No | `100000` | Threshold; edits fire when current value **exceeds** this |
+| `keep.type` | No | `"tool_uses"` | Must be `"tool_uses"` |
+| `keep.value` | No | `3` | Number of most-recent tool results to preserve |
+| `clear_at_least` | No | - | Accepted in the request but ignored by the polyfill (v0) |
+| `exclude_tools` | No | - | Accepted in the request but ignored by the polyfill (v0) |
+| `clear_tool_inputs` | No | - | Accepted in the request but ignored by the polyfill (v0) |
+
+> **Hard floor:** regardless of `keep`, LiteLLM's polyfill never clears the most recently completed `tool_result` - the one the model is about to reply to.
+
+---
+
+## Responses
+
+### Non-streaming
+
+When at least one edit fires, the response includes a `context_management` field:
+
+```json
+{
+  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
+  "type": "message",
+  "role": "assistant",
+  "content": [{"type": "text", "text": "Based on the latest weather data..."}],
+  "model": "gpt-5.4-mini",
+  "stop_reason": "end_turn",
+  "usage": {
+    "input_tokens": 620,
+    "output_tokens": 45
+  },
+  "context_management": {
+    "applied_edits": [
+      {
+        "type": "clear_tool_uses_20250919",
+        "cleared_tool_uses": 3,
+        "cleared_input_tokens": 8240
+      }
+    ]
+  }
+}
+```
+
+If the trigger was not met (context is still small), `context_management` is **absent** from the response.
+
+### Streaming
+
+The `context_management.applied_edits` field is included in the final `message_delta` SSE event:
+
+```
+event: message_start
+data: {"type":"message_start","message":{"id":"msg_01...","type":"message","role":"assistant","content":[],"model":"gpt-5.4-mini","stop_reason":null,"usage":{"input_tokens":620,"output_tokens":0}}}
+
+event: content_block_start
+data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
+
+event: content_block_delta
+data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Based on"}}
+
+event: content_block_delta
+data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" the latest weather data..."}}
+
+event: content_block_stop
+data: {"type":"content_block_stop","index":0}
+
+event: message_delta
+data: {
+  "type": "message_delta",
+  "delta": {"stop_reason": "end_turn", "stop_sequence": null},
+  "usage": {"output_tokens": 45},
+  "context_management": {
+    "applied_edits": [
+      {
+        "type": "clear_tool_uses_20250919",
+        "cleared_tool_uses": 3,
+        "cleared_input_tokens": 8240
+      }
+    ]
+  }
+}
+
+event: message_stop
+data: {"type":"message_stop"}
+```
+
+---
+
+## Disabling Context Management
+
+### Per-request - omit the field
+
+Simply don't include `context_management` in the request body.
+
+### Proxy-wide - `drop_params: true`
+
+When `drop_params: true` is set in your proxy config (or passed as a litellm setting), LiteLLM will silently strip `context_management` from any request instead of running the polyfill:
+
+```yaml
+# proxy_server_config.yaml
+litellm_settings:
+  drop_params: true
+```
+
+Or at call time:
+
+```python
+import litellm
+litellm.drop_params = True
+```
+
+This is useful when you have a global `drop_params` policy to suppress unsupported parameters - context management is treated like any other unsupported parameter and dropped rather than polyfilled.
+
+---
+
+## Provider Support Matrix
+
+| Provider | Native | Polyfill |
+|---|---|---|
+| `anthropic/*` | Yes | - |
+| `bedrock/anthropic.*` | `compact_20260112` only | - |
+| `openai/*` (Responses API) | Yes | - |
+| `openai/*` (chat completions) | - | Yes |
+| `azure/*` | - | Yes |
+| `xai/*` | - | Yes |
+| `gemini/*` | - | Yes |
+| `vertex_ai/*` | - | Yes |
+| All other providers | - | Yes |
+
+---
+
+## Notes
+
+- The polyfill only processes the `clear_tool_uses_20250919` edit type. `compact_20260112` requires Anthropic's summarisation capability and is forwarded as-is on native paths only.
+- Token counting for the polyfill uses `litellm.token_counter` (tiktoken `cl100k_base` fallback for unknown models).
+- The message array structure is preserved: same number of messages, same role order. Only `tool_result.content` inside matching messages is replaced with `"[Cleared by context management]"`.

diff --git a/sidebars.js b/sidebars.js
--- a/sidebars.js
+++ b/sidebars.js
@@ -154,6 +154,7 @@
             "tutorials/claude_non_anthropic_models",
             "tutorials/claude_code_plugin_marketplace",
             "tutorials/claude_code_beta_headers",
+            "claude_code_context_management",
           ]
         },
         "tutorials/claude_desktop_cowork",

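The polyfill steps in the guide's diagram and knobs table can be sketched in plain Python: count, check the trigger, clear old results, keep the N most recent, and never clear the latest. This is a minimal illustration under stated assumptions and is not LiteLLM's actual code. `apply_clear_tool_uses`, `rough_token_count`, and `_blocks` are hypothetical names. The 4-characters-per-token estimate stands in for `litellm.token_counter`. The `tool_uses` trigger is assumed to count `tool_use` blocks in the history.

```python
import copy
from typing import Any, Callable, Optional

# Replacement text the guide says the polyfill writes into cleared tool_result blocks.
CLEARED_PLACEHOLDER = "[Cleared by context management]"

Message = dict[str, Any]


def rough_token_count(messages: list[Message]) -> int:
    """Hypothetical stand-in for litellm.token_counter: assumes ~4 characters per token."""
    return len(str(messages)) // 4


def _blocks(messages: list[Message], block_type: str) -> list[tuple[int, int]]:
    """(message index, block index) of every content block of the given type, oldest first."""
    return [
        (mi, bi)
        for mi, msg in enumerate(messages)
        if isinstance(msg.get("content"), list)
        for bi, block in enumerate(msg["content"])
        if isinstance(block, dict) and block.get("type") == block_type
    ]


def apply_clear_tool_uses(
    messages: list[Message],
    edit: dict[str, Any],
    count_tokens: Callable[[list[Message]], int] = rough_token_count,
) -> tuple[list[Message], Optional[dict[str, Any]]]:
    """Return (possibly edited copy of messages, applied_edits entry or None)."""
    trigger = edit.get("trigger", {})
    threshold = trigger.get("value", 100_000)  # defaults taken from the knobs table
    keep = edit.get("keep", {}).get("value", 3)

    # 1. Measure the trigger: input tokens, or the number of tool_use blocks so far.
    tokens_before = count_tokens(messages)
    if trigger.get("type", "input_tokens") == "tool_uses":
        current = len(_blocks(messages, "tool_use"))
    else:
        current = tokens_before
    if current <= threshold:  # edits fire only when the value exceeds the threshold
        return messages, None

    # 2. Keep the `keep` most recent results; the hard floor never clears the latest one.
    n_keep = max(keep, 1)
    to_clear = [
        (mi, bi)
        for mi, bi in _blocks(messages, "tool_result")[:-n_keep]
        if messages[mi]["content"][bi].get("content") != CLEARED_PLACEHOLDER
    ]
    if not to_clear:
        return messages, None

    # 3. Replace only tool_result.content; message count and role order are unchanged.
    edited = copy.deepcopy(messages)
    for mi, bi in to_clear:
        edited[mi]["content"][bi]["content"] = CLEARED_PLACEHOLDER

    applied = {
        "type": "clear_tool_uses_20250919",
        "cleared_tool_uses": len(to_clear),
        "cleared_input_tokens": max(tokens_before - count_tokens(edited), 0),
    }
    return edited, applied
```

Taking `count_tokens` as a parameter keeps the sketch dependency-free, so a model-aware counter can be swapped in. Working on a deep copy leaves the caller's history untouched. Skipping blocks that already hold the placeholder is an assumption of this sketch: it keeps `cleared_tool_uses` from double-counting when a client resends an already-cleared history.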

Reviewed by Cursor Bugbot for commit 346f1c7.


- Mark compact_20260112 as fully supported (was "native pass-through only")
- Add setup instructions for the context_management_summary_model config key
- Document 3-phase algorithm (slice, threshold check, summarize)
- Add knobs table, response shape, error handling, and curl examples
- Document client-side compaction block forwarding (no edit required)
- Update provider support matrix to per-edit-type columns
- Update architecture diagram to show both polyfill paths

Co-authored-by: Cursor <cursoragent@cursor.com>

docs(claude-code): add context management guide

346f1c7

Co-authored-by: Cursor <cursoragent@cursor.com>

vercel Bot deployed to Preview May 25, 2026 12:41

cursor Bot reviewed May 25, 2026


Comment thread docs/claude_code_context_management.md Outdated

Sameerlite and others added 2 commits May 25, 2026 18:16

Revise context management documentation

ec923b4

Updated context management documentation with new features and clarifications.

docs: remove unused Tabs and TabItem imports from claude_code_context_management.md

5a8e65d

vercel Bot deployed to Preview May 25, 2026 12:48

Update claude_code_context_management.md

3643687

vercel Bot deployed to Preview May 25, 2026 12:58

data(matrix): mark bedrock_converse pass for basic messaging, tool use, structured outputs

55952ab

Co-authored-by: Cursor <cursoragent@cursor.com>

vercel Bot deployed to Preview May 25, 2026 13:15

vercel Bot deployed to Preview May 27, 2026 12:10


Sameerlite wants to merge 6 commits into main from litellm_claude-code-context-management-docs

2 participants
