
fix(research): derive context budget from actual model context window #252

Open
Meur3ault wants to merge 1 commit into huggingface:main from Meur3ault:fix/research_sub-agent_budget

Conversation

@Meur3ault

Problem

research_tool.py hard-coded the research sub-agent's context budget as
two module-level constants:

    _RESEARCH_CONTEXT_WARN = 170_000  # 85% of 200k
    _RESEARCH_CONTEXT_MAX  = 190_000

These values assumed every research model has a 200k context window.
In practice, claude-sonnet-4-6 (the default research model for
Anthropic/Bedrock sessions) has a 1,000,000-token window, so the
sub-agent was being force-stopped at 19% of its actual capacity.

Two secondary issues were also present:

  • The warn threshold was 85% of 200k (170k), but the system message
    injected at that point reads "You have used 75% of your context
    budget", so the percentage in the message didn't match the threshold
    that triggered it.
  • For any model with a context window smaller than 190k, the hard-coded
    _RESEARCH_CONTEXT_MAX would exceed the API limit, causing a
    ContextWindowExceededError mid-research.
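Both failure modes above are simple arithmetic. A minimal sketch (the window sizes are taken from the PR text; the 128k model is a hypothetical example):

```python
# Hard-coded hard-stop from the old research_tool.py
HARD_CODED_MAX = 190_000

sonnet_window = 1_000_000  # claude-sonnet-4-6 window, per the PR description
small_window = 128_000     # hypothetical model with a smaller window

# The sub-agent was force-stopped at only 19% of the 1M window:
print(HARD_CODED_MAX / sonnet_window)  # 0.19

# For the 128k model the hard-coded MAX exceeds the API limit entirely:
print(HARD_CODED_MAX > small_window)   # True
```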

Why context depth matters for the research agent

The research sub-agent's value comes from thorough, multi-step literature
crawls: finding anchor papers, traversing citation graphs, reading
methodology sections, cross-referencing datasets, and pulling working
code from GitHub. A single deep research turn can accumulate dozens of
tool outputs — full paper sections, dataset schemas, code files — before
producing its final summary.

Cutting the sub-agent off at 190k tokens means it is routinely forced to
summarise before it has finished gathering evidence, producing shallower
findings than the model is actually capable of. With the correct 950k
budget, the sub-agent can complete its full citation crawl before being
asked to wrap up.

Fix

Move the constants inside research_handler and compute them from the
research model's actual context window, using the same
_get_max_tokens_safe helper the main session already uses:

    _ctx_max = _get_max_tokens_safe(research_model)
    _research_context_warn = int(_ctx_max * 0.75)  # matches "75%" in the injected prompt
    _research_context_max  = int(_ctx_max * 0.95)  # hard-stop at 95%

The warn ratio is corrected to 75% to match the existing system prompt
text. The hard-stop moves to 95% of the model's real ceiling.

Impact

    Model                        Old MAX                   New MAX
    claude-sonnet-4-6 (default)  190,000                   950,000
    claude-opus-4-7              190,000                   950,000
    128k-window model            190,000 (exceeds limit)   121,600 (safe)
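The table values fall out of the two ratios directly. A quick check of the arithmetic (the helper name here is a stand-in, not the actual code in the PR):

```python
def research_budget(ctx_max: int) -> tuple[int, int]:
    """Warn at 75% of the window, hard-stop at 95% (ratios from the fix)."""
    return int(ctx_max * 0.75), int(ctx_max * 0.95)

print(research_budget(1_000_000))  # (750000, 950000) -> sonnet/opus rows
print(research_budget(128_000))    # (96000, 121600)  -> 128k-window row
```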

Tests

Added tests/unit/test_research_context_budget.py (4 cases):

  • _get_research_model routing for Anthropic / Bedrock / other models
  • research_handler calls _get_max_tokens_safe with the research
    model id, not the main model id
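The second bullet is a spy-style check. A self-contained sketch of the pattern, with `derive_budget` standing in for the relevant slice of `research_handler` (the real test presumably patches the actual helper in `research_tool.py`):

```python
from unittest import mock

def derive_budget(get_max_tokens_safe, research_model: str) -> int:
    # Stand-in for the fixed logic: budget comes from the research model's window.
    ctx_max = get_max_tokens_safe(research_model)
    return int(ctx_max * 0.95)

def test_uses_research_model_id():
    spy = mock.Mock(return_value=1_000_000)
    assert derive_budget(spy, "claude-sonnet-4-6") == 950_000
    # The helper must be asked about the research model, not the main model:
    spy.assert_called_once_with("claude-sonnet-4-6")
```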

@Meur3ault
Author

Format checks have passed:
    uv run ruff check .
    uv run ruff format --check .
