
fix(research): derive context budget from actual model context window #252

Open
Meur3ault wants to merge 1 commit into huggingface:main from Meur3ault:fix/research_sub-agent_budget

Conversation

@Meur3ault

Problem

research_tool.py hard-coded the research sub-agent's context budget as
two module-level constants:

    _RESEARCH_CONTEXT_WARN = 170_000  # 85% of 200k
    _RESEARCH_CONTEXT_MAX  = 190_000

These values assumed every research model has a 200k context window.
In practice, claude-sonnet-4-6 (the default research model for
Anthropic/Bedrock sessions) has a 1,000,000-token window, so the
sub-agent was being force-stopped at 19% of its actual capacity.

Two secondary issues were also present:

  • The warn threshold was 85% of 200k (170k), but the system message
    injected at that point reads "You have used 75% of your context
    budget", so the percentage in the message didn't match the threshold
    that triggered it.
  • For any model with a context window smaller than 190k, the hard-coded
    _RESEARCH_CONTEXT_MAX would exceed the API limit, causing a
    ContextWindowExceededError mid-research.
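Both failure modes above are simple arithmetic. A minimal sketch (the window sizes are taken from the PR text; the 128k model is a hypothetical example):

```python
# Hard-coded hard-stop from the old research_tool.py
HARD_CODED_MAX = 190_000

sonnet_window = 1_000_000  # claude-sonnet-4-6 window, per the PR description
small_window = 128_000     # hypothetical model with a smaller window

# The sub-agent was force-stopped at only 19% of the 1M window:
print(HARD_CODED_MAX / sonnet_window)  # 0.19

# For the 128k model the hard-coded MAX exceeds the API limit entirely:
print(HARD_CODED_MAX > small_window)   # True
```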

Why context depth matters for the research agent

The research sub-agent's value comes from thorough, multi-step literature
crawls: finding anchor papers, traversing citation graphs, reading
methodology sections, cross-referencing datasets, and pulling working
code from GitHub. A single deep research turn can accumulate dozens of
tool outputs — full paper sections, dataset schemas, code files — before
producing its final summary.

Cutting the sub-agent off at 190k tokens means it is routinely forced to
summarise before it has finished gathering evidence, producing shallower
findings than the model is actually capable of. With the correct 950k
budget, the sub-agent can complete its full citation crawl before being
asked to wrap up.

Fix

Move the constants inside research_handler and compute them from the
research model's actual context window, using the same
_get_max_tokens_safe helper the main session already uses:

    _ctx_max = _get_max_tokens_safe(research_model)
    _research_context_warn = int(_ctx_max * 0.75)  # matches "75%" in the injected prompt
    _research_context_max  = int(_ctx_max * 0.95)  # hard-stop at 95%

The warn ratio is corrected to 75% to match the existing system prompt
text. The hard-stop moves to 95% of the model's real ceiling.

Impact

    Model                        Old MAX                   New MAX
    claude-sonnet-4-6 (default)  190,000                   950,000
    claude-opus-4-7              190,000                   950,000
    128k-window model            190,000 (exceeds limit)   121,600 (safe)
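The table values fall out of the two ratios directly. A quick check of the arithmetic (the helper name here is a stand-in, not the actual code in the PR):

```python
def research_budget(ctx_max: int) -> tuple[int, int]:
    """Warn at 75% of the window, hard-stop at 95% (ratios from the fix)."""
    return int(ctx_max * 0.75), int(ctx_max * 0.95)

print(research_budget(1_000_000))  # (750000, 950000) -> sonnet/opus rows
print(research_budget(128_000))    # (96000, 121600)  -> 128k-window row
```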

Tests

Added tests/unit/test_research_context_budget.py (4 cases):

  • _get_research_model routing for Anthropic / Bedrock / other models
  • research_handler calls _get_max_tokens_safe with the research
    model id, not the main model id
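The second bullet is a spy-style check. A self-contained sketch of the pattern, with `derive_budget` standing in for the relevant slice of `research_handler` (the real test presumably patches the actual helper in `research_tool.py`):

```python
from unittest import mock

def derive_budget(get_max_tokens_safe, research_model: str) -> int:
    # Stand-in for the fixed logic: budget comes from the research model's window.
    ctx_max = get_max_tokens_safe(research_model)
    return int(ctx_max * 0.95)

def test_uses_research_model_id():
    spy = mock.Mock(return_value=1_000_000)
    assert derive_budget(spy, "claude-sonnet-4-6") == 950_000
    # The helper must be asked about the research model, not the main model:
    spy.assert_called_once_with("claude-sonnet-4-6")
```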

@Meur3ault
Author

Format checks have passed:
    uv run ruff check .
    uv run ruff format --check .
