fix(engine): keep auto-compaction working on sub-500K self-hosted windows by h3c-hexin · Pull Request #2060 · Hmbown/CodeWhale

h3c-hexin · 2026-05-25T03:54:05Z

Problem

context_input_budget computes the internal input-token budget used by the preflight check, emergency context recovery, and capacity trimming. It reserved the full TURN_MAX_OUTPUT_TOKENS (262K) for output regardless of window size:

window.checked_sub(output)            // output = 262_144
    .and_then(|v| v.checked_sub(CONTEXT_HEADROOM_TOKENS))

For a self-hosted model whose window is below that reservation (e.g. vLLM serving Qwen with a 256K window), the math is 256_000 - 262_144 - 1_024. The first checked_sub already underflows, so the whole expression evaluates to None.
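The underflow can be reproduced in isolation. The sketch below is not the project's code: the function name `old_input_budget` is hypothetical, and the constant values are taken from the numbers quoted in this description.

```rust
// Values as quoted in the PR description.
const TURN_MAX_OUTPUT_TOKENS: u32 = 262_144;
const CONTEXT_HEADROOM_TOKENS: u32 = 1_024;

/// Pre-fix formula (hypothetical name): always reserve the full output
/// allowance, whatever the window size.
fn old_input_budget(window: u32) -> Option<u32> {
    window
        // Returns None as soon as the window is smaller than the reservation.
        .checked_sub(TURN_MAX_OUTPUT_TOKENS)
        .and_then(|v| v.checked_sub(CONTEXT_HEADROOM_TOKENS))
}
```

With these values, `old_input_budget(256_000)` is `None`, while `old_input_budget(1_000_000)` is `Some(736_832)`.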

A None budget makes every caller treat the session as having no budget to enforce, so it silently disables all preflight and emergency context-recovery paths. The session never compacts and runs until the provider hard-rejects the request on context length.
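To illustrate why a None budget is silent rather than loud, here is a minimal sketch of a caller. The actual call sites are not shown in this PR text, so `should_compact` and its signature are assumptions made purely for illustration.

```rust
/// Hypothetical preflight check: decide whether the session should compact
/// before the next request is sent.
fn should_compact(used_tokens: u32, budget: Option<u32>) -> bool {
    match budget {
        // A known budget is enforced.
        Some(limit) => used_tokens > limit,
        // No budget means there is nothing to enforce, so the check
        // never fires, however large the context has grown.
        None => false,
    }
}
```

Under this assumed shape, a session with `budget == None` never triggers compaction, which matches the failure mode described above.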

Fix

Two coupled changes:

1. context_window_for_model: apply the explicit _Nk suffix hint for any vendor, not just DeepSeek. A self-hosted --served-model-name like qwen3-32b-256k is the only window signal available for non-DeepSeek/Claude models; without this its window resolves to None (and budgeting is disabled for a different reason). Renamed the helper deepseek_context_window_hint → explicit_context_window_hint since it is now vendor-agnostic.
2. context_input_budget: tier the reserved-output term by window size:
   - >= 500K (V4-class): keep the full 262K headroom, which preserves the existing "leave room for interleaved thinking" contract.
   - < 500K (smaller / self-hosted): reserve effective_max_output_tokens (what the API actually caps output at), which yields a usable positive budget.

Also dropped the now-vestigial requested_output_tokens parameter, since every caller passed the same constant.
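The first change can be sketched as a small suffix parser. This is an illustration, not the helper from the PR: it assumes the hint is the last segment of the model name after a `-` or `_`, that it consists of digits followed by `k` or `K`, and that `k` means 1_000 tokens (consistent with the 256_000 figure above). The real parsing rules may differ.

```rust
/// Sketch of a vendor-agnostic window hint (assumed rules, see lead-in).
/// "qwen3-32b-256k" -> Some(256_000); "qwen3-32b" -> None.
fn explicit_context_window_hint(model: &str) -> Option<u32> {
    // Take the last segment after '-' or '_'.
    let last = model.rsplit(|c: char| c == '-' || c == '_').next()?;
    // The segment must end in 'k' or 'K'.
    let digits = last
        .strip_suffix('k')
        .or_else(|| last.strip_suffix('K'))?;
    // Everything before the 'k' must be plain ASCII digits.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Guard against overflow for absurdly large hints.
    digits.parse::<u32>().ok()?.checked_mul(1_000)
}
```

A name without such a suffix yields `None`, so the caller can fall back to whatever vendor-specific table it already has.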

Scope

- Behavior change is confined to the budget tiering + the window-hint vendor scope.
- Signature cleanup ripples to the 3 call sites (engine.rs, capacity_flow.rs, turn_loop.rs).
- No trust-boundary surface (no auth / sandbox / publishing / prompts).

Tests

internal_context_budget_tiers_reserved_output_by_window is updated to pin both branches: V4 (>=500K) still reserves 262K, and a 256K self-hosted window now yields a positive budget instead of None.

cargo test -p codewhale-tui --bins context_budget internal_context_budget
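The two branches the test pins can be sketched as follows. This is a hedged reconstruction from the description, not the merged code: `tiered_input_budget` and `LARGE_WINDOW_THRESHOLD` are hypothetical names, 500_000 is an assumed reading of "500K", and `effective_max_output_tokens` is passed in as a plain parameter.

```rust
const TURN_MAX_OUTPUT_TOKENS: u32 = 262_144;
const CONTEXT_HEADROOM_TOKENS: u32 = 1_024;
// Hypothetical name; the exact threshold value is an assumption.
const LARGE_WINDOW_THRESHOLD: u32 = 500_000;

/// Post-fix formula (sketch): the reserved-output term depends on the window.
fn tiered_input_budget(window: u32, effective_max_output_tokens: u32) -> Option<u32> {
    let reserved_output = if window >= LARGE_WINDOW_THRESHOLD {
        // Large windows keep the full output headroom.
        TURN_MAX_OUTPUT_TOKENS
    } else {
        // Smaller windows reserve only what the API caps output at.
        effective_max_output_tokens
    };
    window
        .checked_sub(reserved_output)
        .and_then(|v| v.checked_sub(CONTEXT_HEADROOM_TOKENS))
}
```

Assuming an output cap of 32_768 tokens (an illustrative value, not one stated in the PR), a 1_000_000-token window still yields `Some(736_832)`, and a 256_000-token window now yields `Some(222_208)` instead of `None`.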

🤖 Generated with Claude Code

…dows

`context_input_budget` reserved the full TURN_MAX_OUTPUT_TOKENS (262K) for output regardless of window size. For a self-hosted model with a window below that reservation (e.g. a 256K vLLM Qwen deployment) the math was `256K - 262K - 1K`, which underflows `checked_sub` to `None`. A `None` budget silently disables every preflight check and emergency context recovery path, so the session never compacts and runs until the provider hard-rejects on context length.

Two coupled fixes:

1. `context_window_for_model`: apply the explicit `_Nk` suffix hint for any vendor, not just DeepSeek. A self-hosted served-model-name like `qwen3-32b-256k` is the only window signal we have for non-DeepSeek/Claude models; without this its window resolves to `None`. Renamed the helper `deepseek_context_window_hint` -> `explicit_context_window_hint` since it is now vendor-agnostic.
2. `context_input_budget`: tier the reserved-output term by window: `>= 500K` keeps the full 262K headroom (preserves the V4 interleaved-thinking contract), `< 500K` falls back to `effective_max_output_tokens` (what the API actually caps output at), yielding a usable positive budget.

Dropped the vestigial `requested_output_tokens` parameter (every caller passed the same constant).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request refactors the context budget calculation to prevent underflow on models with smaller context windows, such as self-hosted 256K deployments. It introduces a 500K token threshold to tier the reserved output headroom and makes the context window hint logic vendor-agnostic by allowing any model name with an '_Nk' suffix to be recognized. Additionally, function signatures for context recovery and budget calculation were simplified by removing redundant parameters. I have no feedback to provide as there were no review comments to assess.

gemini-code-assist Bot reviewed May 25, 2026

h3c-hexin wants to merge 1 commit into Hmbown:main from h3c-hexin:fix/self-hosted-context-budget
