Skip to content

Session context: cap raw tool-output replay and keep details behind handles #2021

@Hmbown

Description

@Hmbown

Problem

Large raw tool outputs in session history can turn an environment/log-volume problem into an apparent model-quality problem. When the parent prompt is dominated by huge command output, the model has less room for decision-quality context and may look confused even if the core model is not the root cause.

Evidence from maintainer-private local CodeWhale session logs, scanned 2026-05-24:

  • 170 function-call outputs reported more than 10,000 original tokens.
  • The largest observed tool output reported about 800,893 original tokens.
  • 302 outputs were already marked as truncated or included very large line-count summaries.
  • 1,427 token-count events had last-turn input above 55% of the 258,400-token context window.
  • The largest observed last-turn input was 243,329 tokens out of 258,400.

No prompts, raw tool outputs, secrets, absolute local paths, or user text are copied here. These are aggregate counters from local logs.

Desired Behavior

The model transcript should receive bounded summaries and durable handles, not repeated raw output dumps:

  • Hard-cap raw tool output inserted into api_messages, saved session history, and resumed context.
  • Store full raw output behind a local detail handle when needed.
  • Preserve a concise receipt: command/tool name, status, exit code, elapsed time, output byte/token estimate, truncation reason, and retrieval handle.
  • Prefer "summary first, raw detail behind handle" in Activity Detail, handoff, and continuation prompts.
  • Warn or auto-suggest compaction when a turn approaches a configurable context-pressure threshold.

Acceptance Criteria

  • A synthetic tool output above the cap is not replayed verbatim into future model input.
  • The user can still inspect/copy/retrieve the full output through a detail pager or handle.
  • Session resume reconstructs a useful receipt without loading the full raw output by default.
  • /status or an equivalent diagnostic reports context pressure from large tool outputs.
  • Regression tests cover persisted session history, resumed API messages, and detail-handle retrieval for large outputs.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingcompactionContext management / compactioncontextContext management / contextenhancementNew feature or request

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions