
Feature Request: Auto-compact / Context Condensing for Long Conversations #173

@bianchenhao

Description

Problem

When using btw_app() for extended iterative development sessions (e.g., writing code, running tests, and fixing bugs in a loop), the input token count grows steadily, primarily due to accumulated tool-call inputs and outputs (file reads, code search results, R execution output, etc.). Once it reaches ~200k tokens (the context window limit for models such as Claude Opus 4), the API returns an input-too-long error and the conversation becomes unusable.

The only current workaround is to restart btw_app() and rely on btw.md to restore context, which breaks flow and loses nuanced conversation state (e.g., what approaches were already tried, intermediate debugging insights, implicit design decisions not captured in btw.md).

Proposed Solution

Implement automatic context condensing (aka "auto-compact"), similar to what Claude Code and Cursor provide. The core idea:

  1. Monitor context size: Track the current input token count (btw_app already displays this).
  2. Trigger condensing at a threshold: When input tokens exceed a configurable percentage of the model's context window (e.g., 80%), automatically trigger a condensing step.
  3. Summarize conversation history: Use the LLM itself to produce a structured summary of the conversation so far, preserving:
    • Current task and objectives
    • Files that have been created or modified (and key changes)
    • Design decisions made and their rationale
    • What has been tried and what failed
    • Pending next steps
  4. Replace old turns: Use Chat$set_turns() to replace the full history with the condensed summary, freeing up token space for continued work.
  5. Continue seamlessly: The user continues the conversation without interruption.
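The loop above could be sketched roughly as follows. This is a minimal sketch, not a proposed implementation: ellmer's Chat does expose $get_turns() and $set_turns(), but here `chat` is just a list of plain functions, and `count_tokens` and `summarize` are hypothetical helpers the caller supplies (in practice the summarizer would call the LLM).

```r
# Check-and-condense step, run after each assistant turn.
# `chat` is any object exposing get_turns()/set_turns(); `count_tokens`
# and `summarize` are hypothetical helper functions supplied by the caller.
maybe_condense <- function(chat, count_tokens, summarize,
                           context_window = 200000, threshold = 0.8) {
  # Still under budget: leave the history alone.
  if (count_tokens(chat) < threshold * context_window) {
    return(invisible(FALSE))
  }
  # Produce a structured summary of the full history, then swap it in
  # as the only remaining turn, freeing token space for continued work.
  summary_text <- summarize(chat$get_turns())
  chat$set_turns(list(list(
    role = "user",
    contents = paste("Summary of the conversation so far:", summary_text)
  )))
  invisible(TRUE)
}
```

A real ellmer-level version would build a proper Turn object rather than a bare list, and would likely count tokens from the provider's usage metadata rather than re-tokenizing locally.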

Design Considerations

Where should this live?

This could be implemented at either the ellmer or btw level:

  • ellmer level (Chat class): A general-purpose $condense() method or an auto_condense option would benefit all ellmer-based applications. This is arguably the right abstraction layer since context window management is a fundamental chat concern.
  • btw level (btw_app() / btw_client()): btw could implement this as a wrapper, using domain-specific summarization prompts optimized for coding conversations (files changed, tests passing/failing, etc.).

Both layers could work together — ellmer provides the mechanism, btw provides the coding-aware condensing prompt.
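One way to express that split: a generic ellmer-level helper handles the mechanics (which turns to keep, where the summary goes), while btw injects the coding-aware summarizer. The sketch below is illustrative only; `make_summary_turn` is a hypothetical hook, and the policy of keeping the most recent turns verbatim is one plausible design, not anything ellmer or btw implements today.

```r
# Generic mechanism: replace all but the last `keep_last` turns with a
# single summary turn produced by the caller-supplied hook.
compact_turns <- function(turns, make_summary_turn, keep_last = 4) {
  if (length(turns) <= keep_last) return(turns)  # nothing to compact
  old    <- turns[seq_len(length(turns) - keep_last)]
  recent <- turns[(length(turns) - keep_last + 1):length(turns)]
  # btw would supply a coding-aware make_summary_turn(); ellmer only
  # needs to know that it maps old turns to one replacement turn.
  c(list(make_summary_turn(old)), recent)
}
```

Keeping a few recent turns verbatim preserves the immediate working context (the exact code and errors currently under discussion), while everything older is compressed.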

Configurable options

# In btw.md
condense:
  enabled: true
  threshold: 0.8          # Trigger at 80% of context window
  strategy: "summarize"   # or "truncate" for simpler approach

Or as R options:

options(
  btw.condense = TRUE,
  btw.condense_threshold = 0.8
)
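Inside btw, those options could then be read with the usual getOption() defaults. A sketch, using the option names proposed above (which are not part of btw today):

```r
# Resolve the proposed condensing options, falling back to defaults
# (condensing off, 80% threshold) when the user has set nothing.
condense_settings <- function() {
  list(
    enabled   = isTRUE(getOption("btw.condense", FALSE)),
    threshold = getOption("btw.condense_threshold", 0.8)
  )
}
```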

Manual trigger

In addition to auto-triggering, a manual /compact command (or a button in btw_app()) would be valuable for users who want explicit control.

What to preserve in the summary

For coding-focused conversations, the condensed summary should prioritize:

  • File state: Which files were created/modified and what the key changes were
  • Task progress: What's done, what's in progress, what's remaining
  • Failed approaches: What was tried and didn't work (to avoid repeating mistakes)
  • Design decisions: Architectural choices and their rationale
  • Current working context: Which file/function the user was last working on
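Those priorities translate fairly directly into a summarization prompt. The text below is illustrative, not btw's actual prompt:

```r
# A coding-aware condensing prompt assembled from the priorities above.
condense_prompt <- paste(
  "Summarize this coding conversation for continued work. Preserve:",
  "1. File state: files created/modified and their key changes.",
  "2. Task progress: what is done, in progress, and remaining.",
  "3. Failed approaches: what was tried and did not work.",
  "4. Design decisions: architectural choices and their rationale.",
  "5. Current working context: the file/function last worked on.",
  sep = "\n"
)
```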

Prior Art

  • Claude Code: Implements auto-compact that triggers automatically when context grows large. Users can also type /compact to trigger manually.
  • Cursor: Has "context condensation" that summarizes long conversations.
  • Aider: Implements "chat summarization" to manage long coding sessions.
  • Continue.dev: Provides conversation truncation strategies.

These tools demonstrate that context condensing is effectively a required feature for any coding agent used in long iterative sessions.

Relation to Existing Issues

  • feat: btw memory #66 (btw memory): Memory provides persistent cross-session facts, while condensing manages within-session conversation length. They are complementary — memory stores stable project knowledge, condensing manages ephemeral conversation history.
  • a more agentic sibling to btw_app() #148 (agentic btw_app sibling): A more agentic workflow will likely involve even longer sessions with more tool calls, making context condensing even more critical.

My Use Case

I use btw_app() with Claude Opus 4 for iterative R package development — writing functions, tests, and documentation, then debugging and refining in a continuous loop. A typical productive session easily generates 200k+ tokens of context. Currently, I'm forced to restart every ~30-60 minutes, which significantly disrupts the development flow.

Context condensing would allow me to maintain continuous multi-hour development sessions without manual intervention.
