Problem

When using btw_app() for extended iterative development sessions (e.g., writing code, running tests, fixing bugs in a loop), the input token count steadily grows — primarily due to accumulated tool-call inputs and outputs (file reads, code search results, R execution output, etc.). Once it reaches ~200k tokens (the context window limit for models like Claude Opus 4), the API returns an input-too-long error and the conversation becomes unusable.

The only current workaround is to restart btw_app() and rely on btw.md to restore context, which breaks flow and loses nuanced conversation state (e.g., which approaches were already tried, intermediate debugging insights, implicit design decisions not captured in btw.md).
Proposed Solution
Implement automatic context condensing (aka "auto-compact"), similar to what Claude Code and Cursor provide. The core idea:
Monitor context size: Track the current input token count (btw_app already displays this).
Trigger condensing at a threshold: When input tokens exceed a configurable percentage of the model's context window (e.g., 80%), automatically trigger a condensing step.
Summarize conversation history: Use the LLM itself to produce a structured summary of the conversation so far, preserving:
Current task and objectives
Files that have been created or modified (and key changes)
Design decisions made and their rationale
What has been tried and what failed
Pending next steps
Replace old turns: Use Chat$set_turns() to replace the full history with the condensed summary, freeing up token space for continued work.
Continue seamlessly: The user continues the conversation without interruption.
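The steps above could be sketched against ellmer's Chat API roughly as follows. This is a sketch only: maybe_condense() and estimate_tokens() are hypothetical helpers standing in for whatever token tracking btw_app() already does, and the exact way to construct the replacement turn may differ from what ellmer expects.

```r
library(ellmer)

# Hypothetical auto-condense step; `estimate_tokens()` is a stand-in for
# however btw_app() already tracks the input token count.
maybe_condense <- function(chat, context_window = 200000, threshold = 0.8) {
  if (estimate_tokens(chat) < threshold * context_window) {
    return(invisible(chat))
  }

  # Ask the model itself for a structured summary of the session so far
  summary <- chat$chat(
    "Summarize this conversation: current task, files created or modified,
     design decisions and rationale, what was tried and failed, and pending
     next steps."
  )

  # Replace the accumulated history with a single turn carrying the summary,
  # freeing token space for continued work
  chat$set_turns(list(
    Turn("user", paste("Condensed context from earlier in this session:", summary))
  ))
  invisible(chat)
}
```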
Design Considerations
Where should this live?
This could be implemented at either the ellmer or btw level:
ellmer level (Chat class): A general-purpose $condense() method or an auto_condense option would benefit all ellmer-based applications. This is arguably the right abstraction layer since context window management is a fundamental chat concern.
btw level (btw_app() / btw_client()): btw could implement this as a wrapper, using domain-specific summarization prompts optimized for coding conversations (files changed, tests passing/failing, etc.).
Both layers could work together — ellmer provides the mechanism, btw provides the coding-aware condensing prompt.
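Concretely, btw's contribution could be as small as a coding-aware prompt handed to an ellmer-provided mechanism. Both btw_condense_prompt() and a Chat$condense() method are hypothetical here; the bullet points mirror the summary contents proposed above.

```r
# Hypothetical coding-aware condensing prompt supplied by btw
btw_condense_prompt <- function() {
  paste(
    "Summarize this coding session, preserving:",
    "- the current task and objectives,",
    "- files created or modified and their key changes,",
    "- design decisions and their rationale,",
    "- what has been tried and what failed,",
    "- tests passing/failing and pending next steps.",
    sep = "\n"
  )
}

# If ellmer provided the mechanism, btw could then simply call:
# chat$condense(prompt = btw_condense_prompt())
```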
Configurable options
```yaml
# In btw.md
condense:
  enabled: true
  threshold: 0.8        # Trigger at 80% of context window
  strategy: "summarize" # or "truncate" for simpler approach
```
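Or the same settings could be exposed as R options (these option names are illustrative, not an existing API):

```r
options(
  btw.condense = TRUE,
  btw.condense_threshold = 0.8,        # trigger at 80% of the context window
  btw.condense_strategy = "summarize"  # or "truncate"
)
```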
Manual trigger

In addition to auto-triggering, a manual /compact command (or a button in btw_app()) would be valuable for users who want explicit control.

What to preserve in the summary

For coding-focused conversations, the condensed summary should prioritize the elements listed above: the current task and objectives, files changed and their key modifications, design decisions and rationale, failed attempts, and pending next steps.

Prior Art

Claude Code and Cursor both condense long conversations automatically; Claude Code also offers /compact to trigger manually. These tools demonstrate that context condensing is essentially a required feature for any coding agent used in long iterative sessions.
Relation to Existing Issues
#66 (feat: btw memory): Memory provides persistent cross-session facts, while condensing manages within-session conversation length. They are complementary — memory stores stable project knowledge, condensing manages ephemeral conversation history.
#148 (a more agentic sibling to btw_app()): A more agentic workflow will likely involve even longer sessions with more tool calls, making context condensing even more critical.
My Use Case
I use btw_app() with Claude Opus 4 for iterative R package development — writing functions, tests, and documentation, then debugging and refining in a continuous loop. A typical productive session easily generates 200k+ tokens of context. Currently, I'm forced to restart every ~30-60 minutes, which significantly disrupts the development flow.
Context condensing would allow me to maintain continuous multi-hour development sessions without manual intervention.