Problem

When using btw_app() for extended iterative development sessions (e.g., writing code, running tests, fixing bugs in a loop), the input token count steadily grows — primarily due to accumulated tool-call inputs and outputs (file reads, code search results, R execution output, etc.). Once it reaches ~200k tokens (the context window limit for models like Claude Opus 4), the API returns an input-too-long error and the conversation becomes unusable.

The only current workaround is to restart btw_app() and rely on btw.md to restore context, which breaks flow and loses nuanced conversation state (e.g., which approaches were already tried, intermediate debugging insights, implicit design decisions not captured in btw.md).
Proposed Solution
Implement automatic context condensing (aka "auto-compact"), similar to what Claude Code and Cursor provide. The core idea:
Monitor context size: Track the current input token count (btw_app already displays this).
Trigger condensing at a threshold: When input tokens exceed a configurable percentage of the model's context window (e.g., 80%), automatically trigger a condensing step.
Summarize conversation history: Use the LLM itself to produce a structured summary of the conversation so far, preserving:
Current task and objectives
Files that have been created or modified (and key changes)
Design decisions made and their rationale
What has been tried and what failed
Pending next steps
Replace old turns: Use Chat$set_turns() to replace the full history with the condensed summary, freeing up token space for continued work.
Continue seamlessly: The user continues the conversation without interruption.
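The steps above could be sketched against ellmer's Chat API roughly as follows. This is a sketch only: maybe_condense() and estimate_tokens() are hypothetical helpers standing in for whatever token tracking btw_app() already does, and the exact way to construct the replacement turn may differ from what ellmer expects.

```r
library(ellmer)

# Hypothetical auto-condense step; `estimate_tokens()` is a stand-in for
# however btw_app() already tracks the input token count.
maybe_condense <- function(chat, context_window = 200000, threshold = 0.8) {
  if (estimate_tokens(chat) < threshold * context_window) {
    return(invisible(chat))
  }

  # Ask the model itself for a structured summary of the session so far
  summary <- chat$chat(
    "Summarize this conversation: current task, files created or modified,
     design decisions and rationale, what was tried and failed, and pending
     next steps."
  )

  # Replace the accumulated history with a single turn carrying the summary,
  # freeing token space for continued work
  chat$set_turns(list(
    Turn("user", paste("Condensed context from earlier in this session:", summary))
  ))
  invisible(chat)
}
```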
Design Considerations
Where should this live?
This could be implemented at either the ellmer or btw level:
ellmer level (Chat class): A general-purpose $condense() method or an auto_condense option would benefit all ellmer-based applications. This is arguably the right abstraction layer since context window management is a fundamental chat concern.
btw level (btw_app() / btw_client()): btw could implement this as a wrapper, using domain-specific summarization prompts optimized for coding conversations (files changed, tests passing/failing, etc.).
Both layers could work together — ellmer provides the mechanism, btw provides the coding-aware condensing prompt.
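Concretely, btw's contribution could be as small as a coding-aware prompt handed to an ellmer-provided mechanism. Both btw_condense_prompt() and a Chat$condense() method are hypothetical here; the bullet points mirror the summary contents proposed above.

```r
# Hypothetical coding-aware condensing prompt supplied by btw
btw_condense_prompt <- function() {
  paste(
    "Summarize this coding session, preserving:",
    "- the current task and objectives,",
    "- files created or modified and their key changes,",
    "- design decisions and their rationale,",
    "- what has been tried and what failed,",
    "- tests passing/failing and pending next steps.",
    sep = "\n"
  )
}

# If ellmer provided the mechanism, btw could then simply call:
# chat$condense(prompt = btw_condense_prompt())
```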
Configurable options
```yaml
# In btw.md
condense:
  enabled: true
  threshold: 0.8        # Trigger at 80% of context window
  strategy: "summarize" # or "truncate" for simpler approach
```
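Or the same settings could be exposed as R options (these option names are illustrative, not an existing API):

```r
options(
  btw.condense = TRUE,
  btw.condense_threshold = 0.8,        # trigger at 80% of the context window
  btw.condense_strategy = "summarize"  # or "truncate"
)
```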
Manual trigger

In addition to auto-triggering, a manual /compact command (or a button in btw_app()) would be valuable for users who want explicit control.

What to preserve in the summary

For coding-focused conversations, the condensed summary should prioritize the elements listed above: the current task and objectives, files changed and their key modifications, design decisions and rationale, failed attempts, and pending next steps.

Prior Art

Claude Code and Cursor both condense long conversations automatically; Claude Code also offers /compact to trigger manually. These tools demonstrate that context condensing is essentially a required feature for any coding agent used in long iterative sessions.
Relation to Existing Issues
#66 (feat: btw memory): Memory provides persistent cross-session facts, while condensing manages within-session conversation length. They are complementary — memory stores stable project knowledge, condensing manages ephemeral conversation history.
#148 (a more agentic sibling to btw_app()): A more agentic workflow will likely involve even longer sessions with more tool calls, making context condensing even more critical.
My Use Case
I use btw_app() with Claude Opus 4 for iterative R package development — writing functions, tests, and documentation, then debugging and refining in a continuous loop. A typical productive session easily generates 200k+ tokens of context. Currently, I'm forced to restart every ~30-60 minutes, which significantly disrupts the development flow.
Context condensing would allow me to maintain continuous multi-hour development sessions without manual intervention.