Problem
DeepSeek context caching depends on stable reusable prompt prefixes, but the TUI previously made it hard to see why cache hits were low.
In long sessions, cache reuse can be reduced by:
- static prompt prefix changes that are hard to detect
- growing conversation history
- large tool result messages
- repeated identical tool outputs
- repeated
<turn_meta> blocks
Proposal
Add minimal DeepSeek cache-aware diagnostics and wire-only payload optimization.
Scope
- Parse and display DeepSeek cache usage fields:
prompt_cache_hit_tokens
prompt_cache_miss_tokens
prompt_tokens
completion_tokens
total_tokens
- Add
/cache inspect to show rendered prompt structure without printing full prompt text:
- Base static prefix hash
- Full request prefix hash
- static/history/dynamic layer classification
- first divergence from previous request
- SHA-256 hash and char length per layer
- Keep stable prompt content before dynamic user input where possible.
- Add stable Project Context Pack support before user input.
- Add
/cache warmup using the same stable prefix construction as normal requests.
- Optimize rendered wire messages only:
- truncate oversized tool results before sending to DeepSeek
- deduplicate repeated identical tool results with stable refs
- deduplicate repeated
<turn_meta> blocks with stable refs
Non-goals
- Do not rewrite TUI architecture
- Do not change existing config format
- Do not remove full UI transcript output
- Do not modify original session messages
- Do not print full prompts, API keys, or sensitive environment values
- Do not guarantee 100% cache hits
Acceptance Criteria
/cache inspect can verify whether the base static prefix is stable.
- Cache hit/miss metrics are shown when DeepSeek returns them.
- Missing cache fields are handled gracefully.
- Large/repeated tool outputs are reduced only in rendered API messages.
- Repeated
<turn_meta> blocks are reduced only in rendered API messages.
- UI transcript and saved session history remain unchanged.
- Tests cover prompt layer hashes, tool result budget/dedup, and
<turn_meta> dedup.
Related
Problem
DeepSeek context caching depends on stable reusable prompt prefixes, but the TUI previously made it hard to see why cache hits were low.
In long sessions, cache reuse can be reduced by:
<turn_meta>blocks
Proposal
Add minimal DeepSeek cache-aware diagnostics and wire-only payload optimization.
Scope
prompt_cache_hit_tokensprompt_cache_miss_tokensprompt_tokenscompletion_tokenstotal_tokens
/cache inspectto show rendered prompt structure without printing full prompt text:
/cache warmupusing the same stable prefix construction as normal requests.
<turn_meta>blocks with stable refs
Non-goals
Acceptance Criteria
/cache inspectcan verify whether the base static prefix is stable.<turn_meta>blocks are reduced only in rendered API messages.<turn_meta>dedup.
Related