
Feature: DeepSeek cache-aware prompt diagnostics and wire payload optimization #1253

@wplll

Description

Problem


DeepSeek context caching relies on stable, reusable prompt prefixes, but the TUI previously made it hard to see why cache hit rates were low.

In long sessions, cache reuse can be reduced by:


  • static prompt prefix changes that are hard to detect
  • growing conversation history
  • large tool result messages
  • repeated identical tool outputs
  • repeated <turn_meta> blocks
    

Proposal


Add minimal DeepSeek cache-aware diagnostics and wire-only payload optimization.



Scope



  • Parse and display DeepSeek cache usage fields:
      • prompt_cache_hit_tokens
      • prompt_cache_miss_tokens
      • prompt_tokens
      • completion_tokens
      • total_tokens
  • Add /cache inspect to show the rendered prompt structure without printing the full prompt text:
      • base static prefix hash
      • full request prefix hash
      • static/history/dynamic layer classification
      • first divergence from the previous request
      • SHA-256 hash and character length per layer
  • Keep stable prompt content before dynamic user input where possible.
  • Add support for a stable Project Context Pack, placed before user input.
  • Add /cache warmup, using the same stable prefix construction as normal requests.
  • Optimize rendered wire messages only:
      • truncate oversized tool results before sending them to DeepSeek
      • deduplicate repeated identical tool results with stable refs
      • deduplicate repeated <turn_meta> blocks with stable refs
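The /cache inspect fields listed above can be derived from per-layer hashes alone, so no prompt text has to be printed. The following is a minimal sketch under stated assumptions: the `Layer` type, `inspect_layers`, `prefix_hash`, and `first_divergence` are hypothetical names, and "full request prefix" is taken to mean the leading static and history layers before the first dynamic layer.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical layer type: the issue names the static/history/dynamic
# classification but not how layers are represented.
@dataclass(frozen=True)
class Layer:
    kind: str  # "static", "history", or "dynamic"
    text: str

LayerInfo = Tuple[str, str, int]  # (kind, SHA-256 hex digest, char length)


def inspect_layers(layers: List[Layer]) -> List[LayerInfo]:
    """Describe each layer by kind, hash, and length; the text is never returned."""
    return [
        (layer.kind, hashlib.sha256(layer.text.encode("utf-8")).hexdigest(), len(layer.text))
        for layer in layers
    ]


def prefix_hash(infos: List[LayerInfo], kinds: Tuple[str, ...]) -> str:
    """Chain the digests of the leading layers whose kind is in `kinds`."""
    h = hashlib.sha256()
    for kind, digest, _length in infos:
        if kind not in kinds:
            break  # stop at the first layer outside the prefix
        h.update(digest.encode("ascii"))
    return h.hexdigest()


def first_divergence(prev: List[LayerInfo], curr: List[LayerInfo]) -> Optional[int]:
    """Index of the first layer whose kind or hash changed, or None if identical."""
    for i, (a, b) in enumerate(zip(prev, curr)):
        if a[:2] != b[:2]:  # compare (kind, digest), ignore length
            return i
    if len(prev) != len(curr):
        return min(len(prev), len(curr))  # one request is a strict prefix of the other
    return None
```

With `infos = inspect_layers(layers)`, the base static prefix hash would be `prefix_hash(infos, ("static",))` and the full request prefix hash `prefix_hash(infos, ("static", "history"))`. Chaining per-layer digests, rather than hashing the concatenated text, keeps two different layer splits of the same characters from producing the same prefix hash.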
    

Non-goals



  • Do not rewrite the TUI architecture
  • Do not change the existing config format
  • Do not remove the full UI transcript output
  • Do not modify the original session messages
  • Do not print full prompts, API keys, or sensitive environment values
  • Do not guarantee 100% cache hits
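Because the original session messages must stay unchanged, the wire-only optimizations from the scope can be written as a pure function that renders a deep copy. This is a minimal sketch, assuming OpenAI-style `{"role", "content"}` message dicts; the 4000-character budget, the `render_wire_messages` name, and the ref and truncation marker formats are all hypothetical, since the issue specifies none of them.

```python
import copy
import hashlib
import re

# Hypothetical budget; the issue does not specify a limit.
MAX_TOOL_CHARS = 4000
TURN_META = re.compile(r"<turn_meta>.*?</turn_meta>", re.DOTALL)


def _ref(text: str) -> str:
    """Content-derived ref, so identical text always renders identically."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def render_wire_messages(messages, max_tool_chars=MAX_TOOL_CHARS):
    """Build the API payload from a deep copy; `messages` is left untouched."""
    wire = copy.deepcopy(messages)
    seen_tool, seen_meta = set(), set()

    def dedup_meta(match):
        block = match.group(0)
        ref = _ref(block)
        if ref in seen_meta:
            return f'<turn_meta ref="{ref}"/>'  # repeat: replace with a stable ref
        seen_meta.add(ref)
        return block  # first occurrence is sent in full

    for msg in wire:
        content = msg.get("content")
        if not isinstance(content, str):
            continue  # e.g. assistant messages that only carry tool calls
        content = TURN_META.sub(dedup_meta, content)
        if msg.get("role") == "tool":
            ref = _ref(content)
            if ref in seen_tool:
                content = f"[identical to an earlier tool result, ref={ref}]"
            else:
                seen_tool.add(ref)
                if len(content) > max_tool_chars:
                    dropped = len(content) - max_tool_chars
                    content = content[:max_tool_chars] + f"\n[truncated {dropped} chars, ref={ref}]"
        msg["content"] = content
    return wire
```

Each message's rendering depends only on itself and the messages before it, and refs come from content rather than counters, so appending a new turn never changes how earlier messages render. That property is what keeps the wire prefix stable enough to be cached.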
    

Acceptance Criteria



  • /cache inspect can verify whether the base static prefix is stable.
  • Cache hit/miss metrics are shown when DeepSeek returns them.
  • Missing cache fields are handled gracefully.
  • Large/repeated tool outputs are reduced only in rendered API messages.
  • Repeated <turn_meta> blocks are reduced only in rendered API messages.
  • UI transcript and saved session history remain unchanged.
  • Tests cover prompt layer hashes, tool result budget/dedup, and <turn_meta> dedup.
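The criteria for showing hit/miss metrics and tolerating missing fields can be sketched as below. The five field names are the DeepSeek usage fields listed in the scope; the `CacheUsage` type, `parse_usage`, `format_usage`, and the display strings are hypothetical.

```python
import collections.abc
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class CacheUsage:
    prompt_cache_hit_tokens: Optional[int] = None
    prompt_cache_miss_tokens: Optional[int] = None
    prompt_tokens: Optional[int] = None
    completion_tokens: Optional[int] = None
    total_tokens: Optional[int] = None

    @property
    def hit_rate(self) -> Optional[float]:
        """Share of prompt tokens served from cache, or None if not reported."""
        hit, miss = self.prompt_cache_hit_tokens, self.prompt_cache_miss_tokens
        if hit is None or miss is None or hit + miss == 0:
            return None
        return hit / (hit + miss)


def _count(value: Any) -> Optional[int]:
    # Absent, null, or non-integer values are treated as "not reported".
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return None


def parse_usage(response: Mapping[str, Any]) -> CacheUsage:
    usage = response.get("usage")
    if not isinstance(usage, collections.abc.Mapping):
        usage = {}  # no usage object at all: every field stays None
    return CacheUsage(**{f.name: _count(usage.get(f.name)) for f in fields(CacheUsage)})


def format_usage(u: CacheUsage) -> str:
    rate = u.hit_rate
    if rate is None:
        return "cache: not reported"
    return (f"cache hit {u.prompt_cache_hit_tokens} / "
            f"miss {u.prompt_cache_miss_tokens} ({rate:.0%})")
```

A response reporting 768 hit and 256 miss tokens would render as `cache hit 768 / miss 256 (75%)`, while a response without cache fields degrades to `cache: not reported` instead of raising.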
    

Related



Metadata

Assignees: No one assigned

Labels: enhancement (New feature or request)

Projects: No projects

Relationships: None yet

Development: No branches or pull requests
