Feature Request: Add UI Toggle to Auto‑Remove Chain‑of‑Thought and Pre‑Submit Conversation History for Faster Next‑Turn Encoding #18853

@zts9989

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Functional Requirement:

Please add a toggle option to the web UI. When enabled, after the large language model (LLM) finishes responding to a user request, the web interface should automatically delete any “thinking” (chain‑of‑thought) content, if the model emits such reasoning, and then resubmit the full conversation history to the LLM so that the entire context is pre‑encoded (cached) before the next turn.

Consequences of this behavior:

  • When the user sends the next query, the LLM does not need to spend time re‑encoding the removed reasoning steps or the previous reply; it only has to encode the new user prompt.
  • The context needed for the upcoming turn is pre‑processed while the user is still reading or thinking about the current response, resulting in a smoother, more responsive interaction.
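The post‑response step described above could be sketched roughly as follows. This is a minimal illustration, assuming the model wraps its reasoning in `<think>` tags (as DeepSeek‑R1‑style models do); the function names are hypothetical, and the actual resubmission would need to use whatever prompt‑caching mechanism the backend server exposes:

```python
import re

# Matches a <think>...</think> chain-of-thought span, including trailing whitespace.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(message: str) -> str:
    """Remove chain-of-thought spans from an assistant reply."""
    return THINK_BLOCK.sub("", message).strip()

def prewarm_context(history: list[dict]) -> list[dict]:
    """Hypothetical sketch of the toggle's background step.

    Strips reasoning from assistant turns, then (in a real implementation)
    would resubmit the cleaned history to the server with generation
    disabled, so the prompt is encoded into the KV cache ahead of the
    user's next message.
    """
    cleaned = [
        {**m, "content": strip_reasoning(m["content"])}
        if m["role"] == "assistant" else m
        for m in history
    ]
    # Illustrative only: here one would POST the cleaned history to the
    # backend's completion endpoint in the background, requesting zero
    # output tokens, so only prompt processing (cache fill) happens.
    return cleaned

history = [
    {"role": "user", "content": "Why is the sky blue?"},
    {"role": "assistant",
     "content": "<think>Rayleigh scattering...</think>Shorter wavelengths scatter more."},
]
print(prewarm_context(history)[1]["content"])
# → Shorter wavelengths scatter more.
```

Because the reasoning is stripped before resubmission, the cached prefix stays identical to what the next real request will send, which is what makes the pre‑encoding reusable.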

Motivation

When I receive a very long response from the model (e.g., a code snippet of more than 8,000 tokens) and then submit a simple follow‑up question, I have to wait a long time for the prompt‑processing step. While I am reading the returned content, the system is idle. The current LLM backend already supports caching prompts across requests, so performing this pre‑processing can significantly improve the smoothness of subsequent turns in the conversation.

Possible Implementation

No response
