Prerequisites
Feature Description
Functional Requirement:
Please add a toggle option to the web UI. When this option is enabled, after the large language model (LLM) finishes responding to a user request, the web interface should automatically delete any “thinking” (chain‑of‑thought) content if the model supports such reasoning, and then submit the full conversation history back to the LLM (so the model has the entire context pre‑encoded).
Consequences of this behavior:
When the user sends the next query, the LLM does not need to spend time re‑encoding the deleted reasoning steps or the previous generation; it can directly encode the new user prompt.
This allows the system to pre‑process the context needed for the upcoming turn while the user is reading or thinking about the current response, resulting in a smoother, more responsive interaction.
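The stripping step described above could be sketched as follows. This is a minimal illustration only: it assumes the model's reasoning content is delimited by `<think>…</think>` tags inside assistant messages, and that the history uses the common `{"role", "content"}` message shape; the actual tag name and message format depend on the model and backend.

```python
import re

# Assumption: reasoning content is wrapped in <think>...</think> tags.
# The tag name varies by model; adjust the pattern accordingly.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_thinking(history):
    """Return a copy of the chat history with reasoning blocks removed
    from assistant messages, ready to be re-submitted for pre-encoding."""
    cleaned = []
    for msg in history:
        content = msg["content"]
        if msg["role"] == "assistant":
            content = THINK_RE.sub("", content)
        cleaned.append({**msg, "content": content})
    return cleaned
```

After the response finishes, the web UI would call something like `strip_thinking(history)` and send the result back to the backend as a prefill-only request, so the cleaned context is already encoded when the user's next prompt arrives.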
Motivation
When I receive a very large response from the model (e.g., a code snippet of more than 8,000 tokens) and then submit a simple follow‑up question, I have to wait a long time for the prompt‑processing step. While I am reading the returned content, the system is idle. The current LLM backend supports caching multiple prompts, so performing this pre‑processing can significantly improve the smoothness of subsequent turns in the conversation.
Possible Implementation
No response