
kv : add dynamic KV cache resize (--kv-dynamic) #21757

Open

rockyRunnr wants to merge 1 commit into ggml-org:master from rockyRunnr:feature/dynamic-kv-cache
Conversation

@rockyRunnr

Add --kv-dynamic flag that starts with a small KV cache (256 cells) and grows on demand via try_resize(). Supports both standalone llama_kv_cache and hybrid (llama_memory_hybrid) architectures.

Growth strategy: doubling for small caches, +1GB linear for large.
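The growth strategy above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function name `next_kv_size`, its parameters, and the 1 GB threshold separating the doubling and linear regimes are assumptions for the sake of the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of the growth heuristic: double the cell count while
// the cache is small, then grow by roughly +1 GB worth of cells per step
// once it is large, never exceeding the configured -c upper bound.
static uint32_t next_kv_size(uint32_t n_cells, uint32_t n_needed,
                             size_t bytes_per_cell, uint32_t n_ctx_max) {
    const size_t GiB = 1024ull * 1024 * 1024;
    // linear step: ~1 GB of cells (at least 1, in case cells are huge)
    const uint32_t step = std::max<uint32_t>(1, (uint32_t) (GiB / bytes_per_cell));

    uint32_t n_new = n_cells;
    while (n_new < n_needed) {
        if ((size_t) n_new * bytes_per_cell < GiB) {
            n_new *= 2;      // small cache: exponential growth
        } else {
            n_new += step;   // large cache: +1 GB linear growth
        }
    }
    return std::min(n_new, n_ctx_max); // clamp to the -c upper bound
}
```

The doubling regime keeps the number of resize/copy cycles logarithmic while the cache is cheap to copy; switching to linear growth avoids overshooting by gigabytes once each resize is expensive.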

Overview

When a large -c is set, llama.cpp allocates the full KV cache upfront. On
Apple Silicon / unified memory this can cause GPU OOM even when actual usage
is small.

--kv-dynamic starts the cache at 256 cells and grows as needed:

  • prepare() fail → try_resize() → retry in the same init_batch() call
  • resize: create new cache → copy per-layer/per-stream → swap internals
  • after resize, the scheduler reserve is re-triggered in the same decode call

Grow-only for now; shrink is out of scope for this PR.

Related to Feature Request: resize an existing context #11577.
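The retry flow above can be sketched roughly as follows. This is a simplified illustration under assumed names, not the PR's actual implementation: the `kv_cache` struct, `prepare`, `try_resize`, and `init_batch` signatures here are stand-ins, and the growth policy is elided to simple doubling.

```cpp
#include <algorithm>
#include <cstdint>

// Toy stand-in for the KV cache; real state is per-layer/per-stream tensors.
struct kv_cache {
    uint32_t n_cells;   // current capacity
    uint32_t n_ctx_max; // upper bound set by -c

    // prepare() succeeds only if the required cells fit the current cache
    bool prepare(uint32_t n_needed) const { return n_needed <= n_cells; }

    // grow-only resize; in the PR this creates a new cache, copies the old
    // contents per-layer/per-stream, and swaps internals
    bool try_resize(uint32_t n_needed) {
        if (n_needed > n_ctx_max) return false; // cannot grow past -c
        while (n_cells < n_needed) n_cells *= 2; // growth policy elided
        n_cells = std::min(n_cells, n_ctx_max);
        return true;
    }
};

// init_batch: on prepare() failure, resize and retry within the same call
bool init_batch(kv_cache & kv, uint32_t n_past, uint32_t n_tokens) {
    const uint32_t n_needed = n_past + n_tokens;
    if (kv.prepare(n_needed)) return true;
    if (!kv.try_resize(n_needed)) return false; // over -c: genuine failure
    return kv.prepare(n_needed);                // retry after growth
}
```

The key point is that the failure is absorbed inside a single init_batch() call, so callers of llama_decode() never observe the resize.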

Additional information

This is a draft — looking for feedback on direction before iterating further:

  • Is grow-only as a first step reasonable?
  • Is the create/copy/swap pattern acceptable?
  • The current growth heuristics are experimental and based on local testing rather than broad benchmarking.
  • Should growth heuristics be configurable rather than hardcoded?

Earlier experiments on Apple M4 (32 GB, Qwen3.5-27B-Q4_K_M, -c 131072):

  prompt tokens   vanilla KV   dynamic KV
  ~100            8 GB         16 MB
  ~6K             8 GB         512 MB
  ~80K            8 GB         5 GB

In earlier local runs, the vanilla path frequently OOMed while the dynamic path completed without OOM.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES — used for debugging assistance and code review. PR text written by me.

@rockyRunnr rockyRunnr requested review from a team, CISC and ggerganov as code owners April 11, 2026 05:48
@ggml-gh-bot

ggml-gh-bot bot commented Apr 11, 2026

Hi @rockyRunnr, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • AI-generated content: This project does not accept PRs, descriptions or commit messages that are fully or predominantly AI-generated. If you have used AI to assist you in writing code, please make sure to disclose that explicitly.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

@jacekpoplawski
Contributor

Am I right that this mostly defers OOM rather than eliminating it? If so, what is the advantage over simply limiting the context size at startup?

@rockyRunnr
Author

@jacekpoplawski, thank you for your comment.

Limiting the context size at startup forces the user to guess their eventual usage in advance. In many real sessions, the context starts small and only later grows longer. --kv-dynamic keeps the large upper bound available without paying the full KV cost upfront.

@mvatafu

mvatafu commented Apr 11, 2026

Wondering how this would complement the --parallel flag, since that one splits -c into n pieces. Would it mean we could run more small parallel sessions? That would be really nice.
