Fixes #21686.
The problem is that the meta backend always assumes a mirrored layout for the graph inputs created by the backend scheduler. However, with options like `-nkvo` this is not correct: in order to correctly align the KV cache with the split weights surrounding it, the KV cache also needs to be split. The meta backend determines split states by going up the chain of `ggml_tensor::src` until it finds statically allocated weights that have a fixed split state. So the issue can be fixed by finding the original tensor that is being copied to the meta backend and propagating the split state from there (since this logic only depends on tensor ops and shapes). In the meta backend this requires 1. determining the split states of the weights and 2. access to the original tensor being copied. This PR implements 1. by comparing the tensor name and 2. by setting `ggml_tensor::src[0]` of the graph input to the original tensor; since the graph input has `GGML_OP_NONE` this should be safe.

I am not happy with this implementation though. For the split states of weights I am currently comparing tensor names, which I think is a hacky and bad way to do it. I've been thinking that it would be better to, when creating the statically allocated tensors for the model, create a map of `ggml_tensor *` -> `enum llm_tensor` and to use that to determine which tensors should receive which split state. But this would not work for the dynamically allocated tensors of the backend scheduler. One solution would be to set a flag for those tensors. But long-term I've been thinking about whether it would maybe make sense to add something like a `GGML_OP_BACKEND_COPY` and to do the data copies as part of the ggml graph. That would also give the meta backend a natural way to handle this edge case.