
TP: fix arbitrary -ot #21717

Open
JohannesGaessler wants to merge 1 commit into ggml-org:master from JohannesGaessler:tp-fix-nkvo

@JohannesGaessler
Contributor

Fixes #21686.

The problem is that the meta backend always assumes a mirrored layout for the graph inputs created by the backend scheduler. With options like -nkvo this is not correct: to align the KV cache with the split weights surrounding it, the KV cache also needs to be split. The meta backend determines split states by going up the chain of ggml_tensor::src until it finds statically allocated weights that have a fixed split state. So the issue can be fixed by finding the original tensor that is being copied to the meta backend and propagating the split state from there (this works because the logic only depends on tensor ops and shapes). In the meta backend this requires

  1. a way to identify which tensors are inputs created by the backend scheduler, and
  2. a reference to the original tensor that is being copied.

This PR implements (1) by comparing the tensor name and (2) by setting ggml_tensor::src[0] of the graph input to the original tensor; since the graph input has GGML_OP_NONE, this should be safe. I am not happy with this implementation though.
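To make the idea concrete, here is a minimal, self-contained sketch of how a split state could be propagated through a scheduler-created input. It does not use the real ggml types: `toy_tensor`, `toy_op`, `split_state`, the `sched_input#` name prefix, and both helper functions are hypothetical stand-ins invented for this example. The toy lookup also simply takes the first resolvable source, whereas the real logic depends on tensor ops and shapes.

```cpp
#include <array>
#include <cassert>
#include <string>

// Toy stand-ins for illustration only; these are NOT the real ggml types.
enum class toy_op { NONE, MUL_MAT };
enum class split_state { UNKNOWN, MIRRORED, SPLIT_ROWS, SPLIT_COLS };

struct toy_tensor {
    std::string name;
    toy_op      op = toy_op::NONE;
    std::array<const toy_tensor *, 2> src = {nullptr, nullptr};
    // Assumption: only statically allocated weights carry a fixed split state.
    split_state fixed = split_state::UNKNOWN;
};

// Hypothetical marker for scheduler-created inputs; the real naming scheme may differ.
static bool is_sched_input(const toy_tensor & t) {
    return t.op == toy_op::NONE && t.name.rfind("sched_input#", 0) == 0;
}

// Record the original tensor on a scheduler input. This reuses src[0], which is
// only acceptable here because a NONE-op leaf has no real sources of its own.
static void link_sched_input(toy_tensor & input, const toy_tensor & original) {
    assert(input.op == toy_op::NONE);
    input.src[0] = &original;
}

// Walk up the src chain until a tensor with a fixed split state is found.
static split_state resolve_split_state(const toy_tensor * t) {
    if (t == nullptr) {
        return split_state::UNKNOWN;
    }
    if (t->fixed != split_state::UNKNOWN) {
        return t->fixed;
    }
    if (t->op == toy_op::NONE) {
        // Leaf: a linked scheduler input inherits from its original,
        // any other leaf keeps the mirrored default.
        if (is_sched_input(*t) && t->src[0] != nullptr) {
            return resolve_split_state(t->src[0]);
        }
        return split_state::MIRRORED;
    }
    for (const toy_tensor * s : t->src) {
        const split_state st = resolve_split_state(s);
        if (st != split_state::UNKNOWN) {
            return st; // simplification: first resolvable source wins
        }
    }
    return split_state::MIRRORED;
}
```

In this sketch an unlinked scheduler input resolves to the mirrored default, which mirrors the failure described above, while a linked one inherits whatever state the original tensor resolves to.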

For the split states of weights I am currently comparing tensor names, which I think is a hacky and bad way to do it. I've been thinking that it would be better, when creating the statically allocated tensors for the model, to create a map of ggml_tensor * -> enum llm_tensor and to use that to determine which tensors should receive which split state. But this would not work for the dynamically allocated tensors of the backend scheduler. One solution would be to set a flag for those tensors. Long-term, though, I've been wondering whether it would make sense to add something like a GGML_OP_BACKEND_COPY and to do the data copies as part of the ggml graph. That would also give the meta backend a natural way to handle this edge case.
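As a rough illustration of the map-plus-flag idea, the sketch below keys a lookup table on tensor pointers and falls back to a flag for scheduler copies. Everything in it is an assumption made for the example: `toy_tensor`, `toy_tensor_kind` (a stand-in for enum llm_tensor), the `is_sched_copy` and `copy_of` fields, and the particular assignment of split states to tensor kinds are all invented and do not reflect the actual llama.cpp or ggml code.

```cpp
#include <unordered_map>

// Toy stand-ins for illustration only; NOT the real ggml / llama.cpp types.
enum class toy_tensor_kind { ATTN_Q, ATTN_K, ATTN_OUT, FFN_UP, FFN_DOWN };
enum class split_state { MIRRORED, SPLIT_ROWS, SPLIT_COLS };

struct toy_tensor {
    bool               is_sched_copy = false;   // hypothetical flag set by the scheduler
    const toy_tensor * copy_of       = nullptr; // hypothetical link to the original tensor
};

// Filled once while the statically allocated model tensors are created.
using kind_map = std::unordered_map<const toy_tensor *, toy_tensor_kind>;

// The kind -> split state assignment below is made up for the example.
static split_state state_for_kind(toy_tensor_kind kind) {
    switch (kind) {
        case toy_tensor_kind::ATTN_Q:
        case toy_tensor_kind::ATTN_K:
        case toy_tensor_kind::FFN_UP:
            return split_state::SPLIT_ROWS;
        case toy_tensor_kind::ATTN_OUT:
        case toy_tensor_kind::FFN_DOWN:
            return split_state::SPLIT_COLS;
    }
    return split_state::MIRRORED;
}

static split_state lookup_split_state(const kind_map & kinds, const toy_tensor * t) {
    // Scheduler copies are created dynamically and are not in the map,
    // so follow the flag back to the original tensor first.
    while (t != nullptr && t->is_sched_copy) {
        t = t->copy_of;
    }
    if (t == nullptr) {
        return split_state::MIRRORED;
    }
    const auto it = kinds.find(t);
    return it == kinds.end() ? split_state::MIRRORED : state_for_kind(it->second);
}
```

Compared with name matching, a pointer-keyed map avoids string comparisons entirely, but as noted above it only covers tensors that exist when the map is built, which is why the sketch still needs the flag for scheduler copies.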


@github-actions (bot) added the ggml label (changes relating to the ggml tensor library for machine learning) on Apr 10, 2026

Development

Successfully merging this pull request may close these issues.

Eval bug: tensor parallelism failing with -nkvo (Gemma 31B)
