model : refactor QKV into common build_qkv and create_tensor_qkv helpers #21245
Open
JoursBleu wants to merge 2 commits into ggml-org:master from
Conversation
da129d5 to 26e72e0
CISC reviewed Apr 1, 2026
26e72e0 to bcc69fd
Contributor
Author
hi @CISC,
CISC reviewed Apr 1, 2026
bcc69fd to 42eae08
Contributor
Author
@CISC Done:
CISC reviewed Apr 1, 2026
42eae08 to 75d759d
Contributor
Author
@CISC Done:
CISC reviewed Apr 2, 2026
Member
CISC left a comment
OP is inaccurate, there's nothing special about these:
- nemotron-h: just add build_qkv in llm_build_nemotron_h::build_attention_layer
- granite-hybrid: just add build_qkv in llm_build_granite_hybrid::build_attention_layer
- olmo/mpt/dbrx: use build_qkv, add clamping
- gemma3n-iswa: just do build_qkv
- t5-dec/t5-enc: do build_qkv on normal self-attention
- bert: use build_qkv
- lfm2: do build_qkv in build_attn_block
Member
I meant move the clamping to
09d8066 to 04506d4
CISC reviewed Apr 6, 2026
050b5a9 to 623ed29
Contributor
Author
@CISC Done:
CISC reviewed Apr 10, 2026
623ed29 to ccd1f60
CISC reviewed Apr 10, 2026
ccd1f60 to 67a8492
CISC approved these changes Apr 11, 2026
CISC reviewed Apr 11, 2026
…e-hybrid/gemma3n-iswa/t5-dec and fix wqkv_s
67a8492 to d8bf733
Overview
Currently llama.cpp supports 112 model files in src/models/. We modified the 87 applicable model files. Our changes abstract the duplicated Q/K/V tensors' loading and graph-building code into two reusable helpers, following the create_tensor_gate_up_exps pattern (#19139).
• create_tensor_qkv (llama-model.cpp): tries the fused wqkv/bqkv first (TENSOR_NOT_REQUIRED | TENSOR_SKIP_IF_VIRTUAL), falling back to separate wq/wk/wv. Supports adding biases.
• build_qkv (llama-graph.h/cpp): returns {Qcur, Kcur, Vcur} as 3D tensors. Fused case: a single fused qkv matmul + a ggml_view_3d split. Separate case: 3 separate matmuls + ggml_reshape_3d.
Test: test-llama-archs, all OK, 0 FAIL. Zero diff on llama-arch.cpp.
The remaining 25 models are not modified for the following reasons:
Additional information
Based on the discussion in #20628 (@am17an, @ngxson).
The plan is:
the two functions above, and adds handling for the fused qkv case.
--fuse-qkv to convert_hf_to_gguf.py.
Requirements