UPSTREAM PR #21452: metal : add GATED_LINEAR_ATTN op #1334
No meaningful performance changes were detected across 125473 analyzed functions in the following binaries: build.bin.llama-tts, build.bin.llama-cvector-generator, build.bin.libllama.so, build.bin.libmtmd.so, build.bin.llama-bench, build.bin.libggml-cpu.so, build.bin.libggml-base.so, build.bin.libggml.so, build.bin.llama-tokenize, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli.

💬 Questions? Tag @loci-dev
Add Metal backend support for GGML_OP_GATED_LINEAR_ATTN (GLA). Supports head_size 64 and 128, f32 only.

Tested with test-backend-ops (7/7 passed):
- M5 Max 128GB (Apple10)
- M2 Mac Mini (Apple8)
No meaningful performance changes were detected across 46867 analyzed functions in the following binaries: build.bin.libllama.so, build.bin.libmtmd.so, build.bin.libggml-cpu.so, build.bin.libggml.so, build.bin.libggml-base.so, build.bin.llama-tts, build.bin.llama-bench, build.bin.llama-cvector-generator, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli, build.bin.llama-tokenize.
Note
Source pull request: ggml-org/llama.cpp#21452
Add Metal backend support for GGML_OP_GATED_LINEAR_ATTN (GLA). Supports head_size 64 and 128, f32 only.
Tested with test-backend-ops (7/7 passed):
- M5 Max 128GB (Apple10)
- M2 Mac Mini (Apple8)
Overview
Adds Metal backend support for GGML_OP_GATED_LINEAR_ATTN, which currently falls back to CPU on Apple devices.

The kernel follows the RWKV WKV6 Metal structure: one thread per head element, threadgroup shared memory for k/q/gate, and a float4-vectorized inner loop.
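For readers unfamiliar with the op, the recurrence that a gated linear attention kernel evaluates can be sketched on the CPU as below. This is a minimal single-head reference, assuming the usual GLA update (decay the state by the per-key gate, add the outer product of k and v, then read out with q). The function name gla_reference and the memory layout are illustrative assumptions, not code from this PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scalar reference for ONE head; not the Metal kernel from the PR.
// q, k, v, g hold n_tokens * head_size floats, token-major.
// state holds head_size * head_size floats, indexed state[i * head_size + j],
// and is updated in place so it can be carried across calls.
std::vector<float> gla_reference(const std::vector<float>& q,
                                 const std::vector<float>& k,
                                 const std::vector<float>& v,
                                 const std::vector<float>& g,
                                 std::vector<float>& state,
                                 std::size_t head_size,
                                 float scale) {
    const std::size_t n_tokens = q.size() / head_size;
    std::vector<float> out(n_tokens * head_size, 0.0f);

    for (std::size_t t = 0; t < n_tokens; ++t) {
        const std::size_t base = t * head_size;
        for (std::size_t i = 0; i < head_size; ++i) {
            const float k_val = k[base + i];
            const float q_val = q[base + i] * scale;
            const float g_val = g[base + i];  // per-key-element gate (decay)
            for (std::size_t j = 0; j < head_size; ++j) {
                // Decay the previous state, then add the k*v outer-product term.
                const float s = state[i * head_size + j] * g_val + k_val * v[base + j];
                state[i * head_size + j] = s;
                // Read-out: out[t][j] accumulates q[i] * state[i][j] over i.
                out[base + j] += s * q_val;
            }
        }
    }
    return out;
}
```

In this sketch each output element j depends only on column j of the state, which is one reason a layout with one thread per head element is a natural fit; the actual kernel's indexing may differ.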
supports_op restricts execution to F32 and head sizes 64 or 128.

No dedicated performance benchmarks are currently configured for this op.
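The support restriction described above amounts to a simple predicate. The sketch below uses a hypothetical ElemType enum and function name as stand-ins; the real backend inspects ggml tensor metadata, which is not reproduced here.

```cpp
#include <cstdint>

// Hypothetical stand-in for the tensor element type; not a ggml type.
enum class ElemType { F32, F16, Other };

// Returns true only for the combinations the PR description says are handled:
// F32 data with a head size of 64 or 128. Anything else would be left to
// another backend (per the PR, the op otherwise falls back to CPU).
bool gla_metal_supported(ElemType type, std::int64_t head_size) {
    return type == ElemType::F32 && (head_size == 64 || head_size == 128);
}
```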
Additional information
docs/ops.md and docs/ops/Metal.csv were regenerated using the required workflow (test-backend-ops support --output csv and ./scripts/create_ops_docs.py). The resulting diff includes updates to multiple ops due to recent changes in main; the only functional change in this PR is support for GATED_LINEAR_ATTN.

Mentions #14909
Requirements