## Problem
Symmetric turbo3 K+V produces catastrophic PPL on Qwen2.5-7B (PPL 2,887 vs baseline 7.43). This has never worked — bisect confirms PPL 10,829 at the very first CUDA turbo commit (97dddac).
## Root cause
Qwen2.5-7B has 4 KV heads serving 28 Q heads (GQA ratio 7:1). Each K head's turbo3 quantization error gets broadcast to 7 Q heads, amplifying the error. Models with GQA ≤ 4:1 are unaffected (Mistral-7B: 8 KV heads, turbo3 PPL = 7.71, only +4.4%).
## Bisect data
| Commit | Label | Qwen turbo3/turbo3 PPL |
|---|---|---|
| 97dddac | first VEC FA fix | 10,829 |
| f2b3936 | asymmetric KV support | 2,887 |
| 53180d9 | block-size 128 | 106,035 |
| fe15d61 | InnerQ equalization | 2,886 |
| bc05a68 | current HEAD | 2,887 |
Never worked. The asymmetric KV support (q8_0-K + turbo3-V) was introduced as a manual fix for exactly this issue.
## Proposed fix
Auto-detect high GQA ratio at KV cache init and silently upgrade K to q8_0:
```cpp
const uint32_t gqa_ratio = n_head / n_head_kv;
if (gqa_ratio >= 6 && type_k == type_v && k_is_turbo) {
    type_k = GGML_TYPE_Q8_0; // auto-asymmetric
}
```
Results with fix:
- Qwen2.5-7B turbo3/turbo3: PPL 7.06 (was 2,887), NIAH 5/5
- Mistral-7B: unaffected (GQA 4:1, below threshold)
- Opt-out: `TURBO_AUTO_ASYMMETRIC=0`
Implementation: https://github.com/signalnine/llama-cpp-turboquant/tree/fix/auto-asymmetric-gqa
Also included in PR #53.
## Affected models
Any model with n_head_kv ≤ 4 and n_head ≥ 24 (GQA ≥ 6:1):
- Qwen2.5 family (4 KV heads, 28 Q heads)
- Qwen2 family (same architecture)
- Potentially other models with aggressive GQA