Name and Version
pwilkin@SYN-PC-11:/devel/models$ llama-cli --version
load_backend: loaded BLAS backend from /devel/tools/llama.cpp/build/bin/libggml-blas.so
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 31679 MiB):
Device 0: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes, VRAM: 15839 MiB
Device 1: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes, VRAM: 15839 MiB
load_backend: loaded CUDA backend from /devel/tools/llama.cpp/build/bin/libggml-cuda.so
load_backend: loaded CPU backend from /devel/tools/llama.cpp/build/bin/libggml-cpu-alderlake.so
version: 8738 (d6f3030)
built with GNU 15.2.0 for Linux x86_64
pwilkin@SYN-PC-11:/devel/models$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Aug_20_01:58:59_PM_PDT_2025
Cuda compilation tools, release 13.0, V13.0.88
Build cuda_13.0.r13.0/compiler.36424714_0
Operating systems
Linux
GGML backends
CUDA
Hardware
2x 5070 Ti 16 GB
CPU: Intel(R) Core(TM) i7-14700KF
Models
Gemma 31B Q5_K_S (bartowski)
Problem description & steps to reproduce
Running
llama-server -m google_gemma-4-31B-it-Q5_K_M.gguf -c 150000 -a syndatis --mmproj mmproj-google_gemma-4-31B-it-q8_0.gguf --chat-template-file /devel/tools/llama.cpp/models/templates/google-gemma-4-31B-it-interleaved.jinja --host 0.0.0.0 --cache-ram 4096 -ctxcp 4 -np 1 -sm tensor -nkvo
fails with an assertion error. The same model runs fine at a smaller context size without -nkvo.
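A possibly more minimal reproduction (assumption, untested: the mmproj, chat template, alias, and cache flags are not needed to trigger the assert, only the large context plus -sm tensor with -nkvo):

llama-server -m google_gemma-4-31B-it-Q5_K_M.gguf -c 150000 -sm tensor -nkvo

If the assert still fires with this reduced command, that would point at the -sm tensor / -nkvo interaction on multi-GPU rather than the multimodal or caching paths.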
First Bad Commit
No response
Relevant log output
/devel/tools/llama.cpp/ggml/src/ggml-backend-meta.cpp:729: GGML_ASSERT(src_ss[1].axis == GGML_BACKEND_SPLIT_AXIS_2) failed