
Eval bug: pre-allocated tensor (cache_k_l3 (view)) in a buffer that cannot run the operation (SET_ROWS) #50

@WizardlyBump17

Description

Name and Version

root@ed00659cbc64:/app# ./llama-cli --version
load_backend: loaded SYCL backend from /app/libggml-sycl.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 8793 (bc05a6803)
built with IntelLLVM 2025.3.2 for Linux x86_64
root@676ea1faa4ff:/app# ./llama-cli --version
load_backend: loaded Vulkan backend from /app/libggml-vulkan.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 8793 (bc05a6803)
built with GNU 15.2.0 for Linux x86_64
root@f39e7db7b25c:/app# ./llama-cli --version
load_backend: loaded Vulkan backend from /app/libggml-vulkan.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 8772 (e9c54d557)
built with GNU 15.2.0 for Linux x86_64

Operating systems

Linux

GGML backends

SYCL

Hardware

Ryzen 7 5700X3D
B580

Models

No response

Problem description & steps to reproduce

root@f39e7db7b25c:/app# ./llama-bench --model /models/Qwen3.5-0.8B-F16.gguf --flash-attn 1 --cache-type-k turbo4 --cache-type-v turbo4
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(tm) B580 Graphics (BMG G21) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /app/libggml-vulkan.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
| model                          |       size |     params | backend    | ngl | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -----: | -: | --------------: | -------------------: |
/app/ggml/src/ggml-backend.cpp:809: pre-allocated tensor (cache_k_l3 (view)) in a buffer (Vulkan0) that cannot run the operation (SET_ROWS)
libggml-base.so.0(+0x18ab6) [0x7c69f00e3ab6]
libggml-base.so.0(ggml_print_backtrace+0x20d) [0x7c69f00e3f1d]
libggml-base.so.0(ggml_abort+0x166) [0x7c69f00e4106]
libggml-base.so.0(+0x31b7c) [0x7c69f00fcb7c]
libggml-base.so.0(ggml_backend_sched_split_graph+0xc8f) [0x7c69f00feb9f]
libllama.so.0(_ZN13llama_context13graph_reserveEjjjPK22llama_memory_context_ibPm+0x616) [0x7c69f0282a06]
libllama.so.0(_ZN13llama_context13sched_reserveEv+0x1000) [0x7c69f0284a40]
libllama.so.0(_ZN13llama_contextC2ERK11llama_model20llama_context_params+0xa75) [0x7c69f0286ac5]
libllama.so.0(llama_init_from_model+0x126) [0x7c69f02877c6]
./llama-bench(+0x3ab84) [0x56333c439b84]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x2a601) [0x7c69ef3b3601]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x88) [0x7c69ef3b3718]
./llama-bench(+0x3d3d5) [0x56333c43c3d5]
Aborted                    (core dumped) ./llama-bench --model /models/Qwen3.5-0.8B-F16.gguf --flash-attn 1 --cache-type-k turbo4 --cache-type-v turbo4

The error occurs on both the SYCL and Vulkan backends.
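
For reference, a minimal API-level sketch of the same failure path (hypothetical, not part of the original report). It assumes the current llama.cpp C API names (llama_model_load_from_file, llama_init_from_model), substitutes GGML_TYPE_F16 for the turbo4 cache type since that mapping is not shown in the log, and uses n_gpu_layers = 99 to force offload. Per the backtrace, the abort fires inside llama_init_from_model during graph reservation, before any tokens are decoded.

```cpp
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99; // assumption: offload all layers to the Vulkan/SYCL device

    llama_model * model = llama_model_load_from_file("/models/Qwen3.5-0.8B-F16.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    llama_context_params cparams = llama_context_default_params();
    cparams.flash_attn = true;          // --flash-attn 1 (newer builds may use flash_attn_type instead)
    cparams.type_k     = GGML_TYPE_F16; // stand-in for --cache-type-k turbo4
    cparams.type_v     = GGML_TYPE_F16; // stand-in for --cache-type-v turbo4

    // The backtrace shows the abort here: ggml_backend_sched_split_graph rejects
    // the pre-allocated cache_k_l3 view because its buffer cannot run SET_ROWS.
    llama_context * ctx = llama_init_from_model(model, cparams);
    if (!ctx) { fprintf(stderr, "failed to create context\n"); return 1; }

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```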

First Bad Commit

No response

Relevant log output

See above.
