Name and Version
root@ed00659cbc64:/app# ./llama-cli --version
load_backend: loaded SYCL backend from /app/libggml-sycl.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 8793 (bc05a6803)
built with IntelLLVM 2025.3.2 for Linux x86_64
root@676ea1faa4ff:/app# ./llama-cli --version
load_backend: loaded Vulkan backend from /app/libggml-vulkan.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 8793 (bc05a6803)
built with GNU 15.2.0 for Linux x86_64
root@f39e7db7b25c:/app# ./llama-cli --version
load_backend: loaded Vulkan backend from /app/libggml-vulkan.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 8772 (e9c54d557)
built with GNU 15.2.0 for Linux x86_64
Operating systems
Linux
GGML backends
SYCL, Vulkan
Hardware
Ryzen 7 5700X3D
B580
Models
No response
Problem description & steps to reproduce
llama-bench aborts during context initialization when flash attention is enabled together with quantized KV cache types: the backend scheduler asserts that a pre-allocated KV cache tensor (cache_k_l3) lives in a Vulkan0 buffer that cannot run the SET_ROWS operation.
root@f39e7db7b25c:/app# ./llama-bench --model /models/Qwen3.5-0.8B-F16.gguf --flash-attn 1 --cache-type-k turbo4 --cache-type-v turbo4
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(tm) B580 Graphics (BMG G21) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /app/libggml-vulkan.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
| model | size | params | backend | ngl | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -----: | -: | --------------: | -------------------: |
/app/ggml/src/ggml-backend.cpp:809: pre-allocated tensor (cache_k_l3 (view)) in a buffer (Vulkan0) that cannot run the operation (SET_ROWS)
libggml-base.so.0(+0x18ab6) [0x7c69f00e3ab6]
libggml-base.so.0(ggml_print_backtrace+0x20d) [0x7c69f00e3f1d]
libggml-base.so.0(ggml_abort+0x166) [0x7c69f00e4106]
libggml-base.so.0(+0x31b7c) [0x7c69f00fcb7c]
libggml-base.so.0(ggml_backend_sched_split_graph+0xc8f) [0x7c69f00feb9f]
libllama.so.0(_ZN13llama_context13graph_reserveEjjjPK22llama_memory_context_ibPm+0x616) [0x7c69f0282a06]
libllama.so.0(_ZN13llama_context13sched_reserveEv+0x1000) [0x7c69f0284a40]
libllama.so.0(_ZN13llama_contextC2ERK11llama_model20llama_context_params+0xa75) [0x7c69f0286ac5]
libllama.so.0(llama_init_from_model+0x126) [0x7c69f02877c6]
./llama-bench(+0x3ab84) [0x56333c439b84]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x2a601) [0x7c69ef3b3601]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x88) [0x7c69ef3b3718]
./llama-bench(+0x3d3d5) [0x56333c43c3d5]
Aborted (core dumped) ./llama-bench --model /models/Qwen3.5-0.8B-F16.gguf --flash-attn 1 --cache-type-k turbo4 --cache-type-v turbo4
The same crash occurs on both the SYCL and Vulkan backends.
First Bad Commit
No response
Relevant log output
See the backtrace in the problem description above.