
Eval bug: pre-allocated tensor (cache_k_l3 (view)) in a buffer that cannot run the operation (SET_ROWS) #50

@WizardlyBump17

Description

Name and Version

root@ed00659cbc64:/app# ./llama-cli --version
load_backend: loaded SYCL backend from /app/libggml-sycl.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 8793 (bc05a6803)
built with IntelLLVM 2025.3.2 for Linux x86_64
root@676ea1faa4ff:/app# ./llama-cli --version
load_backend: loaded Vulkan backend from /app/libggml-vulkan.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 8793 (bc05a6803)
built with GNU 15.2.0 for Linux x86_64
root@f39e7db7b25c:/app# ./llama-cli --version
load_backend: loaded Vulkan backend from /app/libggml-vulkan.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 8772 (e9c54d557)
built with GNU 15.2.0 for Linux x86_64

Operating systems

Linux

GGML backends

SYCL

Hardware

Ryzen 7 5700X3D
B580

Models

No response

Problem description & steps to reproduce

root@f39e7db7b25c:/app# ./llama-bench --model /models/Qwen3.5-0.8B-F16.gguf --flash-attn 1 --cache-type-k turbo4 --cache-type-v turbo4
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(tm) B580 Graphics (BMG G21) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /app/libggml-vulkan.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
| model                          |       size |     params | backend    | ngl | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | -----: | -: | --------------: | -------------------: |
/app/ggml/src/ggml-backend.cpp:809: pre-allocated tensor (cache_k_l3 (view)) in a buffer (Vulkan0) that cannot run the operation (SET_ROWS)
libggml-base.so.0(+0x18ab6) [0x7c69f00e3ab6]
libggml-base.so.0(ggml_print_backtrace+0x20d) [0x7c69f00e3f1d]
libggml-base.so.0(ggml_abort+0x166) [0x7c69f00e4106]
libggml-base.so.0(+0x31b7c) [0x7c69f00fcb7c]
libggml-base.so.0(ggml_backend_sched_split_graph+0xc8f) [0x7c69f00feb9f]
libllama.so.0(_ZN13llama_context13graph_reserveEjjjPK22llama_memory_context_ibPm+0x616) [0x7c69f0282a06]
libllama.so.0(_ZN13llama_context13sched_reserveEv+0x1000) [0x7c69f0284a40]
libllama.so.0(_ZN13llama_contextC2ERK11llama_model20llama_context_params+0xa75) [0x7c69f0286ac5]
libllama.so.0(llama_init_from_model+0x126) [0x7c69f02877c6]
./llama-bench(+0x3ab84) [0x56333c439b84]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x2a601) [0x7c69ef3b3601]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x88) [0x7c69ef3b3718]
./llama-bench(+0x3d3d5) [0x56333c43c3d5]
Aborted                    (core dumped) ./llama-bench --model /models/Qwen3.5-0.8B-F16.gguf --flash-attn 1 --cache-type-k turbo4 --cache-type-v turbo4

The error occurs on both the SYCL and Vulkan backends.
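
For reference, a minimal API-level sketch of the same failure path (hypothetical, not part of the original report). It assumes the current llama.cpp C API names (llama_model_load_from_file, llama_init_from_model), substitutes GGML_TYPE_F16 for the turbo4 cache type since that mapping is not shown in the log, and uses n_gpu_layers = 99 to force offload. Per the backtrace, the abort fires inside llama_init_from_model during graph reservation, before any tokens are decoded.

```cpp
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99; // assumption: offload all layers to the Vulkan/SYCL device

    llama_model * model = llama_model_load_from_file("/models/Qwen3.5-0.8B-F16.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    llama_context_params cparams = llama_context_default_params();
    cparams.flash_attn = true;          // --flash-attn 1 (newer builds may use flash_attn_type instead)
    cparams.type_k     = GGML_TYPE_F16; // stand-in for --cache-type-k turbo4
    cparams.type_v     = GGML_TYPE_F16; // stand-in for --cache-type-v turbo4

    // The backtrace shows the abort here: ggml_backend_sched_split_graph rejects
    // the pre-allocated cache_k_l3 view because its buffer cannot run SET_ROWS.
    llama_context * ctx = llama_init_from_model(model, cparams);
    if (!ctx) { fprintf(stderr, "failed to create context\n"); return 1; }

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```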

First Bad Commit

No response

Relevant log output

See above.
