Eval bug: broken vulkan on Bazzite Linux #64

@phly95

Description

Name and Version

bazzite@bazzite:/var/home/bazzite/llama.cpp-vulkan/llama-turboquant$ ./llama-cli --version
WARNING: radv is not a conformant Vulkan implementation, testing use only.
load_backend: failed to find ggml_backend_init in /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-vulkan.so
load_backend: failed to find ggml_backend_init in /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-cpu.so
version: 8814 (8590cbf)
built with GNU 15.2.1 for Linux x86_64

Launch script: llama-server-launcher.sh

Operating systems

Linux

GGML backends

Vulkan

Hardware

RX 9070 XT, intel i5 14600k

Models

gemma-4-26B-A4B-it-UD-Q3_K_XL

Problem description & steps to reproduce

The model fails to load with the turbo3 KV cache on Vulkan. When it does load, it outputs gibberish.
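For reference, a minimal reproduction sketch built from the flags the router passes to the spawned server instance in the log below (paths and the model file are from this report; adjust them for your setup — this is an illustrative command line, not a verified minimal repro):

```shell
# Repro sketch: same KV-cache and flash-attention flags as the crashing
# instance in the attached log. Paths shortened for readability.
./llama-server \
  --model gemma-4-26B-A4B-it-UD-Q3_K_XL.gguf \
  --cache-type-k turbo3 \
  --cache-type-v turbo3 \
  --flash-attn on \
  --kv-unified \
  --n-gpu-layers all
```

The crash fires during the warmup decode, in ggml_vk_flash_attn (see the GGML_ASSERT and backtrace in the log), so flash attention combined with the turbo3 cache types appears to be the relevant flag combination.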

First Bad Commit

No response

Relevant log output

Logs
WARNING: radv is not a conformant Vulkan implementation, testing use only.
load_backend: failed to find ggml_backend_init in /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-vulkan.so
load_backend: failed to find ggml_backend_init in /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-cpu.so
system info: n_threads = 6, n_threads_batch = 6, total_threads = 20

system_info: n_threads = 6 (n_threads_batch = 6) / 20 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 

init: using 19 threads for HTTP server
load_backend: failed to find ggml_backend_init in /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-vulkan.so
load_backend: failed to find ggml_backend_init in /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-cpu.so
load_backend: failed to find ggml_backend_init in /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-vulkan.so
load_backend: failed to find ggml_backend_init in /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-cpu.so
srv   load_models: Loaded 1 cached model presets
srv   load_models: Loaded 7 local model presets from /var/home/bazzite/.lmstudio/models-flat/
srv   load_models: Available models (8) (*: custom preset)
srv   load_models:     GLM-4.7-Flash-UD-IQ3_XXS
srv   load_models:     Qwen3.5-35B-A3B-APEX-Mini-SystemAnywhere
srv   load_models:     Qwen3.5-35B-A3B-Claude-Distilled-APEX-I-Mini
srv   load_models:     gemma-4-26B-A4B-APEX-I-Mini
srv   load_models:     gemma-4-26B-A4B-heretic-APEX-I-Mini
srv   load_models:     gemma-4-26B-A4B-it-UD-Q2_K_XL
srv   load_models:     gemma-4-26B-A4B-it-UD-Q3_K_XL
srv   load_models:     unsloth/Qwen3.5-4B-GGUF:Q4_K_XL
main: starting router server, no model will be loaded in this process
start: binding port with default address family
main: router server is listening on http://0.0.0.0:8081
main: NOTE: router mode is experimental
main:       it is not recommended to use this mode in untrusted environments
srv          load: spawning server instance with name=gemma-4-26B-A4B-it-UD-Q3_K_XL on port 48203
srv          load: spawning server instance with args:
srv          load:   /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/llama-server
srv          load:   --clear-idle
srv          load:   --host
srv          load:   127.0.0.1
srv          load:   --no-mmproj-offload
srv          load:   --port
srv          load:   48203
srv          load:   --alias
srv          load:   gemma-4-26B-A4B-it-UD-Q3_K_XL
srv          load:   --cache-type-k
srv          load:   turbo3
srv          load:   --cache-type-v
srv          load:   turbo3
srv          load:   --flash-attn
srv          load:   on
srv          load:   --fit-ctx
srv          load:   96000
srv          load:   --fit-target
srv          load:   400
srv          load:   --kv-unified
srv          load:   --model
srv          load:   /var/home/bazzite/.lmstudio/models-flat/gemma-4-26B-A4B-it-UD-Q3_K_XL/gemma-4-26B-A4B-it-UD-Q3_K_XL.gguf
srv          load:   --mmproj
srv          load:   /var/home/bazzite/.lmstudio/models-flat/gemma-4-26B-A4B-it-UD-Q3_K_XL/mmproj-BF16.gguf
srv          load:   --n-gpu-layers
srv          load:   all
srv          load:   --parallel
srv          load:   10
srv  log_server_r: done request: POST /models/load 127.0.0.1 200
[48203] WARNING: radv is not a conformant Vulkan implementation, testing use only.
[48203] load_backend: failed to find ggml_backend_init in /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-vulkan.so
[48203] load_backend: failed to find ggml_backend_init in /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-cpu.so
[48203] system info: n_threads = 6, n_threads_batch = 6, total_threads = 20
[48203] 
[48203] system_info: n_threads = 6 (n_threads_batch = 6) / 20 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 
[48203] 
[48203] init: using 19 threads for HTTP server
[48203] start: binding port with default address family
[48203] main: loading model
[48203] srv    load_model: loading model '/var/home/bazzite/.lmstudio/models-flat/gemma-4-26B-A4B-it-UD-Q3_K_XL/gemma-4-26B-A4B-it-UD-Q3_K_XL.gguf'
[48203] common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
[48203] llama_params_fit_impl: projected to use 14504 MiB of device memory vs. 14925 MiB of free device memory
[48203] llama_params_fit_impl: will leave 421 >= 400 MiB of free device memory, no changes needed
[48203] llama_params_fit: successfully fit params to free device memory
[48203] llama_params_fit: fitting params to free memory took 1.33 seconds
[48203] llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon RX 9070 XT (RADV GFX1201)) (0000:03:00.0) - 14925 MiB free
[48203] llama_model_loader: loaded meta data with 60 key-value pairs and 658 tensors from /var/home/bazzite/.lmstudio/models-flat/gemma-4-26B-A4B-it-UD-Q3_K_XL/gemma-4-26B-A4B-it-UD-Q3_K_XL.gguf (version GGUF V3 (latest))
[48203] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
[48203] llama_model_loader: - kv   0:                       general.architecture str              = gemma4
[48203] llama_model_loader: - kv   1:                               general.type str              = model
[48203] llama_model_loader: - kv   2:                     general.sampling.top_k i32              = 64
[48203] llama_model_loader: - kv   3:                     general.sampling.top_p f32              = 0.950000
[48203] llama_model_loader: - kv   4:                      general.sampling.temp f32              = 1.000000
[48203] llama_model_loader: - kv   5:                               general.name str              = Gemma-4-26B-A4B-It
[48203] llama_model_loader: - kv   6:                           general.finetune str              = it
[48203] llama_model_loader: - kv   7:                           general.basename str              = Gemma-4-26B-A4B-It
[48203] llama_model_loader: - kv   8:                       general.quantized_by str              = Unsloth
[48203] llama_model_loader: - kv   9:                         general.size_label str              = 26B-A4B
[48203] llama_model_loader: - kv  10:                            general.license str              = apache-2.0
[48203] llama_model_loader: - kv  11:                       general.license.link str              = https://ai.google.dev/gemma/docs/gemm...
[48203] llama_model_loader: - kv  12:                           general.repo_url str              = https://huggingface.co/unsloth
[48203] llama_model_loader: - kv  13:                   general.base_model.count u32              = 1
[48203] llama_model_loader: - kv  14:                  general.base_model.0.name str              = Gemma 4 26B A4B It
[48203] llama_model_loader: - kv  15:          general.base_model.0.organization str              = Google
[48203] llama_model_loader: - kv  16:              general.base_model.0.repo_url str              = https://huggingface.co/google/gemma-4...
[48203] llama_model_loader: - kv  17:                               general.tags arr[str,2]       = ["unsloth", "image-text-to-text"]
[48203] llama_model_loader: - kv  18:                         gemma4.block_count u32              = 30
[48203] llama_model_loader: - kv  19:                      gemma4.context_length u32              = 262144
[48203] llama_model_loader: - kv  20:                    gemma4.embedding_length u32              = 2816
[48203] llama_model_loader: - kv  21:                 gemma4.feed_forward_length u32              = 2112
[48203] llama_model_loader: - kv  22:                gemma4.attention.head_count u32              = 16
[48203] llama_model_loader: - kv  23:             gemma4.attention.head_count_kv arr[i32,30]      = [8, 8, 8, 8, 8, 2, 8, 8, 8, 8, 8, 2, ...
[48203] llama_model_loader: - kv  24:                      gemma4.rope.freq_base f32              = 1000000.000000
[48203] llama_model_loader: - kv  25:                  gemma4.rope.freq_base_swa f32              = 10000.000000
[48203] llama_model_loader: - kv  26:    gemma4.attention.layer_norm_rms_epsilon f32              = 0.000001
[48203] llama_model_loader: - kv  27:                        gemma4.expert_count u32              = 128
[48203] llama_model_loader: - kv  28:                   gemma4.expert_used_count u32              = 8
[48203] llama_model_loader: - kv  29:                gemma4.attention.key_length u32              = 512
[48203] llama_model_loader: - kv  30:              gemma4.attention.value_length u32              = 512
[48203] llama_model_loader: - kv  31:             gemma4.final_logit_softcapping f32              = 30.000000
[48203] llama_model_loader: - kv  32:            gemma4.attention.sliding_window u32              = 1024
[48203] llama_model_loader: - kv  33:          gemma4.attention.shared_kv_layers u32              = 0
[48203] llama_model_loader: - kv  34:    gemma4.embedding_length_per_layer_input u32              = 0
[48203] llama_model_loader: - kv  35:    gemma4.attention.sliding_window_pattern arr[bool,30]     = [true, true, true, true, true, false,...
[48203] llama_model_loader: - kv  36:            gemma4.attention.key_length_swa u32              = 256
[48203] llama_model_loader: - kv  37:          gemma4.attention.value_length_swa u32              = 256
[48203] llama_model_loader: - kv  38:          gemma4.expert_feed_forward_length u32              = 704
[48203] llama_model_loader: - kv  39:                gemma4.rope.dimension_count u32              = 512
[48203] llama_model_loader: - kv  40:            gemma4.rope.dimension_count_swa u32              = 256
[48203] llama_model_loader: - kv  41:                       tokenizer.ggml.model str              = gemma4
[48203] llama_model_loader: - kv  42:                      tokenizer.ggml.tokens arr[str,262144]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
[48203] llama_model_loader: - kv  43:                      tokenizer.ggml.scores arr[f32,262144]  = [-1000.000000, -1000.000000, -1000.00...
[48203] llama_model_loader: - kv  44:                  tokenizer.ggml.token_type arr[i32,262144]  = [3, 1, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, ...
[48203] llama_model_loader: - kv  45:                      tokenizer.ggml.merges arr[str,514906]  = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ...
[48203] llama_model_loader: - kv  46:                tokenizer.ggml.bos_token_id u32              = 2
[48203] llama_model_loader: - kv  47:                tokenizer.ggml.eos_token_id u32              = 106
[48203] llama_model_loader: - kv  48:            tokenizer.ggml.unknown_token_id u32              = 3
[48203] llama_model_loader: - kv  49:            tokenizer.ggml.padding_token_id u32              = 0
[48203] llama_model_loader: - kv  50:               tokenizer.ggml.mask_token_id u32              = 4
[48203] llama_model_loader: - kv  51:                    tokenizer.chat_template str              = {%- macro format_parameters(propertie...
[48203] llama_model_loader: - kv  52:            tokenizer.ggml.add_space_prefix bool             = false
[48203] llama_model_loader: - kv  53:               tokenizer.ggml.add_bos_token bool             = true
[48203] llama_model_loader: - kv  54:               general.quantization_version u32              = 2
[48203] llama_model_loader: - kv  55:                          general.file_type u32              = 12
[48203] llama_model_loader: - kv  56:                      quantize.imatrix.file str              = gemma-4-26B-A4B-it-GGUF/imatrix_unslo...
[48203] llama_model_loader: - kv  57:                   quantize.imatrix.dataset str              = unsloth_calibration_gemma-4-26B-A4B-i...
[48203] llama_model_loader: - kv  58:             quantize.imatrix.entries_count u32              = 295
[48203] llama_model_loader: - kv  59:              quantize.imatrix.chunks_count u32              = 141
[48203] llama_model_loader: - type  f32:  392 tensors
[48203] llama_model_loader: - type q5_1:    2 tensors
[48203] llama_model_loader: - type q8_0:  206 tensors
[48203] llama_model_loader: - type iq3_xxs:   29 tensors
[48203] llama_model_loader: - type iq4_nl:   28 tensors
[48203] llama_model_loader: - type iq4_xs:    1 tensors
[48203] print_info: file format = GGUF V3 (latest)
[48203] print_info: file type   = Q3_K - Medium
[48203] print_info: file size   = 11.98 GiB (4.08 BPW) 
[48203] load: 0 unused tokens
[48203] load: control-looking token:    212 '</s>' was not control-type; this is probably a bug in the model. its type will be overridden
[48203] load: printing all EOG tokens:
[48203] load:   - 106 ('<turn|>')
[48203] load:   - 212 ('</s>')
[48203] load: special tokens cache size = 24
[48203] load: token to piece cache size = 1.9443 MB
[48203] print_info: arch                  = gemma4
[48203] print_info: vocab_only            = 0
[48203] print_info: no_alloc              = 0
[48203] print_info: n_ctx_train           = 262144
[48203] print_info: n_embd                = 2816
[48203] print_info: n_embd_inp            = 2816
[48203] print_info: n_layer               = 30
[48203] print_info: n_head                = 16
[48203] print_info: n_head_kv             = [8, 8, 8, 8, 8, 2, 8, 8, 8, 8, 8, 2, 8, 8, 8, 8, 8, 2, 8, 8, 8, 8, 8, 2, 8, 8, 8, 8, 8, 2]
[48203] print_info: n_rot                 = 512
[48203] print_info: n_swa                 = 1024
[48203] print_info: is_swa_any            = 1
[48203] print_info: n_embd_head_k         = 512
[48203] print_info: n_embd_head_v         = 512
[48203] print_info: n_gqa                 = [2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8, 2, 2, 2, 2, 2, 8]
[48203] print_info: n_embd_k_gqa          = [2048, 2048, 2048, 2048, 2048, 1024, 2048, 2048, 2048, 2048, 2048, 1024, 2048, 2048, 2048, 2048, 2048, 1024, 2048, 2048, 2048, 2048, 2048, 1024, 2048, 2048, 2048, 2048, 2048, 1024]
[48203] print_info: n_embd_v_gqa          = [2048, 2048, 2048, 2048, 2048, 1024, 2048, 2048, 2048, 2048, 2048, 1024, 2048, 2048, 2048, 2048, 2048, 1024, 2048, 2048, 2048, 2048, 2048, 1024, 2048, 2048, 2048, 2048, 2048, 1024]
[48203] print_info: f_norm_eps            = 0.0e+00
[48203] print_info: f_norm_rms_eps        = 1.0e-06
[48203] print_info: f_clamp_kqv           = 0.0e+00
[48203] print_info: f_max_alibi_bias      = 0.0e+00
[48203] print_info: f_logit_scale         = 0.0e+00
[48203] print_info: f_attn_scale          = 1.0e+00
[48203] print_info: n_ff                  = 2112
[48203] print_info: n_expert              = 128
[48203] print_info: n_expert_used         = 8
[48203] print_info: n_expert_groups       = 0
[48203] print_info: n_group_used          = 0
[48203] print_info: causal attn           = 1
[48203] print_info: pooling type          = -1
[48203] print_info: rope type             = 2
[48203] print_info: rope scaling          = linear
[48203] print_info: freq_base_train       = 1000000.0
[48203] print_info: freq_scale_train      = 1
[48203] print_info: freq_base_swa         = 10000.0
[48203] print_info: freq_scale_swa        = 1
[48203] print_info: n_embd_head_k_swa     = 256
[48203] print_info: n_embd_head_v_swa     = 256
[48203] print_info: n_rot_swa             = 256
[48203] print_info: n_ctx_orig_yarn       = 262144
[48203] print_info: rope_yarn_log_mul     = 0.0000
[48203] print_info: rope_finetuned        = unknown
[48203] print_info: model type            = ?B
[48203] print_info: model params          = 25.23 B
[48203] print_info: general.name          = Gemma-4-26B-A4B-It
[48203] print_info: vocab type            = BPE
[48203] print_info: n_vocab               = 262144
[48203] print_info: n_merges              = 514906
[48203] print_info: BOS token             = 2 '<bos>'
[48203] print_info: EOS token             = 106 '<turn|>'
[48203] print_info: UNK token             = 3 '<unk>'
[48203] print_info: PAD token             = 0 '<pad>'
[48203] print_info: MASK token            = 4 '<mask>'
[48203] print_info: LF token              = 107 '
[48203] '
[48203] print_info: EOG token             = 106 '<turn|>'
[48203] print_info: EOG token             = 212 '</s>'
[48203] print_info: max token length      = 93
[48203] load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
[48203] str: cannot properly format tensor name output with suffix=weight bid=-1 xid=-1
[48203] load_tensors: offloading output layer to GPU
[48203] load_tensors: offloading 29 repeating layers to GPU
[48203] load_tensors: offloaded 31/31 layers to GPU
[48203] load_tensors:   CPU_Mapped model buffer size =   748.00 MiB
[48203] load_tensors:      Vulkan0 model buffer size = 12264.00 MiB
[48203] .............................................................................
[48203] common_init_result: added <turn|> logit bias = -inf
[48203] common_init_result: added </s> logit bias = -inf
[48203] llama_context: constructing llama_context
[48203] llama_context: n_seq_max     = 10
[48203] llama_context: n_ctx         = 262144
[48203] llama_context: n_ctx_seq     = 262144
[48203] llama_context: n_batch       = 2048
[48203] llama_context: n_ubatch      = 512
[48203] llama_context: causal_attn   = 1
[48203] llama_context: flash_attn    = enabled
[48203] llama_context: kv_unified    = true
[48203] llama_context: freq_base     = 1000000.0
[48203] llama_context: freq_scale    = 1
[48203] llama_context: Vulkan_Host  output buffer size =    10.00 MiB
[48203] llama_kv_cache_iswa: creating non-SWA KV cache, size = 262144 cells
[48203] llama_kv_cache:    Vulkan0 KV buffer size =  1000.13 MiB
[48203] llama_kv_cache: TurboQuant rotation matrices initialized (128x128)
[48203] llama_kv_cache: size = 1000.00 MiB (262144 cells,   5 layers, 10/1 seqs), K (turbo3):  500.00 MiB, V (turbo3):  500.00 MiB
[48203] llama_kv_cache: upstream attention rotation disabled (TurboQuant uses kernel-level WHT)
[48203] llama_kv_cache: attn_rot_k = 0
[48203] llama_kv_cache: attn_rot_v = 0
[48203] llama_kv_cache_iswa: creating     SWA KV cache, size = 10752 cells
[48203] llama_kv_cache:    Vulkan0 KV buffer size =   410.28 MiB
[48203] llama_kv_cache: TurboQuant rotation matrices initialized (128x128)
[48203] llama_kv_cache: size =  410.16 MiB ( 10752 cells,  25 layers, 10/1 seqs), K (turbo3):  205.08 MiB, V (turbo3):  205.08 MiB
[48203] llama_kv_cache: upstream attention rotation disabled (TurboQuant uses kernel-level WHT)
[48203] llama_kv_cache: attn_rot_k = 0
[48203] llama_kv_cache: attn_rot_v = 0
[48203] sched_reserve: reserving ...
[48203] sched_reserve: resolving fused Gated Delta Net support:
[48203] sched_reserve: fused Gated Delta Net (autoregressive) enabled
[48203] sched_reserve: fused Gated Delta Net (chunked) enabled
[48203] sched_reserve:    Vulkan0 compute buffer size =   829.79 MiB
[48203] sched_reserve: Vulkan_Host compute buffer size =   552.54 MiB
[48203] sched_reserve: graph nodes  = 2707
[48203] sched_reserve: graph splits = 2
[48203] sched_reserve: reserve took 70.20 ms, sched copies = 1
[48203] common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
[48203] /home/bazzite/llama-cpp-turboquant/ggml/src/ggml-vulkan/ggml-vulkan.cpp:8972: GGML_ASSERT(Br == pipeline->wg_denoms[0]) failed
[48203] [New LWP 931335]
[48203] [New LWP 931334]
[48203] [New LWP 931333]
[48203] [New LWP 931332]
[48203] [New LWP 931331]
[48203] [New LWP 931330]
[48203] [New LWP 931329]
[48203] [New LWP 931328]
[48203] [New LWP 931327]
[48203] [New LWP 931326]
[48203] [New LWP 931325]
[48203] [New LWP 931324]
[48203] [New LWP 931323]
[48203] [New LWP 931322]
[48203] [New LWP 931321]
[48203] [New LWP 931320]
[48203] [New LWP 931319]
[48203] [New LWP 931318]
[48203] [New LWP 931317]
[48203] [New LWP 931316]
[48203] [New LWP 931315]
[48203] [New LWP 931313]
[48203] [New LWP 931312]
[48203] 
[48203] This GDB supports auto-downloading debuginfo from the following URLs:
[48203]   <ima:enforcing>
[48203]   <https://debuginfod.fedoraproject.org/>
[48203]   <ima:ignore>
[48203] Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
[48203] Debuginfod has been disabled.
[48203] To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[48203] [Thread debugging using libthread_db enabled]
[48203] Using host libthread_db library "/lib64/libthread_db.so.1".
[48203] 0x00007f77aa2879a2 in __syscall_cancel_arch () from /lib64/libc.so.6
[48203] #0  0x00007f77aa2879a2 in __syscall_cancel_arch () from /lib64/libc.so.6
[48203] #1  0x00007f77aa27bc3c in __internal_syscall_cancel () from /lib64/libc.so.6
[48203] #2  0x00007f77aa27bc84 in __syscall_cancel () from /lib64/libc.so.6
[48203] #3  0x00007f77aa2ebb8f in wait4 () from /lib64/libc.so.6
[48203] #4  0x00007f77ab0d49bd in ggml_print_backtrace () from /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-base.so.0
[48203] #5  0x00007f77ab0d4b46 in ggml_abort () from /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-base.so.0
[48203] #6  0x00007f77a6916d2a in ggml_vk_flash_attn(ggml_backend_vk_context*, std::shared_ptr<vk_context_struct>&, ggml_tensor const*, ggml_tensor const*, ggml_tensor const*, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) () from /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-vulkan.so.0
[48203] #7  0x00007f77a692a9fa in ggml_vk_build_graph(ggml_backend_vk_context*, ggml_cgraph*, int, ggml_tensor*, int, bool, bool, bool) () from /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-vulkan.so.0
[48203] #8  0x00007f77a69378e9 in ggml_backend_vk_graph_compute(ggml_backend*, ggml_cgraph*) () from /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-vulkan.so.0
[48203] #9  0x00007f77ab0f011b in ggml_backend_graph_compute_async () from /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-base.so.0
[48203] #10 0x00007f77ab0f4c62 in ggml_backend_sched_compute_splits(ggml_backend_sched*) () from /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-base.so.0
[48203] #11 0x00007f77ab0f5aa0 in ggml_backend_sched_graph_compute_async () from /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libggml-base.so.0
[48203] #12 0x00007f77aa8a0a42 in llama_context::graph_compute(ggml_cgraph*, bool) () from /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libllama.so.0
[48203] #13 0x00007f77aa89c139 in llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) () from /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libllama.so.0
[48203] #14 0x00007f77aa89de47 in llama_context::decode(llama_batch const&) () from /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libllama.so.0
[48203] #15 0x00007f77aa8a4d81 in llama_decode () from /var/home/bazzite/llama.cpp-vulkan/llama-turboquant/libllama.so.0
[48203] #16 0x0000000000673767 in common_init_from_params(common_params&) ()
[48203] #17 0x00000000004f3e06 in server_context_impl::load_model(common_params const&) ()
[48203] #18 0x00000000004cf00c in server_context::load_model(common_params const&) ()
[48203] #19 0x0000000000408427 in main ()
[48203] [Inferior 1 (process 931308) detached]
