Eval bug: Gemma 4 fails to load with tensor shape mismatch due to sliding_window_pattern read as uint32_t instead of bool #21434

@hmannekomimi-ctrl2

Description

Name and Version

version: b8661 (b7ad48e)
built with MSVC 19.44.35211.0 for Windows x86_64
CMake options: GGML_CUDA=ON, GGML_CUDA_FA=ON, GGML_CUDA_GRAPHS=ON, BUILD_SHARED_LIBS=ON, LLAMA_OPENSSL=ON
CUDA 12.9

Operating systems

Windows

GGML backends

CUDA

Hardware

AMD Ryzen 7 7800X3D + NVIDIA GeForce RTX 4070 (12GB VRAM)

Models

unsloth/gemma-4-26B-A4B-it-GGUF (UD-Q4_K_M)
https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF

Problem description & steps to reproduce

Summary

Attempting to load a Gemma 4 model (e.g., gemma-4-26B-A4B-it-GGUF) with a build compiled from source fails with a tensor shape mismatch error:

llama_model_load: error loading model: check_tensor_dims: tensor 'blk.1.attn_q.weight' has wrong shape; expected 2816, 8192, got 2816, 4096

However, the same model loads and works correctly in LM Studio v2.11.0, which uses a forked version of llama.cpp.

Root Cause

The GGUF file stores gemma4.attention.sliding_window_pattern as a bool[] array (30 entries). The llama.cpp loader reads it into std::array<uint32_t, LLAMA_MAX_LAYERS> swa_layers via get_key_or_arr. Because each stored bool is 1 byte but each uint32_t is 4 bytes, the raw bytes are misinterpreted: is_swa() returns the wrong result for most layers, which in turn yields incorrect n_embd_head_k and n_embd_k_gqa values and ultimately the tensor shape mismatch above.

Fix

Changing swa_layers from std::array<uint32_t, LLAMA_MAX_LAYERS> to std::array<bool, LLAMA_MAX_LAYERS> and adding the required get_arr<bool, 512> template instantiation resolves the issue. The model then loads correctly and produces the same n_embd_k_gqa values as LM Studio.

First Bad Commit

No response

Relevant log output

Error log (before fix)

$ llama-server -m gemma-4-26B-A4B-it-UD-Q4_K_M.gguf --ctx-size 8192 --n-gpu-layers 30 --flash-attn on --cpu-moe
print_info: arch                  = gemma4
print_info: n_embd_k_gqa          = [2048, 4096, 2048, 2048, 2048, 512, 2048, 4096, ...]
print_info: n_embd_v_gqa          = [2048, 4096, 2048, 2048, 2048, 512, 2048, 4096, ...]
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.1.attn_q.weight' has wrong shape; expected 2816, 8192, got 2816, 4096

Expected output (after fix)

print_info: n_embd_k_gqa          = [2048, 2048, 2048, 2048, 2048, 1024, 2048, 2048, ...]
print_info: n_embd_v_gqa          = [2048, 2048, 2048, 2048, 2048, 1024, 2048, 2048, ...]
load_tensors: ............................................................
main: model loaded
srv  update_slots: all slots are idle
