
Vision support for Gemma4 #1635

Merged

ikawrakow merged 5 commits into main from ik/gemma4_vision on Apr 16, 2026

Conversation

@ikawrakow
Owner

Based on quick tests it seems to work, but you tell me.

Crashes on the GPU because rms_norm requires ne0 to be a multiple of warp_size.
Runs on the CPU, but produces garbage.
@Ph0rk0z

Ph0rk0z commented Apr 14, 2026

I saw discussion that in kobold.cpp the model did much more poorly when given a more limited token budget for the images. Maybe this doesn't apply here? The numbers they were talking about were 2xx being too small.

@ikawrakow
Owner Author

> I saw discussion that in kobold.cpp the model did much more poorly when given a more limited token budget for the images. Maybe this doesn't apply here? The numbers they were talking about were 2xx being too small.

Well, from the testing I have done, ik_llama.cpp produces exactly the same number of image tokens as llama.cpp. That number is 2XX, so if the kobold.cpp wisdom is to be trusted, we would have to conclude that neither this implementation nor the llama.cpp one is working.

@Ph0rk0z

Ph0rk0z commented Apr 14, 2026

They are all working, but it was hinted that increasing the numbers yields more detailed descriptions. Currently there is no way to do that short of editing the code.

Testing here, the model's responses to the pics have been more elaborate in mainline thus far. But that could be sampling, the parser, or any number of things. The description in the reasoning was very brief, and the final output ignored the image unless I mentioned it. I'll keep playing with it.

edit: we do have --image-max-tokens here, and also a min.
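To illustrate raising the image token budget, a hedged invocation sketch (the binary name and file names are placeholders, and the budget value is arbitrary; only the --image-max-tokens flag is taken from the comment above — check --help for the min counterpart, whose exact name is not given here):

```shell
# Hypothetical example: raise the per-image token budget above the ~2xx
# default discussed earlier. Paths and the value 1024 are placeholders.
./llama-server \
  -m gemma4-31b-Q8_0.gguf \
  --mmproj mmproj-F16.gguf \
  --image-max-tokens 1024
```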

@gapeleon
Contributor

It works for me most of the time. And when it does, the output seems fine / similar to llama.cpp.
31b Q8_0 with F16 mmproj.
Sometimes it crashes though, usually (but not always) when responding after there's an image in the context, or when OpenWebUI sends off the title generation prompt:

processing image...
encoding image slice...
image slice encoded in 179 ms
decoding image batch 1/1, n_tokens_batch = 266
image decoded (batch 1/1) in 183 ms
image processed in 362 ms
=============================== Failed to sample token
Data has been stored in probabilities.txt
Create an issue with full log and attach probabilities.txt to the issue


Crashing now
/home/gapeleon/apps/ik_llama.cpp_gemma4-vision/src/llama-sampling.cpp:744: Fatal error
cat probabilities.txt 
candidates->size: 40
max  = nan
sump = nan
r    = 3320178168
probabilities:
0  38  nan  nan
1  22  nan  nan
2  10  nan  nan
3  34  nan  nan
4  26  nan  nan
5  18  nan  nan
6  20  nan  nan
7  4  nan  nan
8  24  nan  nan
9  12  nan  nan
10  32  nan  nan
11  28  nan  nan
12  16  nan  nan
13  36  nan  nan
14  8  nan  nan
15  39  nan  nan
16  9  nan  nan
17  21  nan  nan
18  1  nan  nan
19  23  nan  nan
20  11  nan  nan
21  25  nan  nan
22  5  nan  nan
23  27  nan  nan
24  13  nan  nan
25  31  nan  nan
26  29  nan  nan
27  15  nan  nan
28  33  nan  nan
29  7  nan  nan
30  35  nan  nan
31  17  nan  nan
32  37  nan  nan
33  3  nan  nan
34  19  nan  nan
35  0  nan  nan
36  2  nan  nan
37  6  nan  nan
38  14  nan  nan
39  30  nan  nan

All junk / special tokens:

rank  0 | token_id 38 | '<unused32>'
rank  1 | token_id 22 | '<unused16>'
rank  2 | token_id 10 | '<unused4>'
rank  3 | token_id 34 | '<unused28>'
rank  4 | token_id 26 | '<unused20>'
rank  5 | token_id 18 | '<unused12>'
rank  6 | token_id 20 | '<unused14>'
rank  7 | token_id  4 | '<mask>'

@ikawrakow ikawrakow merged commit eaf8386 into main Apr 16, 2026