
Vision support for Gemma4 #1635

Merged

ikawrakow merged 5 commits into main from ik/gemma4_vision on Apr 16, 2026

Conversation

@ikawrakow
Owner

Based on quick tests it seems to work, but you tell me.

Crashes on the GPU because rms_norm requires ne0 to be a multiple of warp_size.
Runs on the CPU, but produces garbage.
@Ph0rk0z

Ph0rk0z commented Apr 14, 2026

I saw discussion that in kobold.cpp the model did much more poorly when given a more limited token budget for the images. Maybe this doesn't apply here? The numbers they were talking about were 2xx being too small.

@ikawrakow
Owner Author

> I saw discussion that in kobold.cpp the model did much more poorly when given a more limited token budget for the images. Maybe this doesn't apply here? The numbers they were talking about were 2xx being too small.

Well, from the testing I have done, ik_llama.cpp produces exactly the same number of image tokens as llama.cpp. That number is 2XX, so if the kobold.cpp wisdom is to be trusted, we would have to conclude that neither this implementation nor the llama.cpp one is working.

@Ph0rk0z

Ph0rk0z commented Apr 14, 2026

They are all working, but it was hinted that increasing the numbers yields more detailed descriptions. Currently there is no way to do that short of editing the code.

Testing here, the model's responses to the pics have been more elaborate in mainline thus far. But that could be sampling, the parser, or any number of things. The description in the reasoning was very brief, and the final output ignored the image unless I mentioned it. I'll keep playing with it.

edit: we do have --image-max-tokens here, and also a min.
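To illustrate raising the image token budget, a hedged invocation sketch (the binary name and file names are placeholders, and the budget value is arbitrary; only the --image-max-tokens flag is taken from the comment above — check --help for the min counterpart, whose exact name is not given here):

```shell
# Hypothetical example: raise the per-image token budget above the ~2xx
# default discussed earlier. Paths and the value 1024 are placeholders.
./llama-server \
  -m gemma4-31b-Q8_0.gguf \
  --mmproj mmproj-F16.gguf \
  --image-max-tokens 1024
```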

@gapeleon
Contributor

It works for me most of the time. And when it does, the output seems fine / similar to llama.cpp.
31b Q8_0 with F16 mmproj.
Sometimes it crashes though, usually (but not always) when responding after there's an image in the context, or when OpenWebUI sends off the title generation prompt:

processing image...
encoding image slice...
image slice encoded in 179 ms
decoding image batch 1/1, n_tokens_batch = 266
image decoded (batch 1/1) in 183 ms
image processed in 362 ms
=============================== Failed to sample token
Data has been stored in probabilities.txt
Create an issue with full log and attach probabilities.txt to the issue


Crashing now
/home/gapeleon/apps/ik_llama.cpp_gemma4-vision/src/llama-sampling.cpp:744: Fatal error
cat probabilities.txt 
candidates->size: 40
max  = nan
sump = nan
r    = 3320178168
probabilities:
0  38  nan  nan
1  22  nan  nan
2  10  nan  nan
3  34  nan  nan
4  26  nan  nan
5  18  nan  nan
6  20  nan  nan
7  4  nan  nan
8  24  nan  nan
9  12  nan  nan
10  32  nan  nan
11  28  nan  nan
12  16  nan  nan
13  36  nan  nan
14  8  nan  nan
15  39  nan  nan
16  9  nan  nan
17  21  nan  nan
18  1  nan  nan
19  23  nan  nan
20  11  nan  nan
21  25  nan  nan
22  5  nan  nan
23  27  nan  nan
24  13  nan  nan
25  31  nan  nan
26  29  nan  nan
27  15  nan  nan
28  33  nan  nan
29  7  nan  nan
30  35  nan  nan
31  17  nan  nan
32  37  nan  nan
33  3  nan  nan
34  19  nan  nan
35  0  nan  nan
36  2  nan  nan
37  6  nan  nan
38  14  nan  nan
39  30  nan  nan

All junk / special tokens:

rank  0 | token_id 38 | '<unused32>'
rank  1 | token_id 22 | '<unused16>'
rank  2 | token_id 10 | '<unused4>'
rank  3 | token_id 34 | '<unused28>'
rank  4 | token_id 26 | '<unused20>'
rank  5 | token_id 18 | '<unused12>'
rank  6 | token_id 20 | '<unused14>'
rank  7 | token_id  4 | '<mask>'

@ikawrakow ikawrakow merged commit eaf8386 into main Apr 16, 2026