Conversation
Crashes on the GPU because rms_norm requires ne0 to be a multiple of warp_size. It runs on the CPU, but produces garbage.
I saw discussion that in kobold.cpp the model performed much more poorly when given a more limited token budget for the images. Maybe that doesn't apply here? The numbers they mentioned were in the 2xx range being too small.
Well, from the testing I have done,
They are all working, but it was hinted that increasing the numbers yields more detailed descriptions. Currently there is no way to do that other than editing the code. In my testing here, the model's responses to the pics have been more elaborate in mainline so far, but that could be the sampling, the parser, or any number of things. The description in the reasoning was very brief, and the final output ignored the image unless I mentioned it. I'll keep playing with it. edit: we do have --image-max-tokens here, and also a min.
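A sketch of how the flag mentioned above might be used to raise the per-image token budget. The binary name, model paths, --mmproj, and the value are assumptions for illustration; only --image-max-tokens (and the existence of a minimum counterpart) is stated in the thread.

```shell
# Hypothetical invocation: raise the per-image token budget so the
# model can produce more detailed image descriptions.
./llama-server -m model.gguf --mmproj mmproj.gguf \
    --image-max-tokens 1024
# A matching minimum-tokens flag also exists per the comment above;
# its exact name is not quoted in this thread, so it is omitted here.
```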
It works for me most of the time, and when it does, the output seems fine / similar to llama.cpp. When it fails, the output is all junk / special tokens:
Based on quick tests, it seems to work, but you tell me.