common/gemma4 : handle parsing edge cases #21760
aldehir wants to merge 5 commits into ggml-org:master
Conversation
I made a discussion, but you are already working on a fix. Thanks! This is what I'm hitting quite often: Is it the same issue?
@En3Tho this one might be related to the prompt issue, but I'll add it just in case. |
@Dampfinchen it should for the
It was the second update, I believe. Recently the third update was released with the latest chat template. Since it's an older quant, I downloaded the new chat template from Google and injected it using `--chat-template-file`. I am also running an up-to-date llama.cpp build that includes the fixes from PR #21704. The exact llama.cpp command I used:
Please note I have not downloaded and tested this PR yet; I simply thought it was a good idea to share this in case it is one of the edge cases that can be improved. In Hermes Agent it usually seems to work fine, but these issues appeared after I had worked with the agent for a while. At one point (around 64K of context) I noticed it was stuck, endlessly generating, so I stopped the agent, which led to the print of the strange generation above.


Overview
Fix a few edge cases for Gemma 4 26B A4B. I don't see these artifacts with the 31B variant.
Additional information
Issue 1
If the model generates content followed by a tool call, the template will incorrectly format the prompt without the generation prompt (`<|turn>model\n`), causing 26B to produce a broken thinking sequence:
Instead of
This is fixed by adding the generation prompt when it is not present and the prompt ends with `<turn|>\n`.

Issue 2
Occasionally 26B will emit a trailing `<channel|>`, particularly when it does not reason but produces a content message before a tool call:

Fixed by scanning until `<channel|>`, then consuming until `<|tool_call>` or the end.

Issue 3
At the start of the generation, 26B may emit multiple `<|channel>` tokens. Unsure if this is related to the bad prompt above, but it's easy enough to handle by consuming all `<|channel>` tokens that do not precede `thought`.
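The three fixes above can be sketched as small string helpers. This is a minimal sketch, not the PR's actual code: the token spellings (`<turn|>`, `<|turn>model`, `<channel|>`, `<|channel>`, `<|tool_call>`) are copied from the description above, and the function names (`ensure_generation_prompt`, `split_content`, `skip_leading_channels`) are hypothetical.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical token spellings, taken from the PR description above;
// the real Gemma 4 template may spell these differently.
static const std::string TURN_END         = "<turn|>\n";
static const std::string GEN_PROMPT       = "<|turn>model\n";
static const std::string TRAILING_CHANNEL = "<channel|>";
static const std::string LEADING_CHANNEL  = "<|channel>";
static const std::string TOOL_CALL        = "<|tool_call>";

// Issue 1: if the rendered prompt ends with the end-of-turn marker, the
// template forgot the model header, so append the generation prompt.
// (A prompt that already ends with the generation prompt does not end
// with TURN_END, so the check also covers the "already present" case.)
std::string ensure_generation_prompt(std::string prompt) {
    const bool ends_with_turn =
        prompt.size() >= TURN_END.size() &&
        prompt.compare(prompt.size() - TURN_END.size(),
                       TURN_END.size(), TURN_END) == 0;
    if (ends_with_turn) {
        prompt += GEN_PROMPT;
    }
    return prompt;
}

// Issue 2: a stray trailing <channel|> may appear before a tool call.
// Scan until the marker, treat everything before it as content, and
// resume at <|tool_call> (or the end of the output).
std::pair<std::string, std::string> split_content(const std::string & out) {
    const size_t pos = out.find(TRAILING_CHANNEL);
    if (pos == std::string::npos) {
        return {out, ""};                 // no stray marker: all content
    }
    const size_t next = out.find(TOOL_CALL, pos);
    std::string rest = next == std::string::npos ? "" : out.substr(next);
    return {out.substr(0, pos), rest};    // {content, remaining tool call}
}

// Issue 3: at the start of generation the model may emit several
// <|channel> tokens in a row. Consume them all, except one that
// directly precedes "thought", which normal parsing still needs.
std::string_view skip_leading_channels(std::string_view out) {
    while (out.substr(0, LEADING_CHANNEL.size()) == LEADING_CHANNEL) {
        std::string_view rest = out.substr(LEADING_CHANNEL.size());
        if (rest.substr(0, 7) == "thought") {
            break;                        // keep this marker
        }
        out = rest;                       // drop the redundant marker
    }
    return out;
}
```

All three helpers operate purely on the decoded output string, which keeps the sketch independent of the tokenizer; the real fix presumably lives in the chat-template and parser code in `common/`.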