common/gemma4 : handle parsing edge cases #21760

Draft

aldehir wants to merge 5 commits into ggml-org:master from aldehir:gemma4-more-fixes

Conversation

@aldehir
Contributor

@aldehir aldehir commented Apr 11, 2026

Overview

Fix a few edge cases for Gemma 4 26B A4B. I don't see these artifacts with the 31B variant.

Additional information

Issue 1

If the model generates content + tool call, the template will incorrectly format the prompt without the generation prompt (<|turn>model\n):

...<|tool_call>call:$call<tool_call|><|tool_response>response:$response<tool_response|>$message<turn|>\n

This causes 26B to produce a broken thinking sequence:

thought\n<channel|>

Instead of

<|channel>thought\n<channel|>

This is fixed by appending the generation prompt when it is not present and the prompt ends with <turn|>\n.
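The fix above can be sketched roughly as follows. This is an illustrative sketch, not the actual patch; the token strings come from the description above, and the function names are made up:

```cpp
#include <string>

// Check whether s ends with the given suffix.
static bool ends_with(const std::string & s, const std::string & suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Sketch of the Issue 1 fix: if the template closed the last turn with
// <turn|>\n but did not open the model turn, append the generation
// prompt <|turn>model\n so the model starts a fresh turn.
std::string ensure_generation_prompt(std::string prompt) {
    if (ends_with(prompt, "<turn|>\n")) {
        prompt += "<|turn>model\n";
    }
    return prompt;
}
```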

Issue 2

Occasionally 26B will emit a trailing <channel|>, particularly when it does not reason but produces a content message before a tool call:

<|channel>thought\n<channel|>I will ...<channel|><|tool_call>

Fixed by scanning until <channel|>, then consuming until <|tool_call> or the end of input.
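A minimal sketch of that tolerant scan, assuming the parser operates on the raw string after the thought's closing <channel|> (function name and return shape are illustrative, not the actual llama.cpp code):

```cpp
#include <string>
#include <utility>

// Sketch of the Issue 2 fix: everything up to <|tool_call> (or end of
// input) is treated as content, and a stray trailing <channel|> before
// the tool call is swallowed rather than left in the content.
std::pair<std::string, std::string> split_content_and_toolcall(const std::string & s) {
    const std::string tool_open  = "<|tool_call>";
    const std::string chan_close = "<channel|>";

    const size_t tool_pos = s.find(tool_open);
    // substr(0, npos) yields the whole string, so no tool call means
    // everything is content.
    std::string content = s.substr(0, tool_pos);

    // Drop a stray trailing <channel|>.
    if (content.size() >= chan_close.size() &&
        content.compare(content.size() - chan_close.size(),
                        chan_close.size(), chan_close) == 0) {
        content.erase(content.size() - chan_close.size());
    }

    std::string rest = tool_pos == std::string::npos ? std::string() : s.substr(tool_pos);
    return {content, rest};
}
```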

Issue 3

At the start of the generation, 26B may emit multiple <|channel> tokens.

<|channel><|channel>thought\nI will...

I'm unsure whether this is related to the bad prompt above, but it's easy enough to handle by consuming all <|channel> tokens that do not precede thought.
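That handling amounts to collapsing repeated leading <|channel> tokens so the parser sees at most one before the thought. A sketch, again with an illustrative function name rather than the actual patch:

```cpp
#include <string>

// Sketch of the Issue 3 fix: while two <|channel> tokens appear
// back-to-back at the start of the generation, drop one of them,
// keeping only the last marker before the thought text.
std::string collapse_channel_tokens(std::string s) {
    const std::string chan = "<|channel>";
    while (s.compare(0, chan.size(), chan) == 0 &&
           s.compare(chan.size(), chan.size(), chan) == 0) {
        s.erase(0, chan.size());
    }
    return s;
}
```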

@aldehir aldehir requested a review from a team as a code owner April 11, 2026 07:22
@aldehir aldehir requested a review from pwilkin as a code owner April 11, 2026 07:31
@github-actions github-actions bot added the testing Everything test related label Apr 11, 2026
@En3Tho

En3Tho commented Apr 11, 2026

#21767

I opened a discussion, but you're already working on a fix. Thanks!

This is what I'm hitting quite often:
<|channel><|channel>thought <channel|><|tool_call>call:tool_name{..tool_args}<tool_call|>

Is it the same issue?

@aldehir
Contributor Author

aldehir commented Apr 11, 2026

@En3Tho this one might be related to the prompt issue, but I'll add it just in case.

@aldehir aldehir requested a review from ggerganov as a code owner April 11, 2026 19:39
@Dampfinchen

Dampfinchen commented Apr 11, 2026

Encountered these in Hermes Agent. I'm not sure whether this is a llama.cpp or Hermes issue, but it doesn't hurt to post it here.

[Two screenshots of the broken generations]

Maybe this PR fixes that as well?

@aldehir
Contributor Author

aldehir commented Apr 11, 2026

@Dampfinchen it should for the thought\n<channel|> issues, but that first generation has me worried. Which quants are you using? I want to see if I can reproduce something similar.

@Dampfinchen

Dampfinchen commented Apr 11, 2026

> @Dampfinchen it should for the thought\n<channel|> issues, but that first generation has me worried. Which quants are you using? I want to see if I can reproduce something similar.

I was using https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/commit/42d40426322efc9bdd03f9f32b9fd87bfc63409f

It was the second update, I believe. The third update, with the latest chat template, was released recently.

Since it's an older quant, I downloaded the new chat template from Google and injected it using --chat-template-file. I am also running an up-to-date llama.cpp build that includes the fixes from PR #21704.

The exact llama.cpp command I used:

./llama-server -m "google_gemma-4-26B-A4B-it-Q4_K_M.gguf" -c 102144 -fa 1 --host 0.0.0.0 --port 5001 --jinja -ngl 99 --n-cpu-moe 28 -ctv q8_0 -ctk q8_0 -ub 1024 --mmproj "gemma-4-26B-A4B-it.mmproj-q8_0.gguf" --no-mmproj-offload --ctx-checkpoints 6 --no-mmap --chat-template-file "Gemma4\chat_template-google-26b.jinja" --reasoning 0 -np 1 --temp 0.3 --top-p 0.9 --min-p 0.1 --top-k 20

Please note I have not downloaded and tested this PR yet; I simply thought it was a good idea to share this in case it is one of the edge cases that can be improved.

In Hermes Agent it usually works fine, but these issues appeared after I had worked with the agent for a while. At one point (around 64K of context) I noticed it was stuck, endlessly generating, so I stopped the agent, which led to the printout of the strange generation that has you worried.

@aldehir aldehir marked this pull request as draft April 11, 2026 22:14