common/gemma4 : handle parsing edge cases #21760
aldehir wants to merge 5 commits into ggml-org:master
Conversation
I made a discussion, but you are already working on a fix. Thanks! This is what I'm hitting quite often: Is it the same issue?
@En3Tho this one might be related to the prompt issue, but I'll add it just in case. |
@Dampfinchen it should for the
It was the second update, I believe. Recently the third update was released with the latest chat template. Since it's an older quant, I downloaded the new chat template from Google and injected it using `--chat-template-file`. I am also running an up-to-date llama.cpp build that includes the fixes from PR #21704. The exact llama.cpp command I used:
Please note I have not downloaded and tested this PR yet; I simply thought it was a good idea to share this in case it is one of the edge cases that can be improved. In Hermes Agent it usually seems to work fine, but these issues appeared after I had worked with the agent for a while. At one point (around 64K of context) I noticed it was stuck, endlessly generating, so I stopped the agent, which led to the print of the strange generation above.


Overview
Fix a few edge cases for Gemma 4 26B A4B. I don't see these artifacts with the 31B variant.
Additional information
Issue 1
If the model generates content followed by a tool call, the template will incorrectly format the prompt without the generation prompt (`<|turn>model\n`), causing 26B to produce a broken thinking sequence:
Instead of
This is fixed by adding the generation prompt when it is not present and the prompt ends with `<turn|>\n`.

Issue 2
Occasionally 26B will emit a trailing `<channel|>`, particularly when it does not reason but produces a content message before a tool call:

Fixed by scanning until `<channel|>`, then consuming until `<|tool_call>` or the end.

Issue 3
At the start of the generation, 26B may emit multiple `<|channel>` tokens. Unsure if this is related to the bad prompt above, but it's easy enough to handle by consuming all `<|channel>` tokens that do not precede `thought`.
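The three fixes above can be sketched as small string helpers. This is a minimal sketch, not the PR's actual code: the token spellings (`<turn|>`, `<|turn>model`, `<channel|>`, `<|channel>`, `<|tool_call>`) are copied from the description above, and the function names (`ensure_generation_prompt`, `split_content`, `skip_leading_channels`) are hypothetical.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical token spellings, taken from the PR description above;
// the real Gemma 4 template may spell these differently.
static const std::string TURN_END         = "<turn|>\n";
static const std::string GEN_PROMPT       = "<|turn>model\n";
static const std::string TRAILING_CHANNEL = "<channel|>";
static const std::string LEADING_CHANNEL  = "<|channel>";
static const std::string TOOL_CALL        = "<|tool_call>";

// Issue 1: if the rendered prompt ends with the end-of-turn marker, the
// template forgot the model header, so append the generation prompt.
// (A prompt that already ends with the generation prompt does not end
// with TURN_END, so the check also covers the "already present" case.)
std::string ensure_generation_prompt(std::string prompt) {
    const bool ends_with_turn =
        prompt.size() >= TURN_END.size() &&
        prompt.compare(prompt.size() - TURN_END.size(),
                       TURN_END.size(), TURN_END) == 0;
    if (ends_with_turn) {
        prompt += GEN_PROMPT;
    }
    return prompt;
}

// Issue 2: a stray trailing <channel|> may appear before a tool call.
// Scan until the marker, treat everything before it as content, and
// resume at <|tool_call> (or the end of the output).
std::pair<std::string, std::string> split_content(const std::string & out) {
    const size_t pos = out.find(TRAILING_CHANNEL);
    if (pos == std::string::npos) {
        return {out, ""};                 // no stray marker: all content
    }
    const size_t next = out.find(TOOL_CALL, pos);
    std::string rest = next == std::string::npos ? "" : out.substr(next);
    return {out.substr(0, pos), rest};    // {content, remaining tool call}
}

// Issue 3: at the start of generation the model may emit several
// <|channel> tokens in a row. Consume them all, except one that
// directly precedes "thought", which normal parsing still needs.
std::string_view skip_leading_channels(std::string_view out) {
    while (out.substr(0, LEADING_CHANNEL.size()) == LEADING_CHANNEL) {
        std::string_view rest = out.substr(LEADING_CHANNEL.size());
        if (rest.substr(0, 7) == "thought") {
            break;                        // keep this marker
        }
        out = rest;                       // drop the redundant marker
    }
    return out;
}
```

All three helpers operate purely on the decoded output string, which keeps the sketch independent of the tokenizer; the real fix presumably lives in the chat-template and parser code in `common/`.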