Summary
When a tool has a parameter whose JSON schema is array<object> (e.g. [{"type":"web"}]) and the model (Qwen3 family using the TAG_WITH_TAGGED / <tool_call><function=...><parameter=...> format) emits it as the inner value of a <parameter> tag, the autoparser's PEG p.json() fails mid-stream. Two cascading problems follow:
- Root parse failure. build_tool_parser_tag_tagged uses p.json() for any parameter that doesn't resolve to a string schema. On array-of-object values the PEG parser aborts and the request ends in a 500 with Failed to parse input at pos N: <tool_call>....
- Partial state leaked to the client. Before the failure, the partial-parse fallback at common/chat.cpp:2158 returns whatever AST was built so far. For streaming (is_partial=true) this surfaces to the client as an SSE delta containing tool_calls[0].function.arguments = "{" (one character). The client, acting correctly per the OpenAI spec, appends this tool_call to conversation history.
- History poisoning. Every subsequent request replays that history. func_args_not_string at common/chat.cpp:1842-1860 eagerly parses tool_calls[].function.arguments as JSON and throws a 500 on the bare {, so the conversation is stuck: every follow-up fails with Failed to parse tool call arguments as JSON: ... unexpected end of input, even though the client did nothing wrong.
Net effect: a single tool-call attempt with a complex parameter permanently wedges the conversation until the client manually strips the assistant message from history.
Name and Version
version: 8759 (2b2cd57de)
built with GNU 11.4.0 for Linux x86_64
HEAD of master as of 2026-04-11, i.e. PR #21216 is included (verified commit 26229755c is in the binary).
Operating systems
Linux
GGML backends
CUDA
Hardware
4x RTX 3090
Models
Qwen3.5-122B-A10B-Q4_K_M (GGUF). The bug is not specific to this model — it reproduces with any model whose template routes to TAG_WITH_TAGGED tool format.
Steps to reproduce
- Launch the server with jinja enabled and any tool set containing an array<object> parameter:
llama-server -m Qwen3.5-122B-A10B-Q4_K_M.gguf -ngl 99 -c 163840 -fa on \
--host 0.0.0.0 --port 8080 --jinja
- Send a chat completion with a tool definition like:
{
  "type": "function",
  "function": {
    "name": "firecrawl_search",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer"},
        "sources": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {"type": {"type": "string", "enum": ["web","images","news"]}},
            "required": ["type"]
          }
        }
      },
      "required": ["query"]
    }
  }
}
and a user prompt that naturally requires calling it with sources set.
- The model emits (captured via a request-dumping proxy in front of llama-server):
<tool_call>
<function=firecrawl_search>
<parameter=limit>
10
</parameter>
<parameter=query>
pop culture trends USA March April 2026 movies music TV shows celebrities
</parameter>
<parameter=sources>
[{"type": "web"},
- Server responds with 500 in the SSE stream:
Failed to parse input at pos 518: <tool_call>
<function=firecrawl_search>
...
<parameter=sources>
[{"type": "web"},
- Inspection of the SSE stream of that same request shows a preceding delta event:
{"choices":[{"delta":{"tool_calls":[{"index":0,"id":"...",
"type":"function","function":{"name":"firecrawl_search","arguments":"{"}}]}}]}
i.e. arguments = "{" (one character) was already pushed to the client before the stream fails.
- Any follow-up chat completion that replays the assistant message as conversation history now fails with:
Failed to parse tool call arguments as JSON:
[json.exception.parse_error.101] parse error at line 1, column 2:
syntax error while parsing object key - unexpected end of input; expected string literal
Root cause (pointers)
common/chat-auto-parser-generator.cpp around line 367 in build_tool_parser_tag_tagged:
(schema_info.resolves_to_string(param_schema) ?
p.tool_arg_string_value(p.schema(until_suffix, ..., param_schema, true)) :
p.tool_arg_json_value(p.schema(p.json(), ..., param_schema, false)) + p.space())
For non-string parameters the generated rule calls p.json(), which appears to be unable to parse JSON whose value is array<object> inside a <parameter> tag context. Possibly related: nested-object backtracking, or lookahead for the closing </parameter> interfering with the JSON grammar. I have not pinned the exact failure site inside p.json() yet — happy to dig further if useful.
The partial-parse fallback at common/chat.cpp:2158-2178 is what surfaces "{" to the client during streaming, and func_args_not_string at common/chat.cpp:1842-1860 is the secondary crash point that poisons the conversation.
Suggested fixes
- Parser: Either fix p.json() to parse arbitrary nested JSON inside the <parameter> tag context, or (simpler) switch the non-string branch to until_suffix-based capture plus a post-parse json::parse on the raw slice. The latter matches how many real-world clients emit structured parameter values and avoids coupling the grammar to the schema.
- Streaming safety: The is_partial path should not emit a tool_calls[].function.arguments delta unless the accumulated content is at least a syntactically valid JSON prefix that the client can safely append to (or, simplest, flush only when the full tool_call is complete). Emitting "{" and then aborting is strictly worse than emitting nothing.
- Replay tolerance: func_args_not_string should not throw a 500 when a prior tool_calls[].function.arguments in the input history is not valid JSON. It can either leave the string as-is (letting the template render it verbatim) or wrap it for the template. Throwing forces the client to clean up after the server's own broken output.
Notes
Not a duplicate of #20867 (that one was about MAX_REPETITION_THRESHOLD, fixed by #21216, which is already in this build).