
Qwen3 TAG_WITH_TAGGED tool format: p.json() fails on array<object> parameter values; partial tool_call leaked to client poisons multi-turn history #21771

@alesha-pro

Description


Summary

When a tool has a parameter whose JSON schema is array<object> (e.g. [{"type":"web"}]) and the model (Qwen3 family using the TAG_WITH_TAGGED / <tool_call><function=...><parameter=...> format) emits it as the inner value of a <parameter> tag, the autoparser's PEG p.json() rule fails mid-stream. Three cascading problems follow:

  1. Root parse failure. build_tool_parser_tag_tagged uses p.json() for any parameter that doesn't resolve to a string schema. On array-of-object values the PEG parser aborts and the request ends in a 500 with Failed to parse input at pos N: <tool_call>....
  2. Partial state leaked to the client. Before the failure, the partial-parse fallback at common/chat.cpp:2158 returns whatever AST was built so far. For streaming (is_partial=true) this surfaces to the client as an SSE delta containing tool_calls[0].function.arguments = "{" (one character). The client, acting correctly per the OpenAI spec, appends this tool_call to conversation history.
  3. History poisoning. Every subsequent request replays that history. func_args_not_string at common/chat.cpp:1842-1860 eagerly parses tool_calls[].function.arguments as JSON and throws a 500 on the bare {, so the conversation is now stuck: every follow-up crashes with Failed to parse tool call arguments as JSON: ... unexpected end of input, even though the client did nothing wrong.

Net effect: a single tool-call attempt with a complex parameter permanently wedges the conversation until the client manually strips the assistant message from history.
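The wedge in step 3 comes down to the server eagerly re-parsing the replayed arguments string. A minimal Python stand-in (llama.cpp does this in C++ with nlohmann::json; the message shape below is a hypothetical minimal example following the OpenAI chat schema) shows why the leaked one-character delta can never parse:

```python
import json

# The leaked SSE delta leaves this assistant message in the client's history.
# (Hypothetical minimal shape; field names follow the OpenAI chat schema.)
poisoned_assistant_msg = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {"name": "firecrawl_search", "arguments": "{"},
    }],
}

# On replay, func_args_not_string does the C++ equivalent of this eager parse:
def replay_parses(msg):
    try:
        json.loads(msg["tool_calls"][0]["function"]["arguments"])
        return True
    except json.JSONDecodeError:
        return False

print(replay_parses(poisoned_assistant_msg))  # False: every follow-up 500s
```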

Name and Version

version: 8759 (2b2cd57de)
built with GNU 11.4.0 for Linux x86_64

HEAD of master as of 2026-04-11, i.e. PR #21216 is included (verified commit 26229755c is in the binary).

Operating systems

Linux

GGML backends

CUDA

Hardware

4x RTX 3090

Models

Qwen3.5-122B-A10B-Q4_K_M (GGUF). The bug is not specific to this model — it reproduces with any model whose template routes to TAG_WITH_TAGGED tool format.

Steps to reproduce

  1. Launch the server with jinja enabled and any tool set containing an array<object> parameter:
    llama-server -m Qwen3.5-122B-A10B-Q4_K_M.gguf -ngl 99 -c 163840 -fa on \
                 --host 0.0.0.0 --port 8080 --jinja
    
  2. Send a chat completion with a tool definition like:
    {
      "type": "function",
      "function": {
        "name": "firecrawl_search",
        "parameters": {
          "type": "object",
          "properties": {
            "query":   {"type": "string"},
            "limit":   {"type": "integer"},
            "sources": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {"type": {"type": "string", "enum": ["web","images","news"]}},
                "required": ["type"]
              }
            }
          },
          "required": ["query"]
        }
      }
    }
    and a user prompt that naturally requires calling it with sources set.
  3. The model emits (captured via a request-dumping proxy in front of llama-server):
    <tool_call>
    <function=firecrawl_search>
    <parameter=limit>
    10
    </parameter>
    <parameter=query>
    pop culture trends USA March April 2026 movies music TV shows celebrities
    </parameter>
    <parameter=sources>
    [{"type": "web"},
    
  4. Server responds with 500 in the SSE stream:
    Failed to parse input at pos 518: <tool_call>
    <function=firecrawl_search>
    ...
    <parameter=sources>
    [{"type": "web"},
    
  5. Inspection of the SSE stream of that same request shows a preceding delta event:
    {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"...",
      "type":"function","function":{"name":"firecrawl_search","arguments":"{"}}]}}]}
    i.e. arguments = "{" (one character) was already pushed to the client before the stream failed.
  6. Any follow-up chat completion that replays the assistant message as conversation history now fails with:
    Failed to parse tool call arguments as JSON:
    [json.exception.parse_error.101] parse error at line 1, column 2:
    syntax error while parsing object key - unexpected end of input; expected string literal
    
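Until the server is fixed, the only recovery is the manual history cleanup mentioned above. A client-side workaround can automate it: drop any replayed tool_call whose arguments are not valid JSON before resending the conversation. A sketch (sanitize_history and _is_json are hypothetical helper names, not part of llama.cpp or any client library):

```python
import json

def _is_json(s):
    try:
        json.loads(s)
        return True
    except (json.JSONDecodeError, TypeError):
        return False

def sanitize_history(messages):
    """Drop tool_calls whose arguments are not valid JSON, so a replayed
    conversation no longer trips func_args_not_string on the server."""
    cleaned = []
    for msg in messages:
        calls = msg.get("tool_calls") or []
        good = [c for c in calls
                if _is_json(c.get("function", {}).get("arguments", ""))]
        if len(good) != len(calls):
            msg = dict(msg)
            if good:
                msg["tool_calls"] = good
            else:
                msg.pop("tool_calls")
                # drop the message entirely if nothing useful remains
                if not msg.get("content"):
                    continue
        cleaned.append(msg)
    return cleaned

poisoned = [
    {"role": "user", "content": "hi"},
    {"role": "assistant",
     "tool_calls": [{"id": "call_0", "type": "function",
                     "function": {"name": "firecrawl_search",
                                  "arguments": "{"}}]},
]
print(sanitize_history(poisoned))  # the wedged assistant turn is dropped
```

This loses the broken turn, but the conversation becomes usable again without restarting it.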

Root cause (pointers)

common/chat-auto-parser-generator.cpp around line 367 in build_tool_parser_tag_tagged:

(schema_info.resolves_to_string(param_schema) ?
    p.tool_arg_string_value(p.schema(until_suffix, ..., param_schema, true)) :
    p.tool_arg_json_value(p.schema(p.json(), ..., param_schema, false)) + p.space())

For non-string parameters the generated rule calls p.json(), which appears to be unable to parse JSON whose value is array<object> inside a <parameter> tag context. Possibly related: nested-object backtracking, or lookahead for the closing </parameter> interfering with the JSON grammar. I have not pinned the exact failure site inside p.json() yet — happy to dig further if useful.

The partial-parse fallback at common/chat.cpp:2158-2178 is what surfaces "{" to the client during streaming, and func_args_not_string at common/chat.cpp:1842-1860 is the secondary crash point that poisons the conversation.
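One way to keep that fallback from leaking unusable deltas would be to flush arguments only when the accumulated text is at least a completable JSON prefix. A conservative sketch of such a check in Python (the real code path is C++; is_json_prefix is a hypothetical illustration, and it deliberately rejects some technically valid prefixes rather than accept bad ones):

```python
import json

def is_json_prefix(s: str) -> bool:
    """Heuristic: can s be completed into valid JSON by appending closers?
    Track open containers and string state, close everything, then parse."""
    stack = []
    in_str = esc = False
    for ch in s:
        if in_str:
            if esc:
                esc = False
            elif ch == "\\":
                esc = True
            elif ch == '"':
                in_str = False
            continue
        if ch == '"':
            in_str = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            if not stack or stack.pop() != ch:
                return False  # mismatched close: cannot be a prefix
    base = (s + ('"' if in_str else "")).rstrip()
    if base.endswith(","):
        base = base[:-1]  # a trailing comma is fine in a prefix
    try:
        json.loads(base + "".join(reversed(stack)))
        return True
    except json.JSONDecodeError:
        return False

print(is_json_prefix('[{"type": "web"},'))  # True: safe to keep streaming
print(is_json_prefix('}'))                  # False: never completable
```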

Suggested fixes

  1. Parser: Either fix p.json() to parse arbitrary nested JSON inside the <parameter> tag context, or (simpler) switch the non-string branch to until_suffix-based capture plus a post-parse json::parse on the raw slice. The latter matches how real-world models emit structured parameter values and avoids coupling the grammar to the schema.
  2. Streaming safety: is_partial should not emit a tool_calls[].function.arguments delta unless the accumulated content is at least a syntactically valid JSON prefix that the client can safely append to (or, simplest, flush only when the full tool_call is complete). Emitting "{" and then aborting is strictly worse than emitting nothing.
  3. Replay tolerance: func_args_not_string should not throw a 500 when a prior tool_calls[].function.arguments in the input history is not valid JSON. It can either leave the string as-is (letting the template render it verbatim) or wrap it for the template. Throwing forces the client to clean up after the server's own broken output.
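The until_suffix route in fix 1 can be prototyped outside the PEG entirely: capture everything up to the closing tag, then JSON-parse non-string values after the fact. A Python sketch against the repro's output (parse_parameters and the regex are illustrative only; the real change would live in build_tool_parser_tag_tagged):

```python
import json
import re

# Capture the raw text between <parameter=NAME> and </parameter>.
PARAM_RE = re.compile(
    r"<parameter=(?P<name>[^>]+)>\s*(?P<raw>.*?)\s*</parameter>", re.DOTALL)

def parse_parameters(block, string_params):
    args = {}
    for m in PARAM_RE.finditer(block):
        name, raw = m.group("name"), m.group("raw")
        # String-schema params are taken verbatim; everything else is parsed
        # as JSON only once the closing tag has been seen.
        args[name] = raw if name in string_params else json.loads(raw)
    return args

block = """
<parameter=limit>
10
</parameter>
<parameter=query>
pop culture trends USA March April 2026
</parameter>
<parameter=sources>
[{"type": "web"}]
</parameter>
"""
print(parse_parameters(block, string_params={"query"}))
# {'limit': 10, 'query': 'pop culture trends USA March April 2026',
#  'sources': [{'type': 'web'}]}
```

Deferring the JSON parse until the closing tag also sidesteps the streaming problem for these values: there is no partially parsed argument to flush.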

Notes

Not a duplicate of #20867 (that one was about MAX_REPETITION_THRESHOLD, fixed by #21216, which is already in this build).

Labels

  bug: Something isn't working
  chat parser: Issues related to the chat parser and chat templates
  regression: A regression introduced in a new build (something that was previously working correctly)
