
Qwen3 TAG_WITH_TAGGED tool format: p.json() fails on array<object> parameter values; partial tool_call leaked to client poisons multi-turn history #21771

@alesha-pro

Description


Summary

When a tool has a parameter whose JSON schema is array<object> (e.g. [{"type":"web"}]) and the model (Qwen3 family using the TAG_WITH_TAGGED / <tool_call><function=...><parameter=...> format) emits it as the inner value of a <parameter> tag, the autoparser's PEG p.json() rule fails mid-stream. Three cascading problems follow:

  1. Root parse failure. build_tool_parser_tag_tagged uses p.json() for any parameter that doesn't resolve to a string schema. On array-of-object values the PEG parser aborts and the request ends in a 500 with Failed to parse input at pos N: <tool_call>....
  2. Partial state leaked to the client. Before the failure, the partial-parse fallback at common/chat.cpp:2158 returns whatever AST was built so far. For streaming (is_partial=true) this surfaces to the client as an SSE delta containing tool_calls[0].function.arguments = "{" (one character). The client, acting correctly per the OpenAI spec, appends this tool_call to conversation history.
  3. History poisoning. Every subsequent request replays that history. func_args_not_string at common/chat.cpp:1842-1860 eagerly parses tool_calls[].function.arguments as JSON and throws a 500 on the bare {, so the conversation is now stuck: every follow-up crashes with Failed to parse tool call arguments as JSON: ... unexpected end of input, even though the client did nothing wrong.

Net effect: a single tool-call attempt with a complex parameter permanently wedges the conversation until the client manually strips the assistant message from history.
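The wedge in step 3 comes down to the server eagerly re-parsing the replayed arguments string. A minimal Python stand-in (llama.cpp does this in C++ with nlohmann::json; the message shape below is a hypothetical minimal example following the OpenAI chat schema) shows why the leaked one-character delta can never parse:

```python
import json

# The leaked SSE delta leaves this assistant message in the client's history.
# (Hypothetical minimal shape; field names follow the OpenAI chat schema.)
poisoned_assistant_msg = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {"name": "firecrawl_search", "arguments": "{"},
    }],
}

# On replay, func_args_not_string does the C++ equivalent of this eager parse:
def replay_parses(msg):
    try:
        json.loads(msg["tool_calls"][0]["function"]["arguments"])
        return True
    except json.JSONDecodeError:
        return False

print(replay_parses(poisoned_assistant_msg))  # False: every follow-up 500s
```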

Name and Version

version: 8759 (2b2cd57de)
built with GNU 11.4.0 for Linux x86_64

HEAD of master as of 2026-04-11, i.e. PR #21216 is included (verified commit 26229755c is in the binary).

Operating systems

Linux

GGML backends

CUDA

Hardware

4x RTX 3090

Models

Qwen3.5-122B-A10B-Q4_K_M (GGUF). The bug is not specific to this model — it reproduces with any model whose template routes to TAG_WITH_TAGGED tool format.

Steps to reproduce

  1. Launch the server with jinja enabled and any tool set containing an array<object> parameter:
    llama-server -m Qwen3.5-122B-A10B-Q4_K_M.gguf -ngl 99 -c 163840 -fa on \
                 --host 0.0.0.0 --port 8080 --jinja
    
  2. Send a chat completion with a tool definition like:
    {
      "type": "function",
      "function": {
        "name": "firecrawl_search",
        "parameters": {
          "type": "object",
          "properties": {
            "query":   {"type": "string"},
            "limit":   {"type": "integer"},
            "sources": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {"type": {"type": "string", "enum": ["web","images","news"]}},
                "required": ["type"]
              }
            }
          },
          "required": ["query"]
        }
      }
    }
    and a user prompt that naturally requires calling it with sources set.
  3. The model emits (captured via a request-dumping proxy in front of llama-server):
    <tool_call>
    <function=firecrawl_search>
    <parameter=limit>
    10
    </parameter>
    <parameter=query>
    pop culture trends USA March April 2026 movies music TV shows celebrities
    </parameter>
    <parameter=sources>
    [{"type": "web"},
    
  4. Server responds with 500 in the SSE stream:
    Failed to parse input at pos 518: <tool_call>
    <function=firecrawl_search>
    ...
    <parameter=sources>
    [{"type": "web"},
    
  5. Inspection of the SSE stream of that same request shows a preceding delta event:
    {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"...",
      "type":"function","function":{"name":"firecrawl_search","arguments":"{"}}]}}]}
    i.e. arguments = "{" (one character) was already pushed to the client before the stream failed.
  6. Any follow-up chat completion that replays the assistant message as conversation history now fails with:
    Failed to parse tool call arguments as JSON:
    [json.exception.parse_error.101] parse error at line 1, column 2:
    syntax error while parsing object key - unexpected end of input; expected string literal
    
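Until the server is fixed, the only recovery is the manual history cleanup mentioned above. A client-side workaround can automate it: drop any replayed tool_call whose arguments are not valid JSON before resending the conversation. A sketch (sanitize_history and _is_json are hypothetical helper names, not part of llama.cpp or any client library):

```python
import json

def _is_json(s):
    try:
        json.loads(s)
        return True
    except (json.JSONDecodeError, TypeError):
        return False

def sanitize_history(messages):
    """Drop tool_calls whose arguments are not valid JSON, so a replayed
    conversation no longer trips func_args_not_string on the server."""
    cleaned = []
    for msg in messages:
        calls = msg.get("tool_calls") or []
        good = [c for c in calls
                if _is_json(c.get("function", {}).get("arguments", ""))]
        if len(good) != len(calls):
            msg = dict(msg)
            if good:
                msg["tool_calls"] = good
            else:
                msg.pop("tool_calls")
                # drop the message entirely if nothing useful remains
                if not msg.get("content"):
                    continue
        cleaned.append(msg)
    return cleaned

poisoned = [
    {"role": "user", "content": "hi"},
    {"role": "assistant",
     "tool_calls": [{"id": "call_0", "type": "function",
                     "function": {"name": "firecrawl_search",
                                  "arguments": "{"}}]},
]
print(sanitize_history(poisoned))  # the wedged assistant turn is dropped
```

This loses the broken turn, but the conversation becomes usable again without restarting it.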

Root cause (pointers)

common/chat-auto-parser-generator.cpp around line 367 in build_tool_parser_tag_tagged:

(schema_info.resolves_to_string(param_schema) ?
    p.tool_arg_string_value(p.schema(until_suffix, ..., param_schema, true)) :
    p.tool_arg_json_value(p.schema(p.json(), ..., param_schema, false)) + p.space())

For non-string parameters the generated rule calls p.json(), which appears to be unable to parse JSON whose value is array<object> inside a <parameter> tag context. Possibly related: nested-object backtracking, or lookahead for the closing </parameter> interfering with the JSON grammar. I have not pinned the exact failure site inside p.json() yet — happy to dig further if useful.

The partial-parse fallback at common/chat.cpp:2158-2178 is what surfaces "{" to the client during streaming, and func_args_not_string at common/chat.cpp:1842-1860 is the secondary crash point that poisons the conversation.
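One way to keep that fallback from leaking unusable deltas would be to flush arguments only when the accumulated text is at least a completable JSON prefix. A conservative sketch of such a check in Python (the real code path is C++; is_json_prefix is a hypothetical illustration, and it deliberately rejects some technically valid prefixes rather than accept bad ones):

```python
import json

def is_json_prefix(s: str) -> bool:
    """Heuristic: can s be completed into valid JSON by appending closers?
    Track open containers and string state, close everything, then parse."""
    stack = []
    in_str = esc = False
    for ch in s:
        if in_str:
            if esc:
                esc = False
            elif ch == "\\":
                esc = True
            elif ch == '"':
                in_str = False
            continue
        if ch == '"':
            in_str = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            if not stack or stack.pop() != ch:
                return False  # mismatched close: cannot be a prefix
    base = (s + ('"' if in_str else "")).rstrip()
    if base.endswith(","):
        base = base[:-1]  # a trailing comma is fine in a prefix
    try:
        json.loads(base + "".join(reversed(stack)))
        return True
    except json.JSONDecodeError:
        return False

print(is_json_prefix('[{"type": "web"},'))  # True: safe to keep streaming
print(is_json_prefix('}'))                  # False: never completable
```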

Suggested fixes

  1. Parser: Either fix p.json() to parse arbitrary nested JSON inside the <parameter> tag context, or (simpler) switch the non-string branch to until_suffix-based capture plus a post-parse json::parse on the raw slice. The latter matches how real-world models emit structured parameter values and avoids coupling the grammar to the schema.
  2. Streaming safety: is_partial should not emit a tool_calls[].function.arguments delta unless the accumulated content is at least a syntactically valid JSON prefix that the client can safely append to (or, simplest, flush only when the full tool_call is complete). Emitting "{" and then aborting is strictly worse than emitting nothing.
  3. Replay tolerance: func_args_not_string should not throw a 500 when a prior tool_calls[].function.arguments in the input history is not valid JSON. It can either leave the string as-is (letting the template render it verbatim) or wrap it for the template. Throwing forces the client to clean up after the server's own broken output.
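The until_suffix route in fix 1 can be prototyped outside the PEG entirely: capture everything up to the closing tag, then JSON-parse non-string values after the fact. A Python sketch against the repro's output (parse_parameters and the regex are illustrative only; the real change would live in build_tool_parser_tag_tagged):

```python
import json
import re

# Capture the raw text between <parameter=NAME> and </parameter>.
PARAM_RE = re.compile(
    r"<parameter=(?P<name>[^>]+)>\s*(?P<raw>.*?)\s*</parameter>", re.DOTALL)

def parse_parameters(block, string_params):
    args = {}
    for m in PARAM_RE.finditer(block):
        name, raw = m.group("name"), m.group("raw")
        # String-schema params are taken verbatim; everything else is parsed
        # as JSON only once the closing tag has been seen.
        args[name] = raw if name in string_params else json.loads(raw)
    return args

block = """
<parameter=limit>
10
</parameter>
<parameter=query>
pop culture trends USA March April 2026
</parameter>
<parameter=sources>
[{"type": "web"}]
</parameter>
"""
print(parse_parameters(block, string_params={"query"}))
# {'limit': 10, 'query': 'pop culture trends USA March April 2026',
#  'sources': [{'type': 'web'}]}
```

Deferring the JSON parse until the closing tag also sidesteps the streaming problem for these values: there is no partially parsed argument to flush.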

Notes

Not a duplicate of #20867 (that one was about MAX_REPETITION_THRESHOLD, fixed by #21216, which is already in this build).

Labels

  bug: Something isn't working
  chat parser: Issues related to the chat parser and chat templates
  regression: A regression introduced in a new build (something that was previously working correctly)
