Skip to content

fix(parsing): accept bare-string args for union schemas with string#63

Merged
hallerite merged 1 commit into
mainfrom
fix/coerce-arg-anyof-string
May 26, 2026
Merged

fix(parsing): accept bare-string args for union schemas with string#63
hallerite merged 1 commit into
mainfrom
fix/coerce-arg-anyof-string

Conversation

@eligotts
Copy link
Copy Markdown
Contributor

@eligotts eligotts commented May 25, 2026

Summary

Fix _coerce_arg_value to recognize anyOf / oneOf schemas with a string branch, so XML-style parsers (Qwen3.5, Qwen3.6, GLM, MiniMax, Laguna) don't wrongly flag bare-string args as INVALID_JSON.

The bug

form_input and similar tools have parameters typed as Union[str, bool] (or Optional[str]), which Pydantic serialises to:

{"anyOf": [{"type": "string"}, {"type": "boolean"}]}

The top-level "type" key is absent. So _coerce_arg_value:

  1. Skipped the pure-string short-circuit (no top-level "string" type).
  2. Tried json.loads("LIS")JSONDecodeError.
  3. Returned (text, True) — flagging the fallback as malformed-JSON.

parse_qwen35 then OR's used_fallback across all params into any_json_fallback, and stamps the call ToolCallParseStatus.INVALID_JSON. Downstream, verifiers/clients/renderer_client.py silently dropped all non-OK tool calls (lines 604-609 on main), so any form_input(value="LIS", ...) was effectively erased before reaching the env. The env saw no tool_calls and terminated the rollout via the no_tools_called stop predicate.

Empirically (mini-browse-apps multi-turn eval, Qwen3.6-35B-A3B, n=20):

  • Renderer-route reward pre-fix: 0.10
  • Renderer-route reward with this fix + the matching verifiers filter relaxation: 0.55, basically at MITO parity (/v1/chat/completions got 0.60 on the same slice; 4 of MITO's failures were unrelated vLLM mm-cache crashes)

The fix

Detect union (anyOf / oneOf) branches containing "string" and treat the json.loads-fails → text-fallback path as expected behavior under those schemas, not malformed. That matches the function's docstring intent ("the string branch wins as fallback") which the original code didn't actually enforce.

The pure-string short-circuit (type: "string" and type: ["string"]) is unchanged.

Test plan

Extended tests/test_tool_arg_type_preservation.py with one new parametrized test test_union_with_string_emits_ok_status covering three schema shapes:

  • anyOf: [string, boolean]
  • anyOf: [string, null]
  • oneOf: [string, integer]

Each case round-trips a bare-string arg through the existing XML-style parsers (Qwen3.5, GLM-5, MiniMax-M2.5, Laguna-XS.2) plus the two JSON-format controls (Qwen3, Kimi K2.5), and asserts both status == OK and value preservation. All 18 parametrized cases pass.

The matching verifiers change (stop silently dropping non-OK ParsedToolCall in renderer_client.py) is a separate PR — together they restore full multi-turn parity with the /v1/chat/completions route.

Note

Fix _coerce_arg_value to accept bare strings for union schemas containing a string branch

When a parameter schema uses anyOf or oneOf and includes a string type, bare string arguments failed JSON parsing and were incorrectly marked with used_json_fallback=True, causing them to be treated as malformed input.

  • parsing.py now detects union schemas with a string branch and returns the raw string with used_json_fallback=False on JSON parse failure.
  • Tests in test_tool_arg_type_preservation.py cover anyOf and oneOf unions with string plus boolean/null/integer branches, asserting OK status and unchanged argument values.

Macroscope summarized 9c30dc0.


Note by @hallerite:

A few factual additions from a reviewer pass:

  • Nemotron3 also gets this fix. It routes through parse_qwen35 (see renderers/nemotron3.py:33), so the affected set is Qwen3.5, Qwen3.6, Nemotron3, GLM-4.5/5, Laguna-XS.2, MiniMax-M2.
  • JSON-format parsers don't need this fix and are unaffected: parse_qwen3 (Qwen3, Qwen3-VL), parse_deepseek_v3, parse_kimi_k2 / parse_kimi_k2_section, parse_gpt_oss — they json.loads the entire args block, so unions work natively.
  • vLLM divergence (intentional). vLLM's qwen3xml_tool_parser._get_param_type (vllm/tool_parsers/qwen3xml_tool_parser.py:993-1009) defaults missing type to "string" and never json.loads for that branch — never flags this case, but also can't recover int/bool from a Union[str, int]. This PR still tries json.loads first and only suppresses the fallback flag when a string branch exists.
  • Schema-shape coverage gaps (likely theoretical, worth noting):
    • Detector only checks anyOf/oneOf one level deep — nested unions won't be detected. Pydantic v2 flattens these.
    • JSON Schema's array-type form {"type": ["string", "boolean"]} isn't handled; existing short-circuit only catches single-element ["string"]. Pydantic doesn't emit this shape.
    • param_schema.get("anyOf") or param_schema.get("oneOf") or [] short-circuits — a schema with both keys would ignore oneOf.

Note

Medium Risk
Changes shared argument coercion used by multiple XML tool parsers; behavior is narrower (fewer false INVALID_JSON) but affects how non-OK tool calls are classified downstream.

Overview
Fixes _coerce_arg_value in renderers/parsing.py so XML-style tool parsers no longer mark bare-string argument values as malformed JSON when the parameter schema is a union that includes string (anyOf / oneOf), not only when the top-level type is "string".

After json.loads fails, the helper now scans union branches for a string type and sets used_json_fallback only when a string outcome is not allowed. That stops parsers (Qwen3.5-style XML, GLM, MiniMax, Laguna, etc.) from aggregating a false INVALID_JSON status on otherwise valid calls—e.g. Pydantic str | bool tools where values like "LIS" are emitted verbatim in <arg_value> tags.

tests/test_tool_arg_type_preservation.py adds test_union_with_string_emits_ok_status (three union shapes × existing renderer matrix) asserting ToolCallParseStatus.OK and preserved argument values for a bare string.

Reviewed by Cursor Bugbot for commit 9c30dc0. Bugbot is set up for automated code reviews on this repo. Configure here.

For schemas like ``anyOf: [{"type": "string"}, {"type": "boolean"}]``
the top-level ``type`` key is absent, so ``_coerce_arg_value`` fell
through the pure-string short-circuit and tried ``json.loads`` on the
bare text. The text-fallback path was correct (right value), but the
function flagged ``used_json_fallback=True`` — causing
``parse_qwen35`` to stamp ``ToolCallParseStatus.INVALID_JSON``.
Downstream, ``verifiers/clients/renderer_client.py`` silently dropped
all non-OK tool calls, so any tool with a ``str | bool`` (or similar)
parameter where the model emitted a string was effectively erased —
the env saw no ``tool_calls`` and ended the rollout via the
``no_tools_called`` stop predicate.

Detect ``anyOf``/``oneOf`` branches containing ``"string"`` and treat
the json-failure → text fallback as expected (not flagged) under those
schemas. Matches the docstring's stated intent — "the string branch
wins as fallback" — which the original implementation didn't enforce.

Tests: extend ``test_tool_arg_type_preservation.py`` with a
parametrized case set for union-with-string schemas, asserting both
``status == OK`` and value preservation across the existing XML-style
parsers.
@hallerite hallerite marked this pull request as ready for review May 26, 2026 13:54
@macroscopeapp
Copy link
Copy Markdown

macroscopeapp Bot commented May 26, 2026

Approvability

Verdict: Approved

This is a straightforward bug fix that corrects false INVALID_JSON flags when parsing tool arguments with union schemas (anyOf/oneOf) containing string types. The change is small, well-documented, and includes comprehensive tests validating the fix.

You can customize Macroscope's approvability policy. Learn more.

@hallerite hallerite merged commit 6473354 into main May 26, 2026
11 checks passed
@hallerite hallerite deleted the fix/coerce-arg-anyof-string branch May 26, 2026 13:59
hallerite added a commit that referenced this pull request May 26, 2026
Folds in #63's bare-string-for-union-schemas fix and refactors
_coerce_arg_value to use vLLM's current shared helpers
(extract_types_from_schema + coerce_to_schema_type from
vllm/tool_parsers/utils.py, landed in vLLM #43025 on 2026-05-19),
replacing this branch's earlier single-type ladder.

Why: #52 was originally written against pre-refactor vLLM and would
have regressed #63 (anyOf schemas routed to "object" → json.loads
on bare strings flagged INVALID_JSON). vLLM's new shape recursively
flattens anyOf/oneOf/allOf into a type set and walks a priority-
ordered ladder (null > integer > number > boolean > object > array
> string) where string is the always-succeeding terminal — which
absorbs #63's case as a side effect.

Behavior changes relative to the pre-merge branch:
- Union schemas (anyOf/oneOf/allOf) recursively flatten — Union[str,X]
  now accepts bare strings without flagging (the #63 win).
- No top-level "null" short-circuit. Null coercion only fires when
  "null" is in the schema's type set (Optional[X] → anyOf [X, null]
  or type: ["X", "null"]). String-typed "null" stays a string.
- "yes" / non-boolean text for bool-typed param returns raw text +
  INVALID flag (was: False + INVALID).
- Number branch demotes whole floats to int regardless of source
  shape (1e3 → 1, 1.0 → 1) — matches vLLM's val.is_integer() rule.
- No ast.literal_eval anywhere — vLLM doesn't use it. Python-literal
  dicts ('k':1) no longer parse for object params.
- "binary" dropped as a bool alias (not in vLLM's _TYPE_ALIASES).

Renderers-specific deviation kept: used_fallback flag returned
alongside the value, propagated to ToolCallParseStatus.INVALID_JSON
for the verifier / RL-loss signal. vLLM has no such signal.

Tests updated for new behavior; full suite green (1775+99=1874 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants