fix: prevent InlineSchema leak into record content + defensive guard #631
Merged
…paths
When an LLM echoes the JSON Schema definition back as its response
(e.g. {"title": "InlineSchema", "type": "object", "properties": {...}})
instead of conforming data, the echoed schema was stored as record
content, causing RecordContextError crashes in downstream actions.
Add _is_schema_echo() detection and _reject_schema_echo_items() filter
in helpers.py (online path) and batch_result_strategy.py (batch path).
Schema-echo responses are replaced with _parse_error dicts so reprompt
can retry. Detection runs unconditionally, not gated by
skip_schema_validation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
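The detection described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the heuristic (a dict with `"type": "object"` and a dict-valued `"properties"`) is an assumption based only on the example payload in the commit message.

```python
from typing import Any


def is_schema_echo(content: Any) -> bool:
    """Assumed heuristic: the payload looks like a JSON Schema object definition."""
    if not isinstance(content, dict):
        return False
    # A conforming record is unlikely to carry both of these structural keys.
    return content.get("type") == "object" and isinstance(content.get("properties"), dict)


# The schema echoed back by the model instead of data.
echoed = {
    "title": "InlineSchema",
    "type": "object",
    "properties": {"name": {"type": "string"}},
}
# What a conforming response would look like.
conforming = {"name": "Ada"}

echo_detected = is_schema_echo(echoed)
data_detected = is_schema_echo(conforming)
```

Because the check looks only at the shape of the payload, it can run before and independently of schema validation, which matches the "not gated by skip_schema_validation" note.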
Add safety-net detection in DependencyNamespaceBuilder.build() for corrupted namespaces that contain compiled JSON Schema definitions instead of actual action output. Corrupted namespaces are wrapped as SKIPPED_NAMESPACE with a warning, which prevents RecordContextError crashes in downstream observe resolution. This guards against records that are already corrupted in the database.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move is_schema_echo() and make_schema_echo_error() to agent_actions/utils/schema_echo.py to avoid circular imports and enable reuse across helpers.py, batch_result_strategy.py, and scope_builder.py
- scope_builder.py calls shared is_schema_echo() instead of inlining the detection logic (prevents drift)
- Use json.dumps() instead of str() for raw_response serialization
- Avoid list allocation in happy path (_reject_schema_echo_items scans first, copies only when an echo is found)
- Deduplicate _parse_error dict construction via make_schema_echo_error()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
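A rough sketch of the scan-first, copy-on-demand filter and the json.dumps() serialization mentioned in this commit. The detection heuristic, the error dict's keys other than `_parse_error`, and the function bodies are all assumptions made for illustration; only the function names come from the commit message.

```python
import json
from typing import Any


def is_schema_echo(item: Any) -> bool:
    # Assumed heuristic: the payload looks like a JSON Schema object definition.
    return (
        isinstance(item, dict)
        and item.get("type") == "object"
        and isinstance(item.get("properties"), dict)
    )


def make_schema_echo_error(item: dict) -> dict:
    # Hypothetical error shape. json.dumps() keeps raw_response valid JSON,
    # whereas str() would produce a Python repr with single quotes.
    return {"_parse_error": "schema_echo", "raw_response": json.dumps(item)}


def reject_schema_echo_items(items: list) -> list:
    # Happy path: scan without allocating and return the same list object.
    if not any(is_schema_echo(item) for item in items):
        return items
    # Only when an echo exists do we build a new list with replacements.
    return [make_schema_echo_error(i) if is_schema_echo(i) else i for i in items]


echo = {"title": "InlineSchema", "type": "object", "properties": {}}
clean = [{"name": "Ada"}]
mixed = [{"name": "Ada"}, echo]

clean_out = reject_schema_echo_items(clean)
mixed_out = reject_schema_echo_items(mixed)
```

Returning the input list unchanged in the common case means callers pay for a copy only when a response actually needs to be rewritten.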
The defensive guard in scope_builder.py now checks whether the namespace has any overlap with declared observe fields instead of string-matching title == "InlineSchema". This catches all forms of content corruption (schema-echo from any schema name, garbage data, wrong action output) generically.

Guard fires when allowed_fields is a non-empty list and set(dep_data.keys()) & set(allowed_fields) is empty. Wildcard observe (action.*) sets allowed_fields=None and bypasses the guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
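The guard condition stated above can be written as a small predicate. The function name `has_no_field_overlap` and the sample field names are hypothetical; the condition itself is taken from the commit message.

```python
from typing import Optional


def has_no_field_overlap(dep_data: dict, allowed_fields: Optional[list]) -> bool:
    """Return True when the guard should fire (namespace looks corrupted)."""
    # None means wildcard observe (action.*); an empty list declares nothing.
    # In both cases there is nothing to compare against, so the guard is bypassed.
    if not allowed_fields:
        return False
    return not (set(dep_data.keys()) & set(allowed_fields))


corrupted = {"title": "InlineSchema", "type": "object", "properties": {}}
healthy = {"summary": "ok", "score": 0.9}
observe_fields = ["summary", "score"]  # hypothetical declared observe fields

fires_on_corrupted = has_no_field_overlap(corrupted, observe_fields)
fires_on_healthy = has_no_field_overlap(healthy, observe_fields)
fires_on_wildcard = has_no_field_overlap(corrupted, None)
```

One consequence of this design worth keeping in mind: if a declared observe field happens to be named like a schema key (for example `title` or `type`), an echoed schema would overlap with it and this particular check would not fire.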
detect_parse_error() now checks for schema-echo content (via is_schema_echo) so the batch reprompt loop retries on the same cycle instead of only catching the echo during post-processing. This closes the gap where batch schema echoes were converted to error records but never triggered a retry.

Also simplified batch test assertions for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
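To illustrate why detection inside detect_parse_error() matters for retries, here is a simplified sketch. The retry loop `run_with_reprompt`, the detection heuristic, and the fake model are all hypothetical stand-ins; the real reprompt loop is not shown in this PR text.

```python
from typing import Any, Callable


def is_schema_echo(content: Any) -> bool:
    # Assumed heuristic: the payload looks like a JSON Schema object definition.
    return (
        isinstance(content, dict)
        and content.get("type") == "object"
        and isinstance(content.get("properties"), dict)
    )


def detect_parse_error(response: Any) -> bool:
    # Existing parse failures and schema echoes both count as retryable errors.
    if isinstance(response, dict) and "_parse_error" in response:
        return True
    return is_schema_echo(response)


def run_with_reprompt(call: Callable[[], Any], max_attempts: int = 3) -> Any:
    # Hypothetical retry loop: re-invoke the model while the response is an error.
    response = None
    for _ in range(max_attempts):
        response = call()
        if not detect_parse_error(response):
            break
    return response


# Fake model: echoes the schema once, then returns conforming data.
scripted = iter([
    {"title": "InlineSchema", "type": "object", "properties": {}},
    {"name": "Ada"},
])
final_response = run_with_reprompt(lambda: next(scripted))
```

If the echo were only caught in post-processing, the loop above would have accepted the first response and the retry would never happen.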
The compiled Ollama schema includes "title": "InlineSchema" (from vendor_compilation.py), which leaks framework metadata into the LLM context. Ollama's format parameter only uses structural keys (type, properties, required, additionalProperties). title is not a structural constraint and can trigger schema-echo behavior, where the model returns the schema definition itself instead of conforming data.

_extract_ollama_schema now strips title before passing the schema to client.chat(format=...). This prevents the root cause rather than just detecting the symptom.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
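A minimal sketch of the stripping step, assuming only the top-level title is removed and the input schema is left unmodified. The helper name `strip_title` is hypothetical, and the actual call to client.chat(format=...) is omitted.

```python
def strip_title(schema: dict) -> dict:
    """Return a shallow copy of the schema without its top-level "title" key."""
    # Top-level only (an assumption): a *property* named "title" under
    # "properties" is real data structure and must be left alone.
    return {key: value for key, value in schema.items() if key != "title"}


compiled = {
    "title": "InlineSchema",
    "type": "object",
    "properties": {"title": {"type": "string"}, "year": {"type": "integer"}},
    "required": ["title"],
    "additionalProperties": False,
}

format_param = strip_title(compiled)
# format_param is what would be handed to the client as the format argument.
```

Building a new dict instead of deleting the key in place avoids mutating a compiled schema that other code paths may still need with its title intact.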
MessageBuilder._strip_schema_metadata() removes name, title, and description from schemas before injecting them into prompt text. These keys are framework labels (e.g. "InlineSchema") that leak implementation details into LLM context and can trigger schema-echo.

Applied at all three prompt-injection points:
- SchemaInjection.PROMPT (Ollama Cloud online + batch)
- SchemaInjection.INLINE_FULL (unused but protected)
- SchemaInjection.INLINE_FULL_LIST (Gemini)

API-parameter paths (OpenAI, Anthropic, Groq) are unaffected: those vendors require name/title for structured output enforcement, and their API-level constraints prevent echoing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Avoids recreating the set on every _strip_schema_metadata call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
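One way to get the effect described in the last two commits is a module-level frozenset that is built once at import time. This is a sketch under assumptions: the constant name `_SCHEMA_METADATA_KEYS`, the free-function form, and the top-level-only stripping are illustrative and not confirmed by the PR text.

```python
import json

# Built once at import time instead of on every call (assumed approach).
_SCHEMA_METADATA_KEYS = frozenset({"name", "title", "description"})


def strip_schema_metadata(schema: dict) -> dict:
    """Return a copy of the schema without framework label keys at the top level."""
    return {k: v for k, v in schema.items() if k not in _SCHEMA_METADATA_KEYS}


schema = {
    "name": "InlineSchema",
    "title": "InlineSchema",
    "description": "Compiled inline schema",
    "type": "object",
    "properties": {"answer": {"type": "string"}},
}

# Text that would be injected into the prompt; labels are gone, structure remains.
prompt_schema_text = json.dumps(strip_schema_metadata(schema), sort_keys=True)
```

A frozenset also signals that the key set is a constant, so it cannot be mutated accidentally by a caller.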
Summary
Root cause prevention plus multi-layer defense for a production bug where LLMs echo JSON Schema definitions back as response content, causing `RecordContextError` crashes in downstream actions.

Prevention (root cause)
- `_extract_ollama_schema` now strips `title` from the format param sent to `client.chat()`. The `title: "InlineSchema"` key leaked framework metadata into LLM context and triggered schema-echo behavior. Ollama's format param only needs structural keys (`type`, `properties`, `required`, `additionalProperties`).

Detection (defense in depth)
- `_reject_schema_echo_items` in `_validate_llm_output_schema`: runs unconditionally, replaces echoes with `_parse_error` dicts so reprompt retries
- `_process_successful_result`: prevents corrupted content from reaching records
- `detect_parse_error` now calls `is_schema_echo`: batch reprompt retries schema echoes on the same cycle

Guard (already-corrupted records)
- `scope_builder.py`: if a dependency namespace has zero field overlap with declared observe fields, wraps as `SKIPPED_NAMESPACE`. Catches any content corruption generically, not just InlineSchema.

Changes
- `agent_actions/llm/providers/ollama/client.py`: strip `title` from format param in `_extract_ollama_schema`
- `agent_actions/utils/schema_echo.py`: `is_schema_echo()` + `make_schema_echo_error()`
- `agent_actions/processing/helpers.py`: `_reject_schema_echo_items` in `_validate_llm_output_schema`
- `agent_actions/llm/batch/processing/batch_result_strategy.py`: `_process_successful_result`
- `agent_actions/processing/evaluation/strategies/validation.py`: `detect_parse_error`
- `agent_actions/prompt/context/scope_builder.py`: `DependencyNamespaceBuilder.build`

Test plan
🤖 Generated with Claude Code