fix: prevent InlineSchema leak into record content + defensive guard #631

Merged
Muizzkolapo merged 8 commits into main from fix/inline-schema-stored-as-content
May 24, 2026

Muizzkolapo commented on May 24, 2026

Summary

Root cause prevention + multi-layer defense for a production bug where LLMs echo JSON Schema definitions back as response content, causing RecordContextError crashes in downstream actions.

Prevention (root cause)

  • _extract_ollama_schema now strips title from the format param sent to client.chat(). The title: "InlineSchema" key leaked framework metadata into LLM context and triggered schema-echo behavior. Ollama's format param only needs structural keys (type, properties, required, additionalProperties).

Detection (defense in depth)

  • Online path: _reject_schema_echo_items in _validate_llm_output_schema runs unconditionally (it is not gated by skip_schema_validation) and replaces echoes with _parse_error dicts so that reprompt can retry
  • Batch processing: a schema-echo check in _process_successful_result prevents corrupted content from reaching records
  • Batch reprompt: detect_parse_error now calls is_schema_echo, so the batch reprompt loop retries schema echoes on the same cycle
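
The detection layers above share two helpers, is_schema_echo() and make_schema_echo_error(). A minimal sketch of what such helpers might look like follows. The heuristic (a dict with "type": "object" and a dict-valued "properties") and the exact keys of the error dict are assumptions made for illustration, not the code merged in this PR.

```python
import json
from typing import Any


def is_schema_echo(item: Any) -> bool:
    """Return True if item looks like a JSON Schema definition rather than data.

    Assumed heuristic: an object schema has "type": "object" and a dict of
    "properties". Real data could in principle match this shape too.
    """
    if not isinstance(item, dict):
        return False
    return item.get("type") == "object" and isinstance(item.get("properties"), dict)


def make_schema_echo_error(item: dict) -> dict:
    """Build a _parse_error dict for an echoed schema (key names are assumed)."""
    return {
        "_parse_error": "LLM returned the JSON Schema definition instead of data",
        # json.dumps keeps the raw response valid JSON, unlike str().
        "raw_response": json.dumps(item),
    }
```

Keeping both helpers in one module means every path (online, batch, reprompt) agrees on what counts as an echo and on the shape of the resulting error.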

Guard (already-corrupted records)

  • Zero-overlap guard in scope_builder.py: if a dependency namespace has zero field overlap with the declared observe fields, the namespace is wrapped as SKIPPED_NAMESPACE. This catches any content corruption generically, not just InlineSchema echoes.

Changes

| File | Change |
| --- | --- |
| agent_actions/llm/providers/ollama/client.py | Strip title from format param in _extract_ollama_schema |
| agent_actions/utils/schema_echo.py | Shared is_schema_echo() + make_schema_echo_error() |
| agent_actions/processing/helpers.py | _reject_schema_echo_items in _validate_llm_output_schema |
| agent_actions/llm/batch/processing/batch_result_strategy.py | Schema-echo check in _process_successful_result |
| agent_actions/processing/evaluation/strategies/validation.py | Schema-echo detection in detect_parse_error |
| agent_actions/prompt/context/scope_builder.py | Zero-overlap guard in DependencyNamespaceBuilder.build |
| 5 test files | 36 new tests |

Test plan

  • 36 new tests (prevention, detection, reprompt integration, zero-overlap guard)
  • 7343 total tests pass, 2 skipped (pre-existing)
  • ruff check + format clean
  • Manual: run inline-schema workflow on Ollama to confirm no echo

🤖 Generated with Claude Code

Muizzkolapo and others added 8 commits May 24, 2026 11:26
…paths

When an LLM echoes the JSON Schema definition back as its response
(e.g. {"title": "InlineSchema", "type": "object", "properties": {...}})
instead of conforming data, the echoed schema was stored as record
content, causing RecordContextError crashes in downstream actions.

Add _is_schema_echo() detection and _reject_schema_echo_items() filter
in helpers.py (online path) and batch_result_strategy.py (batch path).
Schema-echo responses are replaced with _parse_error dicts so reprompt
can retry. Detection runs unconditionally, not gated by
skip_schema_validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add safety-net detection in DependencyNamespaceBuilder.build() for
corrupted namespaces that contain compiled JSON Schema definitions
instead of actual action output. Corrupted namespaces are wrapped as
SKIPPED_NAMESPACE with a warning, preventing RecordContextError crashes
in downstream observe resolution. Guards already-corrupted records in
the database.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move is_schema_echo() and make_schema_echo_error() to
  agent_actions/utils/schema_echo.py to avoid circular imports and
  enable reuse across helpers.py, batch_result_strategy.py, and
  scope_builder.py
- scope_builder.py calls shared is_schema_echo() instead of inlining
  the detection logic (prevents drift)
- Use json.dumps() instead of str() for raw_response serialization
- Avoid list allocation in happy path (_reject_schema_echo_items
  scans first, copies only when an echo is found)
- Deduplicate _parse_error dict construction via make_schema_echo_error()
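
The "scan first, copy only when an echo is found" behaviour described above can be sketched roughly as follows. The function and helper names here are hypothetical stand-ins, and the echo heuristic and error-dict keys are assumptions rather than the project's actual code.

```python
import json
from typing import Any, List


def _looks_like_schema(item: Any) -> bool:
    # Stand-in for the shared is_schema_echo() helper (heuristic is assumed).
    return (
        isinstance(item, dict)
        and item.get("type") == "object"
        and isinstance(item.get("properties"), dict)
    )


def _echo_error(item: dict) -> dict:
    # Stand-in for make_schema_echo_error(); key names are assumed.
    return {"_parse_error": "schema echo", "raw_response": json.dumps(item)}


def reject_schema_echo_items(items: List[Any]) -> List[Any]:
    """Replace echoed schemas with error dicts, allocating only if needed."""
    # Pass 1: find the first echo without building a new list.
    first = next((i for i, it in enumerate(items) if _looks_like_schema(it)), None)
    if first is None:
        return items  # happy path: the original list is returned untouched

    # Pass 2: copy the clean prefix, then filter the remainder.
    cleaned = list(items[:first])
    for it in items[first:]:
        cleaned.append(_echo_error(it) if _looks_like_schema(it) else it)
    return cleaned
```

Returning the original list object on the happy path means callers pay no allocation cost for the common case where the LLM output is well formed.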

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The defensive guard in scope_builder.py now checks whether the namespace
has any overlap with declared observe fields instead of string-matching
title == "InlineSchema". This catches all forms of content corruption
(schema-echo from any schema name, garbage data, wrong action output)
generically.

Guard fires when allowed_fields is a non-empty list and
set(dep_data.keys()) & set(allowed_fields) is empty. Wildcard observe
(action.*) sets allowed_fields=None and bypasses the guard.
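
The firing condition described above can be written as a small predicate. This is a sketch only: the function name is hypothetical, and the wrapping of the namespace as SKIPPED_NAMESPACE is left out because its representation is not shown here.

```python
from typing import Any, Dict, List, Optional


def has_zero_overlap(dep_data: Dict[str, Any], allowed_fields: Optional[List[str]]) -> bool:
    """Return True when the guard should fire for this dependency namespace."""
    # Wildcard observe (action.*) sets allowed_fields=None and bypasses the
    # guard; an empty list is also treated as "nothing to compare against".
    if not allowed_fields:
        return False
    # Fire only when no declared observe field appears in the namespace.
    return not (set(dep_data.keys()) & set(allowed_fields))
```

For example, an echoed schema with keys title, type, and properties shares nothing with observe fields such as summary or score, so the guard fires without knowing anything about InlineSchema specifically.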

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
detect_parse_error() now checks for schema-echo content (via
is_schema_echo) so the batch reprompt loop retries on the same cycle
instead of only catching the echo during post-processing. This closes
the gap where batch schema echoes were converted to error records but
never triggered a retry.

Also simplified batch test assertions for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The compiled Ollama schema includes "title": "InlineSchema" (from
vendor_compilation.py) which leaks framework metadata into the LLM
context. Ollama's format parameter only uses structural keys (type,
properties, required, additionalProperties) — title is not a
structural constraint and can trigger schema-echo behavior where the
model returns the schema definition itself instead of conforming data.

_extract_ollama_schema now strips title before passing the schema to
client.chat(format=...). This prevents the root cause rather than just
detecting the symptom.
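
As an illustration of the prevention step, stripping title before the schema is passed as the format argument might look like the sketch below. The helper name is hypothetical, and removing only the top-level key is an assumption; the merged _extract_ollama_schema may handle nesting differently.

```python
from typing import Any, Dict


def strip_title(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of schema without its top-level "title" key."""
    # Build a new dict so the compiled schema held elsewhere is not mutated.
    return {key: value for key, value in schema.items() if key != "title"}


compiled = {
    "title": "InlineSchema",
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
    "additionalProperties": False,
}

# The structural keys survive; only the framework label is dropped.
format_schema = strip_title(compiled)
```

The resulting format_schema is what would then be handed to client.chat(format=...).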

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MessageBuilder._strip_schema_metadata() removes name, title, and
description from schemas before injecting them into prompt text. These
keys are framework labels (e.g. "InlineSchema") that leak
implementation details into LLM context and can trigger schema-echo.

Applied at all three prompt-injection points:
- SchemaInjection.PROMPT (Ollama Cloud online + batch)
- SchemaInjection.INLINE_FULL (unused but protected)
- SchemaInjection.INLINE_FULL_LIST (Gemini)

API-parameter paths (OpenAI, Anthropic, Groq) are unaffected — those
vendors require name/title for structured output enforcement and their
API-level constraints prevent echoing.
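
A rough sketch of the metadata stripping described above is shown here. The constant and function names are hypothetical, and the sketch only removes top-level keys, which is an assumption; stripping recursively would risk deleting user-defined properties that happen to be called title or description.

```python
from typing import Any, Dict

# Defined once at module level so the set is not rebuilt on every call.
_SCHEMA_METADATA_KEYS = frozenset({"name", "title", "description"})


def strip_schema_metadata(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of schema without top-level framework labels."""
    return {
        key: value
        for key, value in schema.items()
        if key not in _SCHEMA_METADATA_KEYS
    }
```

Because only the top level is filtered, a schema whose properties include a field named title keeps that field intact in the prompt text.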

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The set of metadata keys to strip is now created once, which avoids recreating it on every _strip_schema_metadata call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Muizzkolapo merged commit 1938035 into main on May 24, 2026
5 checks passed
github-actions (bot) locked and limited conversation to collaborators on May 24, 2026