fix(agents): handle model tool-retry crash gracefully (#164) by w7-mgfcode · Pull Request #165 · w7-mgfcode/ForecastLabAI

w7-mgfcode · 2026-05-18T11:39:13Z

Summary

Closes #164.

Sending a casual message (e.g. Hello) to the Experiment agent could crash the WebSocket stream with a raw internal error:

Error: Stream error: Tool 'tool_compare_backtest_results' exceeded max retries count of 1

The configured model hallucinated a tool_compare_backtest_results call with malformed args; Pydantic rejected them, PydanticAI retried once, failed again, and raised UnexpectedModelBehavior — which surfaced unfiltered to the user.

Changes

Graceful error handling — stream_chat / chat now catch UnexpectedModelBehavior and emit a clean, recoverable error event / friendly message instead of leaking the raw exception string. This is a general safety net for any misbehaving tool call, not just the one reported.
Hardened tool_compare_backtest_results — its two params now default to None; a missing/empty arg returns a self-correcting hint dict rather than failing schema validation and burning the retry budget.
System prompt — added a conversational-fallback instruction so greetings / no-objective messages are answered without invoking workflow tools.
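A minimal sketch of the hardened tool described above, assuming the real tool delegates to a `compare_backtest_results` helper. The helper body, the `mae` field, and the exact keys of the hint dict are illustrative assumptions, not the code in `experiment.py`. Any agent context argument the real tool may take is omitted.

```python
from typing import Any, Optional


def compare_backtest_results(result_a: dict[str, Any], result_b: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the real comparison helper."""
    return {"mae_delta": result_a["mae"] - result_b["mae"]}


def tool_compare_backtest_results(
    result_a: Optional[dict[str, Any]] = None,
    result_b: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Compare two backtest results, tolerating missing or empty arguments."""
    # With None defaults, a call without arguments passes schema validation
    # and reaches this guard instead of consuming a retry.
    if not result_a or not result_b:
        return {
            "error": "missing_arguments",  # assumed key and value
            "hint": "Pass two full backtest-result dicts from tool_run_backtest.",
        }
    # Delegate only when both inputs are present.
    return compare_backtest_results(result_a, result_b)
```

Returning a hint rather than raising gives the model a chance to correct its next call within the same run.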

Tests

test_chat_model_misbehavior_returns_friendly_message — chat() returns a clean message, no raw error leak.
test_stream_chat_model_misbehavior_yields_error_event — stream_chat() yields a single recoverable error event.

Validation

✅ ruff check + ruff format --check
✅ mypy --strict (25 files, no issues)
✅ pyright (0 errors)
✅ pytest -m "not integration" — 107 agent unit tests pass

Summary by Sourcery

Handle misbehaving model tool calls in agents by returning user-friendly errors in chat and stream flows and hardening experiment backtest comparison behavior.

Bug Fixes:

Prevent agent chat requests from crashing when the model triggers UnexpectedModelBehavior by returning a clean, generic error message instead of the raw exception.
Prevent streaming chat from failing the WebSocket when the model misbehaves by emitting a single recoverable error event with a sanitized message.
Avoid tool_compare_backtest_results causing schema validation failures by tolerating missing arguments and returning an informative error response instead.
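The non-streaming fix can be sketched roughly as follows. `run_agent`, the `ChatResponse` fields, the message wording, and the log event name `agents.chat_model_misbehavior` are assumptions made for illustration. The session bookkeeping (`last_activity`, DB flush) is left out. The fallback class only exists so the sketch runs without pydantic-ai installed.

```python
import asyncio
import logging
from dataclasses import dataclass

try:
    from pydantic_ai.exceptions import UnexpectedModelBehavior
except ImportError:
    class UnexpectedModelBehavior(Exception):
        """Stand-in used only when pydantic-ai is not installed."""


logger = logging.getLogger(__name__)

# Assumed wording; the real message lives in service.py.
FRIENDLY_MESSAGE = "The model produced an invalid tool call. Please rephrase and try again."


@dataclass
class ChatResponse:  # hypothetical, reduced response shape
    session_id: str
    message: str


async def run_agent(message: str) -> str:
    """Hypothetical stand-in for agent.run; always misbehaves."""
    raise UnexpectedModelBehavior(
        "Tool 'tool_compare_backtest_results' exceeded max retries count of 1"
    )


async def chat(session_id: str, message: str) -> ChatResponse:
    try:
        output = await run_agent(message)
    except UnexpectedModelBehavior as exc:
        # Keep the raw details in the server log, not in the reply.
        logger.warning(
            "agents.chat_model_misbehavior session_id=%s error=%s error_type=%s",
            session_id, exc, type(exc).__name__,
        )
        return ChatResponse(session_id, FRIENDLY_MESSAGE)
    return ChatResponse(session_id, output)


response = asyncio.run(chat("s1", "Hello"))
```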

Enhancements:

Clarify the experiment agent system prompt with conversational fallback behavior for greetings or messages without a concrete forecasting objective.
Add regression tests covering friendly error handling for both chat and streaming chat when the model produces invalid tool calls.

Tests:

Add tests ensuring chat() returns a friendly message and does not leak raw UnexpectedModelBehavior errors.
Add tests ensuring stream_chat() yields a single recoverable error event instead of crashing on model misbehavior.

A casual message to the Experiment agent could crash the WebSocket stream with a raw 'Tool ... exceeded max retries count of 1' error when the model produced an invalid tool call.

- Catch PydanticAI's UnexpectedModelBehavior in stream_chat and chat; surface a clean, recoverable error event / message instead of leaking the internal exception string.
- Make tool_compare_backtest_results tolerant of missing/empty args (return a self-correcting hint) so a malformed call no longer burns the retry budget and crashes the run.
- Add a conversational-fallback line to the experiment system prompt so greetings are answered without invoking workflow tools.
- Add regression tests for both the chat and stream-chat paths.

sourcery-ai · 2026-05-18T11:39:20Z

Reviewer's Guide

Adds defensive handling for misbehaving model tool calls in chat/stream_chat, hardens the experiment agent’s backtest comparison tool, and updates the system prompt to keep casual greetings from invoking tools, with regression tests for both sync and streaming flows.

Sequence diagram for stream_chat model misbehavior handling

sequenceDiagram
    actor User
    participant WebSocketHandler
    participant AgentsService
    participant Model

    User->>WebSocketHandler: send_message
    WebSocketHandler->>AgentsService: stream_chat(session_id, message)
    AgentsService->>Model: run_async
    Model-->>AgentsService: UnexpectedModelBehavior
    AgentsService->>AgentsService: logger.warning(agents.stream_chat_model_misbehavior)
    AgentsService-->>WebSocketHandler: StreamEvent(error, recoverable=True)
    WebSocketHandler-->>User: error event on WebSocket
    AgentsService-->>WebSocketHandler: return
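
The flow in the diagram can be sketched as an async generator that yields one error event and then stops. The `StreamEvent` field names, `run_model_stream`, and the message text are assumptions. Only the `error` type, `model_behavior_error`, `recoverable=True`, and the log event name come from the guide.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import AsyncIterator, Optional

try:
    from pydantic_ai.exceptions import UnexpectedModelBehavior
except ImportError:
    class UnexpectedModelBehavior(Exception):
        """Stand-in used only when pydantic-ai is not installed."""


logger = logging.getLogger(__name__)


@dataclass
class StreamEvent:  # hypothetical, reduced event shape
    event_type: str
    message: str
    error_type: Optional[str] = None
    recoverable: bool = False


async def run_model_stream(message: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for the streaming model run; always misbehaves."""
    raise UnexpectedModelBehavior(
        "Tool 'tool_compare_backtest_results' exceeded max retries count of 1"
    )
    yield ""  # unreachable; its presence makes this function an async generator


async def stream_chat(session_id: str, message: str) -> AsyncIterator[StreamEvent]:
    try:
        async for chunk in run_model_stream(message):
            yield StreamEvent("text_delta", chunk)
    except UnexpectedModelBehavior as exc:
        logger.warning(
            "agents.stream_chat_model_misbehavior session_id=%s error_type=%s",
            session_id, type(exc).__name__,
        )
        # One sanitized, recoverable event, then end the stream.
        yield StreamEvent(
            "error",
            "The model produced an invalid tool call. Please try again.",
            error_type="model_behavior_error",
            recoverable=True,
        )
        return


async def collect() -> list:
    return [event async for event in stream_chat("s1", "Hello")]


events = asyncio.run(collect())
```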

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Handle UnexpectedModelBehavior in chat() by returning a friendly, recoverable response instead of propagating the raw exception. | Wrap agent.run in chat() with an UnexpectedModelBehavior except block. Log a warning including session_id, error string, and error_type when the model misbehaves. Update session.last_activity and flush the DB before returning. Return a ChatResponse with a generic invalid tool call message that avoids leaking the original exception text. | `app/features/agents/service.py` |
| Handle UnexpectedModelBehavior in stream_chat() by emitting a structured recoverable error event. | Wrap agent.run_stream in stream_chat() with an UnexpectedModelBehavior except block. Log a warning with session_id, error string, and error_type when the streaming run misbehaves. Yield a single StreamEvent of type 'error' with a user-friendly message, error_type 'model_behavior_error', and recoverable=True. Terminate stream_chat early after emitting the error event to avoid further iteration. | `app/features/agents/service.py` |
| Relax and self-heal tool_compare_backtest_results to avoid retry-budget crashes from missing or malformed arguments. | Change tool_compare_backtest_results signature to accept result_a and result_b as optional dicts with None defaults. Document that both arguments should be full backtest-result dicts from tool_run_backtest. Add a guard that detects missing/empty arguments and returns an error/hint dict instead of raising schema validation errors. Delegate to compare_backtest_results only when both inputs are present. | `app/features/agents/agents/experiment.py` |
| Guide conversational behavior of the experiment agent so casual greetings don’t invoke tools prematurely. | Augment the experiment agent system prompt with a CONVERSATIONAL BEHAVIOR section. Instruct the agent to respond in the summary field and ask for a concrete objective when greeted or given vague input. Explicitly forbid tool usage until a specific forecasting objective or request is provided. | `app/features/agents/agents/experiment.py` |
| Add regression tests ensuring misbehaving model/tool calls yield friendly messages and recoverable error events. | Add test_chat_model_misbehavior_returns_friendly_message to assert chat() masks UnexpectedModelBehavior with a clean message and no raw error leak. Introduce TestAgentServiceStreamChat test class for streaming chat behavior tests. Add test_stream_chat_model_misbehavior_yields_error_event using a custom async context manager that raises UnexpectedModelBehavior on entry to simulate run_stream failures. Assert the streaming API yields exactly one error event with recoverable=True, a stable error_type, and no raw exception text. | `app/features/agents/tests/test_service.py` |
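
The streaming regression test in the last row relies on an async context manager that raises on entry. A rough sketch of that technique follows. `RaisingStream`, `FakeAgent`, and the reduced `stream_chat` under test are hypothetical names, and events are plain dicts here for brevity. The real test in `test_service.py` may be structured differently.

```python
import asyncio

try:
    from pydantic_ai.exceptions import UnexpectedModelBehavior
except ImportError:
    class UnexpectedModelBehavior(Exception):
        """Stand-in used only when pydantic-ai is not installed."""


RAW_ERROR = "Tool 'tool_compare_backtest_results' exceeded max retries count of 1"


class RaisingStream:
    """Async context manager that fails on entry, simulating run_stream."""

    async def __aenter__(self):
        raise UnexpectedModelBehavior(RAW_ERROR)

    async def __aexit__(self, exc_type, exc, tb):
        return False


class FakeAgent:  # hypothetical test double
    def run_stream(self, message: str) -> RaisingStream:
        return RaisingStream()


async def stream_chat(agent, message: str):
    """Hypothetical reduced version of the function under test."""
    try:
        async with agent.run_stream(message) as stream:
            async for chunk in stream:
                yield {"type": "text_delta", "content": chunk}
    except UnexpectedModelBehavior:
        yield {
            "type": "error",
            "error_type": "model_behavior_error",
            "recoverable": True,
            "message": "The model produced an invalid tool call. Please try again.",
        }


def test_stream_chat_model_misbehavior_yields_error_event() -> None:
    async def collect() -> list:
        return [event async for event in stream_chat(FakeAgent(), "Hello")]

    events = asyncio.run(collect())
    assert len(events) == 1
    assert events[0]["type"] == "error"
    assert events[0]["recoverable"] is True
    assert events[0]["error_type"] == "model_behavior_error"
    assert RAW_ERROR not in events[0]["message"]


test_stream_chat_model_misbehavior_yields_error_event()
```

Raising in `__aenter__` exercises the same `except` path as a failure during the run, without needing a real model.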

Possibly linked issues

Agent chat crashes with raw 'exceeded max retries' on a misbehaving tool call #164: They match exactly: the PR adds graceful UnexpectedModelBehavior handling, hardens tool_compare_backtest_results, and updates the system prompt.


coderabbitai · 2026-05-18T11:39:35Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


sourcery-ai

Hey - I've left some high level feedback:

The UnexpectedModelBehavior handling in chat and stream_chat is very similar but duplicated; consider extracting a small helper (e.g., to build the user-facing message / error payload) so the behavior and wording stay consistent and are easier to adjust later.
In tool_compare_backtest_results, returning an error dict when result_a/result_b are missing may propagate into normal comparison handling; if downstream code expects a specific schema, consider tagging this as a distinct error type/shape or short-circuiting earlier to avoid mixing error payloads with real comparison results.
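
One possible shape for both suggestions, offered as a sketch only: a single helper that builds the user-facing payload for `chat` and `stream_chat`, and a tagged error dict for tools so callers can tell it apart from a real comparison result. All function names, keys, and the message wording below are hypothetical.

```python
from typing import Any

MODEL_BEHAVIOR_ERROR_TYPE = "model_behavior_error"
# Assumed wording; defined once so chat and stream_chat cannot drift apart.
MODEL_BEHAVIOR_MESSAGE = "The model produced an invalid tool call. Please try again."


def model_misbehavior_payload() -> dict[str, Any]:
    """Build the shared user-facing payload for a misbehaving model run."""
    return {
        "message": MODEL_BEHAVIOR_MESSAGE,
        "error_type": MODEL_BEHAVIOR_ERROR_TYPE,
        "recoverable": True,
    }


def tool_argument_error(tool_name: str, hint: str) -> dict[str, Any]:
    """Build a tool error dict with an explicit status tag."""
    return {
        "status": "error",
        "error_type": "invalid_tool_arguments",
        "tool": tool_name,
        "hint": hint,
    }


def is_tool_error(result: dict[str, Any]) -> bool:
    """Return True only for dicts produced by tool_argument_error-style helpers."""
    return result.get("status") == "error"
```

With an explicit `status` tag, downstream code can short-circuit on `is_tool_error(result)` before treating the dict as comparison output.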


sourcery-ai Bot reviewed May 18, 2026


w7-mgfcode merged commit 864a2e1 into dev May 18, 2026
8 checks passed

w7-mgfcode mentioned this pull request May 18, 2026

feat: cut v0.2.12 — agent hardening, AI model console, demo showcase #178

Merged

w7-mgfcode deleted the fix/agents-tool-retry-crash branch May 18, 2026 14:20

w7-mgfcode mentioned this pull request May 18, 2026

Agent chat crashes with raw 'exceeded max retries' on a misbehaving tool call #164

Closed
