fix(agents): handle model tool-retry crash gracefully (#164) #165
A casual message to the Experiment agent could crash the WebSocket stream with a raw 'Tool ... exceeded max retries count of 1' error when the model produced an invalid tool call.
- Catch PydanticAI's UnexpectedModelBehavior in stream_chat and chat; surface a clean, recoverable error event / message instead of leaking the internal exception string.
- Make tool_compare_backtest_results tolerant of missing/empty args (return a self-correcting hint) so a malformed call no longer burns the retry budget and crashes the run.
- Add a conversational-fallback line to the experiment system prompt so greetings are answered without invoking workflow tools.
- Add regression tests for both the chat and stream-chat paths.
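The tolerant tool described in the second bullet could look roughly like the sketch below. The argument types, the keys of the hint dict, and the metric-difference comparison are assumptions made for illustration; the PR only states that both parameters default to `None` and that a missing or empty argument returns a self-correcting hint.

```python
from typing import Any, Optional


def tool_compare_backtest_results(
    result_a: Optional[dict[str, Any]] = None,
    result_b: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Compare two backtest results, tolerating missing or empty arguments."""
    # Both parameters default to None, so a call with absent args still
    # passes schema validation and reaches this body instead of
    # consuming the retry budget.
    missing = [
        name
        for name, value in (("result_a", result_a), ("result_b", result_b))
        if not value
    ]
    if missing:
        # Hypothetical hint shape: tells the model what went wrong and
        # how to correct its next step.
        return {
            "error": "missing_arguments",
            "missing": missing,
            "hint": (
                "Call this tool only with two completed backtest results; "
                "otherwise answer the user directly."
            ),
        }

    # Hypothetical comparison: difference of numeric metrics present in both.
    shared = sorted(set(result_a) & set(result_b))
    return {
        "delta": {
            key: result_b[key] - result_a[key]
            for key in shared
            if isinstance(result_a[key], (int, float))
            and isinstance(result_b[key], (int, float))
        }
    }
```

Returning a hint instead of raising keeps the run alive: the model sees an ordinary tool result and can answer in plain text on its next turn.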
Reviewer's Guide
Adds defensive handling for misbehaving model tool calls in chat/stream_chat, hardens the experiment agent's backtest comparison tool, and updates the system prompt to keep casual greetings from invoking tools, with regression tests for both sync and streaming flows.
Sequence diagram for stream_chat model misbehavior handling
sequenceDiagram
actor User
participant WebSocketHandler
participant AgentsService
participant Model
User->>WebSocketHandler: send_message
WebSocketHandler->>AgentsService: stream_chat(session_id, message)
AgentsService->>Model: run_async
Model-->>AgentsService: UnexpectedModelBehavior
AgentsService->>AgentsService: logger.warning(agents.stream_chat_model_misbehavior)
AgentsService-->>WebSocketHandler: StreamEvent(error, recoverable=True)
WebSocketHandler-->>User: error event on WebSocket
AgentsService-->>WebSocketHandler: return
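The flow in the diagram can be sketched as an async generator that converts the exception into a single recoverable error event. This is a minimal sketch under stated assumptions: the `StreamEvent` fields, the `run_model` callable, the message wording, and the `collect` helper are hypothetical, and `UnexpectedModelBehavior` is a local stand-in so the block runs without PydanticAI installed (in real code it would be imported from `pydantic_ai`).

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import AsyncIterator, Callable

logger = logging.getLogger("agents")


class UnexpectedModelBehavior(Exception):
    """Local stand-in for the PydanticAI exception of the same name."""


@dataclass
class StreamEvent:
    # Hypothetical event shape; the diagram only shows StreamEvent(error, recoverable=True).
    type: str
    content: str
    recoverable: bool = False


async def stream_chat(
    run_model: Callable[[str], AsyncIterator[str]], message: str
) -> AsyncIterator[StreamEvent]:
    try:
        async for chunk in run_model(message):
            yield StreamEvent("text", chunk)
    except UnexpectedModelBehavior as exc:
        # Log the internal detail, but never forward it to the client.
        logger.warning("agents.stream_chat_model_misbehavior: %s", exc)
        yield StreamEvent(
            "error",
            "The assistant hit an internal problem. Please try again.",
            recoverable=True,
        )
        return


async def misbehaving_model(message: str) -> AsyncIterator[str]:
    # Simulates the model exhausting its tool-retry budget.
    raise UnexpectedModelBehavior("Tool ... exceeded max retries count of 1")
    yield ""  # unreachable; makes this function an async generator


def collect(run_model: Callable[[str], AsyncIterator[str]], message: str) -> list[StreamEvent]:
    async def _run() -> list[StreamEvent]:
        return [event async for event in stream_chat(run_model, message)]

    return asyncio.run(_run())
```

Calling `collect(misbehaving_model, "Hello")` yields one event of type `error` with `recoverable=True`, and the raw exception text appears only in the log.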
Hey - I've left some high level feedback:
- The `UnexpectedModelBehavior` handling in `chat` and `stream_chat` is very similar but duplicated; consider extracting a small helper (e.g., to build the user-facing message / error payload) so the behavior and wording stay consistent and easier to adjust later.
- In `tool_compare_backtest_results`, returning an error dict when `result_a`/`result_b` are missing may propagate into normal comparison handling; if downstream code expects a specific schema, consider tagging this as a distinct error type/shape or short-circuiting earlier to avoid mixing error payloads with real comparison results.
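One way to act on both review comments is sketched below: a single helper that owns the user-facing wording for both paths, and a dedicated type for tool argument errors so they cannot be mistaken for real comparison results. Every name here (`MODEL_MISBEHAVIOR_MESSAGE`, `friendly_model_error`, `ToolArgumentError`, `is_tool_error`) is hypothetical and does not come from the PR.

```python
from dataclasses import dataclass
from typing import Any

# Single source of truth for the wording shown to users (hypothetical text).
MODEL_MISBEHAVIOR_MESSAGE = "The assistant hit an internal problem. Please try again."


def friendly_model_error() -> dict[str, Any]:
    """Build the payload shared by chat() and stream_chat()."""
    return {
        "type": "error",
        "message": MODEL_MISBEHAVIOR_MESSAGE,
        "recoverable": True,
    }


@dataclass(frozen=True)
class ToolArgumentError:
    """Distinct shape for tool-argument problems, separate from comparison results."""

    tool: str
    missing: tuple[str, ...]
    hint: str


def is_tool_error(payload: object) -> bool:
    # Downstream code can branch on the type instead of probing dict keys.
    return isinstance(payload, ToolArgumentError)
```

With this split, `chat` would return `friendly_model_error()["message"]` while `stream_chat` would wrap the same payload in its event type, so a wording change happens in one place.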
Summary
Closes #164.
Sending a casual message (e.g. `Hello`) to the Experiment agent could crash the WebSocket stream with a raw internal error of the form `Tool ... exceeded max retries count of 1`.
The configured model hallucinated a `tool_compare_backtest_results` call with malformed args; Pydantic rejected them, PydanticAI retried once, failed again, and raised `UnexpectedModelBehavior`, which surfaced unfiltered to the user.
Changes
- `stream_chat` / `chat` now catch `UnexpectedModelBehavior` and emit a clean, recoverable `error` event / friendly message instead of leaking the raw exception string. This is a general safety net for any misbehaving tool call, not just the one reported.
- `tool_compare_backtest_results`: its two params now default to `None`; a missing/empty arg returns a self-correcting hint dict rather than failing schema validation and burning the retry budget.
Tests
- `test_chat_model_misbehavior_returns_friendly_message`: `chat()` returns a clean message, no raw error leak.
- `test_stream_chat_model_misbehavior_yields_error_event`: `stream_chat()` yields a single recoverable `error` event.
Validation
- `ruff check` + `ruff format --check`
- `mypy --strict` (25 files, no issues)
- `pyright` (0 errors)
- `pytest -m "not integration"`: 107 agent unit tests pass
Summary by Sourcery
Handle misbehaving model tool calls in agents by returning user-friendly errors in chat and stream flows and hardening experiment backtest comparison behavior.
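A regression test for the non-streaming path might be shaped like the following sketch. `FakeAgent`, the `chat` signature, and the `FRIENDLY` text are assumptions for illustration, and `UnexpectedModelBehavior` is again a local stand-in for the PydanticAI exception; only the test name is taken from the PR.

```python
import asyncio


class UnexpectedModelBehavior(Exception):
    """Local stand-in for the PydanticAI exception of the same name."""


FRIENDLY = "Sorry, something went wrong on my side. Please try again."


class FakeAgent:
    """Hypothetical test double whose run always exhausts the retry budget."""

    async def run(self, message: str) -> str:
        raise UnexpectedModelBehavior("Tool ... exceeded max retries count of 1")


async def chat(agent: FakeAgent, message: str) -> str:
    try:
        return await agent.run(message)
    except UnexpectedModelBehavior:
        # Swallow the internal detail and answer with a clean message.
        return FRIENDLY


def test_chat_model_misbehavior_returns_friendly_message() -> None:
    reply = asyncio.run(chat(FakeAgent(), "Hello"))
    assert reply == FRIENDLY
    # The raw exception string must not leak into the reply.
    assert "max retries" not in reply
```

The second assertion is the one that guards against the original bug: it fails if the internal exception text ever reaches the user-visible reply.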