fix(agents): correct prompt tool names and recover from tool errors (#175, #176)#177
Conversation
…175) Two coupled robustness fixes for the agent layer, both surfaced by a capture_run_messages diagnostic. The changes share base.py and experiment.py, so they land in one commit. #175 — the experiment prompt named tools as run_backtest / list_runs / compare_backtest_results, but the registered tools are tool_-prefixed (tool_run_backtest, ...). Weaker models trusted the prompt and called unknown tool names. TOOL_USAGE_INSTRUCTIONS and the EXPERIMENT_SYSTEM_PROMPT workflow now use the exact registered names. #176 — a tool raising a plain exception aborted the whole run (observed: ValueError "No data found for store=..."). New recoverable() decorator wraps every async DB-touching tool so an expected ValueError becomes a ModelRetry the model can correct from; other exceptions still propagate. - Add recoverable() to agents/base.py; decorate the 6 experiment tools and the 2 rag_assistant tools (tool_plain pure tools left alone). - Tests: prompt names use tool_* ; recoverable converts ValueError to ModelRetry, passes other exceptions through, is transparent on success.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Reviewer's GuideAligns experiment agent prompt/tool names with the actual registered tool_* functions and introduces a recoverable() decorator so ValueError from async DB-touching tools becomes a ModelRetry instead of crashing the agent run, with tests covering both behaviors. Sequence diagram for agent tool call with recoverable wrappersequenceDiagram
participant Model
participant Agent
participant tool_run_backtest
Model->>Agent: Call tool_run_backtest
Agent->>tool_run_backtest: Await wrapped function
alt [ValueError raised]
tool_run_backtest-->>Agent: ValueError
Agent-->>Model: ModelRetry(str(exc))
else [No error]
tool_run_backtest-->>Agent: Result
Agent-->>Model: Result
end
Flow diagram for recoverable decorator behaviorflowchart TD
A[Call wrapped async tool] --> B[Execute func]
B --> C{ValueError raised?}
C -- No --> D[Return awaited result]
C -- Yes --> E["Raise ModelRetry with str(exc)"]
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
recoverabledecorator assumes the wrapped callable is always async andawait-able; consider either enforcing this via a runtime assertion/check or making the decorator robust to accidental use on sync functions to avoid confusingTypeError: object is not awaitablefailures. - Tool names are now duplicated across
TOOL_USAGE_INSTRUCTIONS, the experiment workflow prompt, and the actual@agent.toolfunctions; consider centralizing these names (e.g., as constants or deriving the prompt snippets from the registered tools) to reduce the risk of future drift like in #175.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `recoverable` decorator assumes the wrapped callable is always async and `await`-able; consider either enforcing this via a runtime assertion/check or making the decorator robust to accidental use on sync functions to avoid confusing `TypeError: object is not awaitable` failures.
- Tool names are now duplicated across `TOOL_USAGE_INSTRUCTIONS`, the experiment workflow prompt, and the actual `@agent.tool` functions; consider centralizing these names (e.g., as constants or deriving the prompt snippets from the registered tools) to reduce the risk of future drift like in #175.
## Individual Comments
### Comment 1
<location path="app/features/agents/agents/base.py" line_range="24-26" />
<code_context>
logger = structlog.get_logger()
+def recoverable[**P, ToolReturnT](
+ func: Callable[P, Awaitable[ToolReturnT]],
+) -> Callable[P, Awaitable[ToolReturnT]]:
+ """Wrap an async agent tool so an expected ``ValueError`` becomes a ``ModelRetry``.
+
</code_context>
<issue_to_address>
**issue (bug_risk):** The generic decorator signature uses invalid syntax and will not parse.
`def recoverable[**P, ToolReturnT](` is not valid Python syntax, even with PEP 695. To keep this decorator generic and compatible with current Python and type checkers, use the ParamSpec/TypeVar pattern instead:
```python
P = ParamSpec("P")
ToolReturnT = TypeVar("ToolReturnT")
def recoverable(
func: Callable[P, Awaitable[ToolReturnT]],
) -> Callable[P, Awaitable[ToolReturnT]]:
...
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| def recoverable[**P, ToolReturnT]( | ||
| func: Callable[P, Awaitable[ToolReturnT]], | ||
| ) -> Callable[P, Awaitable[ToolReturnT]]: |
There was a problem hiding this comment.
issue (bug_risk): The generic decorator signature uses invalid syntax and will not parse.
def recoverable[**P, ToolReturnT]( is not valid Python syntax, even with PEP 695. To keep this decorator generic and compatible with current Python and type checkers, use the ParamSpec/TypeVar pattern instead:
P = ParamSpec("P")
ToolReturnT = TypeVar("ToolReturnT")
def recoverable(
func: Callable[P, Awaitable[ToolReturnT]],
) -> Callable[P, Awaitable[ToolReturnT]]:
...Resolves the test_base.py conflict against dev (which now carries the #170/#172/#173 agent work) and folds in PR #177 code-review fixes: - base.py: recoverable() now rejects non-coroutine functions at decoration time with a clear TypeError, instead of failing opaquely with 'not awaitable' on first tool call (review: overall comment 1). - test_base.py: replace the hardcoded prompt-tool-name list with test_prompts_only_reference_registered_tool_names, which reads the registered tool set off the built agent and asserts every tool_* name in the prompts is real — drift in either direction now fails CI, the actual guard #175 needed (review: overall comment 2). - Kept both sides of the merge: the recoverable + prompt-name tests from this branch and the retry-budget + PromptedOutput tests from dev.
Summary
Closes #175 and #176. Both were found by a
capture_run_messagesdiagnostic of a failing experiment-agent run, and both touchbase.py+experiment.py, so they land together.#175 — prompt tool names didn't match registered tools
TOOL_USAGE_INSTRUCTIONSand theEXPERIMENT_SYSTEM_PROMPTworkflow named tools asrun_backtest,list_runs,compare_backtest_results, … but the registered tools aretool_-prefixed. Weaker/local models trusted the prompt text and called unknown tool names (Unknown tool name: 'run_backtest'). The prompts now use the exact registeredtool_*names.#176 — a tool exception crashed the whole run
A tool raising a plain exception aborted the run (observed:
ValueError: No data found for store=1, product=101 …when the model picked a store/product with no data). Newrecoverable()decorator wraps every async DB-touching tool so an expectedValueErrorbecomes aModelRetry— the model gets the message and can correct its arguments. Other exception types still propagate as genuine bugs.Changes
agents/base.py— newrecoverable()decorator; correctedTOOL_USAGE_INSTRUCTIONS.agents/experiment.py— corrected workflow tool names;@recoverableon the 6 DB tools.agents/rag_assistant.py—@recoverableon the 2 DB tools (tool_plainpure tools untouched).tests/test_base.py— prompt-name regression test;recoverablebehaviour tests.Verification
ruff· ✅mypy --strict· ✅pyright(0 errors) · ✅ 113 agent unit testsSummary by Sourcery
Align agent prompts with registered tool names and make database-driven agent tools resilient to expected data errors so runs can recover instead of crashing.
Bug Fixes:
tool_*-prefixed names used for registered tools.ValueErrorfrom async agent tools intoModelRetryerrors so input-driven data issues no longer abort an entire agent run.Enhancements:
recoverabledecorator and apply it to database-backed tools in experiment and RAG assistant agents to standardize error handling behavior.Tests:
recoverabledecorator’s behavior for success,ValueError, and other exceptions.