Skip to content

Feature request: retry execution errors in eval runs #522

@christso

Description

@christso

Summary

AgentV should support configurable retries when a test run ends in an execution error, especially for transient provider/runtime failures.

Problem

Today, execution errors such as provider aborts, flaky subprocess failures, temporary network issues, or intermittent runtime crashes fail the test immediately.

That makes it hard to distinguish:

  • deterministic product failures
  • transient infrastructure / provider failures

This is especially painful for agent providers that wrap external CLIs or long-running subprocesses.

Proposed behavior

Add a retry policy for execution errors at the eval level and optionally per test/target.

Examples:

execution:
  retries:
    execution_errors: 2
    retry_on:
      - provider_error
      - timeout

Or target-level override:

targets:
  - name: claude
    provider: claude
    retry_execution_errors: 2

Requested semantics

  • only retry when execution_status == execution_error
  • optionally filter by failure_reason_code
  • preserve attempt logs and result metadata for every attempt
  • report final outcome plus retry history
  • stop retrying after the configured limit

Why this matters

This would make eval runs more robust against intermittent provider/runtime failures and reduce false negatives when the agent never actually reached a stable scored result.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions