Open
Description
Summary
Claude eval runs are failing after roughly 2 minutes even though the target timeout is configured to 3600 seconds.
What I observed
This does not currently look like a normal AgentV timeout path.
Across two runs, the outcome was:
- execution_status: execution_error
- failure_reason_code: provider_error
- error message: Claude Code process aborted by user
So the failure appears to be a Claude provider / Claude Code subprocess abort rather than AgentV enforcing the configured timeout.
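The reasoning above can be written down as a small check. This is only a sketch: `classify_failure` and its labels are hypothetical names for illustration, not AgentV's actual logic, and the 120-second figure is the approximate duration reported in this issue.

```python
def classify_failure(elapsed_seconds, timeout_seconds, error_message):
    """Label a failed run by comparing its duration to the configured timeout.

    Hypothetical helper for triage; the labels are not AgentV reason codes.
    """
    if elapsed_seconds >= timeout_seconds:
        # The run lasted at least as long as the configured limit.
        return "timeout"
    if "aborted by user" in error_message.lower():
        # The run ended early with the abort message seen in this issue.
        return "abort_before_timeout"
    return "other_error"


# Roughly 2 minutes elapsed against timeout_seconds: 3600.
print(classify_failure(120, 3600, "Claude Code process aborted by user"))
```

With the values from this report the run falls in the `abort_before_timeout` bucket, which is why the configured timeout looks unrelated to the failure.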
Environment
- AgentV local checkout using the built-in claude provider
- Claude provider implementation is SDK-backed via @anthropic-ai/claude-agent-sdk
- That SDK then spawns a Claude Code subprocess underneath
- Claude Code installed locally: claude 2.1.62
- Claude runtime reported in logs: claude-opus-4-6
- AgentV target timeout: 3600 seconds
- Static workspace path used: /tmp/agentv-cargowise-customs-static-claude
Target config
- name: claude
  provider: claude
  timeout_seconds: 3600
  judge_target: llm
  log_format: json
Reproduction
Command:
cd /home/christso/projects/WTG.AI.Prompts
export AGENTV_CARGOWISE_CUSTOMS_WORKSPACE_PATH=/tmp/agentv-cargowise-customs-static-claude
agentv eval evals/cargowise-customs/customs-validation-rules/customs-rule-codes-configuration.yaml \
--targets .agentv/targets.yaml \
--target claude \
--test-id add-rule-code-to-country \
--workers 1 \
--output /tmp/customs-add-rule-code-claude-static.jsonl \
--output-format jsonl \
--verbose
Reran the same command with:
--output /tmp/customs-add-rule-code-claude-static-rerun.jsonl
Behavior
For both runs:
- static workspace setup succeeded
- before_all completed successfully
- Claude progressed normally through discovery for about 2 minutes
- AgentV then recorded:
Claude Code process aborted by user
This reproduced twice, so it does not currently look intermittent.
Artifacts
First run:
- Result JSONL:
/tmp/customs-add-rule-code-claude-static.jsonl
- AgentV result file:
/home/christso/projects/WTG.AI.Prompts/.agentv/results/eval_2026-03-10T11-00-07-414Z.jsonl
- Claude log:
/home/christso/projects/WTG.AI.Prompts/.agentv/logs/claude/2026-03-10T11-00-10-160Z_claude_add-rule-code-to-country_attempt-1_63681e6a.log
Second run:
- Result JSONL:
/tmp/customs-add-rule-code-claude-static-rerun.jsonl
- AgentV result file:
/home/christso/projects/WTG.AI.Prompts/.agentv/results/eval_2026-03-10T11-06-58-388Z.jsonl
- Claude log:
/home/christso/projects/WTG.AI.Prompts/.agentv/logs/claude/2026-03-10T11-07-00-987Z_claude_add-rule-code-to-country_attempt-1_cad57748.log
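To confirm that both result files carry the same outcome, the JSONL artifacts can be tallied with a short script. This is a sketch under an assumption: that each line is a JSON object with top-level `execution_status` and `failure_reason_code` keys. The real result schema may nest these differently.

```python
import json
import sys
from collections import Counter


def summarize_results(lines):
    """Count (execution_status, failure_reason_code) pairs across JSONL lines.

    Assumes both keys sit at the top level of each record; missing keys
    are counted as None.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        key = (record.get("execution_status"), record.get("failure_reason_code"))
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Pass one or more result files as command-line arguments.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for (status, reason), n in summarize_results(handle).items():
                print(f"{path}: {status} / {reason} x{n}")
```

Running it against the two Result JSONL paths listed above would show whether both runs report the same status and reason code.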
Additional clue from logs
Claude init event includes:
apiKeySource: "none"
model: "claude-opus-4-6"
Likely next investigation
- investigate why the SDK-backed Claude provider is surfacing Claude Code process aborted by user
- compare behavior with a direct Claude CLI execution path to see whether the issue is in:
  - AgentV Claude provider integration
  - the Claude Agent SDK layer
  - or Claude Code subprocess handling itself
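For the direct-CLI comparison, a small timing wrapper can record how long a command runs and how it ends. This is a sketch and not part of AgentV: `run_and_time` is a hypothetical helper, and the exact Claude CLI invocation to pass in (for example a non-interactive prompt run) is left as an assumption to fill in.

```python
import subprocess
import sys
import time


def run_and_time(cmd, timeout_seconds=3600):
    """Run cmd, returning elapsed seconds, the return code, and a timeout flag.

    On POSIX, a negative return code -N means the child was killed by signal N.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_seconds
        )
        returncode, timed_out = proc.returncode, False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        returncode, timed_out = None, True
    elapsed = time.monotonic() - start
    return {
        "elapsed_seconds": elapsed,
        "returncode": returncode,
        "timed_out": timed_out,
    }


if __name__ == "__main__":
    # Stand-in command; replace with the direct Claude CLI call under test.
    print(run_and_time([sys.executable, "-c", "print('hello')"]))
```

If a direct CLI run of the same prompt also stops at about 2 minutes, that points at Claude Code itself. If it runs to completion, the SDK layer or the AgentV provider integration becomes the more likely suspect.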