data: complete 10-task pilot eval — planning +10% pass rate, -19% tokens

f8e7cdf

Add deliberate-eval: framework for measuring planning impact on agent outcomes #11

Annotations: 1 warning

test (3.10): succeeded Apr 4, 2026 in 19s