Trace-Bench Documentation

Trace-Bench is a benchmarking framework for evaluating LLM-based optimization algorithms built on OpenTrace. It provides a reproducible harness that pairs tasks (benchmark problems) with trainers (optimization algorithms), runs them across seeds, and produces structured artifacts for comparison.
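
To illustrate the idea of pairing tasks with trainers and running them across seeds, here is a minimal sketch of how such a run matrix could be enumerated. It is not Trace-Bench's actual implementation: the task names, trainer names, and the `expand_matrix` helper are hypothetical placeholders chosen for illustration.

```python
from itertools import product

# Hypothetical names for illustration only; not taken from the Trace-Bench registry.
tasks = ["task_a", "task_b"]
trainers = ["trainer_x", "trainer_y"]
seeds = [0, 1, 2]


def expand_matrix(tasks, trainers, seeds):
    """Return one job description per (task, trainer, seed) combination."""
    return [
        {"task": task, "trainer": trainer, "seed": seed}
        for task, trainer, seed in product(tasks, trainers, seeds)
    ]


jobs = expand_matrix(tasks, trainers, seeds)
print(len(jobs))  # 2 tasks x 2 trainers x 3 seeds = 12
```

Running every trainer on every task with the same set of seeds is what makes the resulting artifacts directly comparable.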

How to Use This Documentation

Start with Overview to learn the core concepts and how a run is structured. If you want to execute experiments, read Running Experiments and Config Reference next. For UI workflows, jump to UI Guide. For extension points, follow Adding a Task, Adding an Agent, Adding a Trainer, and Adding a Benchmark.

Validation Evidence

See Validation Evidence for the exact commands and transcripts used to validate the repository after the layout changes.

Quick Start

# Trace-Bench requires Trace/OpenTrace (`opto`) to be installed.
# Option A: pip install trace-opt
# Option B: editable sibling checkout: pip install -e ../OpenTrace

pip install -e .
trace-bench list-tasks
trace-bench run --config configs/smoke.yaml --runs-dir runs

See the root README for full install options.
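
Because the quick start assumes `opto` is already installed, it can help to check for it before installing Trace-Bench. The snippet below is a small sketch using only the Python standard library; the `is_installed` helper is a hypothetical name and is not part of Trace-Bench or OpenTrace.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("opto"):
    # The two install options are the ones listed in the quick start above.
    print(
        "opto not found: run `pip install trace-opt` or "
        "`pip install -e ../OpenTrace` before installing Trace-Bench."
    )
```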

Table of Contents

Page	Description
Overview	What Trace-Bench is, concept glossary, and pointers to intro notebooks
Running Experiments	CLI reference, fair comparisons, reading results
Agents and Tasks	Technical distinction between agents/models and tasks
Adding an Agent	How to optimize a new agent with Trace-Bench
Adding a Task	How to contribute a new benchmark task
Adding a Trainer	How trainers are discovered and registered
Adding a Benchmark	How to integrate an external benchmark suite
UI Guide	Gradio UI tabs, workflows, and screenshots
MLflow Integration	Enabling MLflow, what is logged
Task Inventory	All available tasks by suite (LLM4AD, VeriBench, examples, internal)
Config Reference	YAML schema, matrix expansion, resume modes, output artifacts
Result Analysis	Reading results via CLI, UI, and Python

Notebooks

Notebook	Topic
`notebooks/01_quick_start.ipynb`	First run in under 5 minutes
`notebooks/02_api_walkthrough.ipynb`	Python API and config objects
`notebooks/03_task_coverage.ipynb`	Exploring available tasks
`notebooks/04_gradio_ui.ipynb`	Interactive results dashboard
`notebooks/05_full_benchmark.ipynb`	Running a full benchmark matrix
`notebooks/06_multiobjective_convex.ipynb`	Multi-objective optimization (convex)
`notebooks/07_multiobjective_bbeh.ipynb`	Multi-objective optimization (BBEH)
`notebooks/08_multiobjective_gsm8k.ipynb`	Multi-objective optimization (GSM8K)

Project Layout

trace_bench/           Python package (CLI, runner, registry, config)
benchmarks/            Benchmark suites (LLM4AD, KernelBench, VeriBench)
configs/               YAML run configurations
notebooks/             Jupyter notebooks with worked examples
runs/                  Default output directory (created on first run)
