Runtime artifacts are written under:
- `kernels/projects/<project>/io/individual_ops/`
- `kernels/projects/<project>/io/summary.json`
- `kernels/projects/<project>/benchmarks/op_benchmarks.json`
- `kernels/projects/<project>/benchmarks/torch_baseline_cache.json`
- `kernels/projects/<project>/state.json`
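A minimal sketch of resolving these artifact paths for a project; the helper name `artifact_paths` is illustrative, not part of the project's API:

```python
from pathlib import Path

def artifact_paths(project: str, root: str = "kernels/projects") -> dict:
    """Build the runtime artifact paths for one project (layout from above)."""
    base = Path(root) / project
    return {
        "individual_ops": base / "io" / "individual_ops",
        "summary": base / "io" / "summary.json",
        "op_benchmarks": base / "benchmarks" / "op_benchmarks.json",
        "baseline_cache": base / "benchmarks" / "torch_baseline_cache.json",
        "state": base / "state.json",
    }

print(artifact_paths("demo")["state"])  # kernels/projects/demo/state.json
```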
The profile job prepares project inputs and baseline benchmark data.
The generate job processes operators sequentially; for each operator it will:
- Generate the kernel for the operator
- Validate compilation and correctness via the backend success marker
- Optionally optimize the same operator
- Optionally refresh the benchmark
This enables incremental chart updates while the run is in progress.
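The per-operator loop above can be sketched as follows; every helper here (`generate_kernel`, `validate`, `optimize_kernel`, `refresh_benchmark`, `write_summary`) is an illustrative stub, not the project's real API:

```python
def generate_kernel(op):            # stub: produce kernel source for an op
    return f"// kernel for {op}"

def validate(kernel):               # stub: backend writes a success marker
    return kernel.startswith("//")

def optimize_kernel(op, kernel):    # stub: optional optimize pass
    return kernel + " /* optimized */"

def refresh_benchmark(op, kernel):  # stub: optional benchmark refresh
    pass

def write_summary(results):         # stub: persist incremental summary data
    pass

def run_generate(ops, optimize=False, benchmark=False):
    results = {}
    for op in ops:                                # strictly one operator at a time
        kernel = generate_kernel(op)              # 1. generate
        ok = validate(kernel)                     # 2. compile/correctness check
        if ok and optimize:
            kernel = optimize_kernel(op, kernel)  # 3. optional optimize
        if ok and benchmark:
            refresh_benchmark(op, kernel)         # 4. optional benchmark refresh
        results[op] = "ok" if ok else "error"
        write_summary(results)                    # charts can poll partial results
    return results
```

Writing the summary after each operator, rather than once at the end, is what makes the in-progress chart updates possible.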
Optimization is MCTS-driven per operator and writes tree artifacts under trees/<op>/.
The benchmark job reads the baseline data and optimized outputs to build op_benchmarks.json.
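A hedged sketch of that merge step; the report schema here is an assumption for illustration, and the real op_benchmarks.json may differ:

```python
def build_op_benchmarks(baseline: dict, optimized: dict) -> dict:
    """Merge baseline timings with optimized timings into one per-op report.
    Schema is illustrative; timings are in milliseconds."""
    report = {}
    for op, base_ms in baseline.items():
        opt_ms = optimized.get(op)  # an op may not have an optimized kernel yet
        report[op] = {
            "baseline_ms": base_ms,
            "optimized_ms": opt_ms,
            "speedup": round(base_ms / opt_ms, 2) if opt_ms else None,
        }
    return report
```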
Canonical CLI orchestration is:
```
python -m src.optimizer.workflow <profile|generate|optimize|benchmark> ...
```

Frontend orchestration is implemented in:
frontend/walkers/kernel_job_runners.jac
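A minimal sketch of driving the canonical CLI from Python; `build_cmd` and `run_stage` are hypothetical wrappers, only the module path and stage names come from the docs above:

```python
import subprocess
import sys

STAGES = {"profile", "generate", "optimize", "benchmark"}

def build_cmd(stage: str, *args: str) -> list:
    """Assemble the canonical CLI invocation for one workflow stage."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return [sys.executable, "-m", "src.optimizer.workflow", stage, *args]

def run_stage(stage: str, *args: str):
    """Run a stage as a subprocess, raising on non-zero exit."""
    return subprocess.run(build_cmd(stage, *args), check=True)
```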
- State transitions are persisted in `state.json`.
- Stale process recovery avoids silent success.
- Chart status is explicit (`pending|error|empty|partial|ready`).
- Baseline benchmark cache is fingerprinted by runtime/device context.
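One way such a fingerprint could be computed, as a sketch; the context keys and hash truncation are assumptions, not the project's actual scheme:

```python
import hashlib
import json

def baseline_fingerprint(context: dict) -> str:
    """Hash the runtime/device context so a cached baseline is reused only
    when the environment matches. Sorting keys makes the hash stable."""
    blob = json.dumps(context, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]
```

Any change in the context (a different device, runtime version, etc.) yields a different fingerprint, invalidating the cached baseline.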