
Formalize test inference with enriched metrics and auto-analysis#106

Draft
forklady42 wants to merge 7 commits into main from test/formalize-inference

Conversation

@forklady42
Collaborator

Summary

  • Enrich metrics.csv from 3 columns (rank, index, nmae) to 10, adding loss, max_pred, max_target, mean_pred, mean_target, num_electrons, and duration_ms — all computed per-sample over spatial dims
  • Flexible checkpoint resolution in test.py: checks ckpt_file > last.ckpt > best.ckpt > glob fallback, replacing the hardcoded last.ckpt
  • New summarize.py module: computes NMAE stats (mean/median/P95/P99/max), threshold counts, generates histogram + CDF plots, and optionally logs to W&B (image, table, histogram, scalar stats)
  • Auto-chain analysis after trainer.test(): summary + distribution plots always run; saturation and tail analysis run when applicable

Test plan

  • All 25 tests on main pass (uv run pytest)
  • Pre-commit (ruff lint + format) passes on all changed files
  • Run test inference on a checkpoint and verify metrics.csv has all 10 columns
  • Verify summary.txt and nmae_distribution.png are generated in log_dir
  • Verify analyze_saturation works on the enriched CSV (no more missing column errors)
  • Verify W&B logging works when wandb_mode: online
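The summary statistics being verified above (mean/median/P95/P99/max plus threshold counts) can be sketched with NumPy. The function name `nmae_summary` and the default thresholds are assumptions, not the actual summarize.py API.

```python
import numpy as np


def nmae_summary(nmae: np.ndarray, thresholds=(0.1, 0.2, 0.5)) -> dict:
    """Per-run NMAE summary stats: central tendency, tail percentiles, threshold counts.

    Illustrative sketch; threshold values are hypothetical defaults.
    """
    stats = {
        "mean": float(np.mean(nmae)),
        "median": float(np.median(nmae)),
        "p95": float(np.percentile(nmae, 95)),
        "p99": float(np.percentile(nmae, 99)),
        "max": float(np.max(nmae)),
    }
    # Count how many samples exceed each quality threshold.
    for t in thresholds:
        stats[f"count_above_{t}"] = int(np.sum(nmae > t))
    return stats
```

Scalar stats like these are also what gets logged to W&B, so the same dict can feed both summary.txt and the dashboard.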

🤖 Generated with Claude Code

forklady42 and others added 7 commits March 27, 2026 14:00
Enrich metrics.csv from 3 columns (rank, index, nmae) to 10 columns
adding loss, max_pred, max_target, mean_pred, mean_target,
num_electrons, and duration_ms. Add flexible checkpoint resolution
(ckpt_file > last.ckpt > best.ckpt > glob fallback) and automatic
post-test summary statistics with distribution plots. This unblocks
analyze_saturation.py which already expected max_pred/max_target
columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compute max_pred, max_target, mean_pred, mean_target, and
num_electrons per-sample by reducing over spatial dimensions only
(keeping the batch dimension). Previously these were batch-level
scalars that happened to be correct only with batch_size=1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
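The per-sample reduction described in this commit can be sketched with NumPy. Array shape `(batch, H, W)` and the function name `per_sample_stats` are assumptions; in particular, deriving `num_electrons` from the target sum is a guess at the intent, not confirmed by the source.

```python
import numpy as np


def per_sample_stats(pred: np.ndarray, target: np.ndarray) -> dict:
    """Reduce over spatial axes only, keeping the batch axis.

    Assumes (batch, H, W) arrays. Reducing over *all* axes instead
    would give batch-level scalars that are only correct for batch_size=1.
    """
    spatial = tuple(range(1, pred.ndim))  # every axis except batch
    return {
        "max_pred": pred.max(axis=spatial),
        "max_target": target.max(axis=spatial),
        "mean_pred": pred.mean(axis=spatial),
        "mean_target": target.mean(axis=spatial),
        "num_electrons": target.sum(axis=spatial),  # assumption: count = target sum
    }
```

Each returned array has shape `(batch,)`, so every row of metrics.csv gets its own value regardless of batch size.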
The glob fallback picks the latest epoch by lexicographic sort,
not the lowest val_loss. Fix the docstring and comment to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
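The lexicographic-vs-numeric distinction this commit documents is easy to trip over once epoch numbers gain a digit. A small illustration (filenames and the `epoch_num` helper are hypothetical):

```python
import re

names = ["epoch=9.ckpt", "epoch=10.ckpt", "epoch=2.ckpt"]

# Lexicographic sort compares character by character, so "epoch=9" > "epoch=10".
latest_lex = sorted(names)[-1]


def epoch_num(name: str) -> int:
    """Extract the numeric epoch from a checkpoint filename (hypothetical format)."""
    m = re.search(r"epoch=(\d+)", name)
    return int(m.group(1)) if m else -1


# Numeric key gives the truly latest epoch.
latest_num = max(names, key=epoch_num)
```

With single-digit epochs both orderings agree, which is why the mismatch only shows up in the docstring rather than in behavior for short runs.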
No need for a separate normmae_fn when the only loss function is
NormMAE — both compute the same thing. Uses loss_fn for both the
nmae and loss columns in metrics.csv (they'll be identical for now).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When wandb_mode != "disabled", log to W&B after test inference:
- Distribution PNG as wandb.Image
- Per-sample metrics as wandb.Table for interactive filtering
- Native histogram for the overview panel
- Scalar summary stats (mean, median, P95, P99, max)

W&B is wired into the Trainer so Lightning's built-in test_loss
metric also appears in the dashboard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
analyze_metrics does not create its output directory. The test
entrypoint now mkdir's saturation_dir before calling it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
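The fix in this commit amounts to creating the output directory idempotently before handing it to analyze_metrics. A sketch (the `ensure_dir` helper name is an assumption):

```python
from pathlib import Path


def ensure_dir(path: str) -> Path:
    """Create the directory (and any missing parents) if needed; safe to call repeatedly."""
    p = Path(path)
    p.mkdir(parents=True, exist_ok=True)
    return p
```

`parents=True` covers nested log-dir layouts and `exist_ok=True` makes re-runs on an existing log_dir a no-op instead of an error.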
