Formalize test inference with enriched metrics and auto-analysis #106
Draft
forklady42 wants to merge 7 commits into main
Enrich metrics.csv from 3 columns (rank, index, nmae) to 10 columns, adding loss, max_pred, max_target, mean_pred, mean_target, num_electrons, and duration_ms. Add flexible checkpoint resolution (ckpt_file > last.ckpt > best.ckpt > glob fallback) and automatic post-test summary statistics with distribution plots. This unblocks analyze_saturation.py, which already expected the max_pred and max_target columns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
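As a rough illustration of the post-test summary statistics (mean, median, P95, P99, max, plus threshold counts), here is a minimal sketch using only the standard library. The function name `summarize_nmae`, the default threshold values, and the percentile interpolation method are assumptions for illustration; the actual summarize.py may compute these differently.

```python
import statistics


def summarize_nmae(values, thresholds=(0.01, 0.05, 0.1)):
    """Summarize per-sample NMAE values (hypothetical helper, not the PR's API)."""
    if len(values) < 2:
        raise ValueError("need at least two samples to estimate percentiles")

    # quantiles(n=100) returns the 99 cut points P1..P99 of the sorted data.
    cuts = statistics.quantiles(values, n=100, method="inclusive")

    summary = {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(values),
    }

    # Threshold counts: how many samples exceed each (assumed) NMAE threshold.
    for t in thresholds:
        summary[f"count_above_{t}"] = sum(v > t for v in values)
    return summary
```

Reading the nmae column of metrics.csv with `csv.DictReader` and passing the floats to this function would give a dictionary that can be written to a text summary or logged as scalars.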
Compute max_pred, max_target, mean_pred, mean_target, and num_electrons per-sample by reducing over spatial dimensions only (keeping the batch dimension). Previously these were batch-level scalars, which were correct only when batch_size=1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
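The difference between a batch-level scalar and a per-sample reduction can be sketched as follows. This uses NumPy rather than the project's tensor library so it runs standalone, and it assumes the batch axis is axis 0 with all remaining axes spatial. The helper name `per_sample_stats` is hypothetical, and num_electrons is left out because its definition is not given here.

```python
import numpy as np


def per_sample_stats(pred, target):
    """Reduce over every axis except the batch axis (assumed to be axis 0)."""
    spatial_axes = tuple(range(1, pred.ndim))
    return {
        "max_pred": pred.max(axis=spatial_axes),
        "max_target": target.max(axis=spatial_axes),
        "mean_pred": pred.mean(axis=spatial_axes),
        "mean_target": target.mean(axis=spatial_axes),
    }


# Two samples with very different magnitudes, each a 2x2 grid.
pred = np.array([[[1.0, 2.0], [3.0, 4.0]],
                 [[10.0, 20.0], [30.0, 40.0]]])
stats = per_sample_stats(pred, pred)

# A batch-level reduction collapses both samples into one number,
# so the first sample would be reported with the second sample's maximum.
batch_level_max = pred.max()
```

Here `stats["max_pred"]` holds one value per sample, while `batch_level_max` is a single scalar shared by the whole batch; the two agree only when the batch contains one sample.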
The glob fallback picks the latest epoch by lexicographic sort, not the lowest val_loss. Fix the docstring and comment to match.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
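The resolution order (ckpt_file > last.ckpt > best.ckpt > glob fallback) might look roughly like the sketch below. The function name `resolve_checkpoint`, the `*.ckpt` glob pattern, and the choice to raise when an explicit ckpt_file is missing are assumptions, not the PR's actual code.

```python
from pathlib import Path


def resolve_checkpoint(ckpt_dir, ckpt_file=None):
    """Hypothetical helper: ckpt_file > last.ckpt > best.ckpt > glob fallback."""
    ckpt_dir = Path(ckpt_dir)

    # 1. An explicitly configured checkpoint always wins.
    if ckpt_file is not None:
        explicit = Path(ckpt_file)
        if not explicit.is_file():
            raise FileNotFoundError(f"checkpoint not found: {explicit}")
        return explicit

    # 2. and 3. Well-known names, in priority order.
    for name in ("last.ckpt", "best.ckpt"):
        candidate = ckpt_dir / name
        if candidate.is_file():
            return candidate

    # 4. Glob fallback: sorted() orders names lexicographically, so this
    # returns the last name in string order, not the lowest val_loss.
    matches = sorted(ckpt_dir.glob("*.ckpt"))
    if not matches:
        raise FileNotFoundError(f"no checkpoint found in {ckpt_dir}")
    return matches[-1]
```

One consequence of lexicographic ordering is that it matches epoch order only when epoch numbers have equal width: a file named `epoch=9.ckpt` sorts after `epoch=10.ckpt`, so zero-padded names are safer if the fallback is relied upon.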
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There is no need for a separate normmae_fn when the only loss function is NormMAE, since both compute the same thing. Use loss_fn for both the nmae and loss columns in metrics.csv (they will be identical for now).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When wandb_mode != "disabled", log to W&B after test inference:
- Distribution PNG as wandb.Image
- Per-sample metrics as wandb.Table for interactive filtering
- Native histogram for the overview panel
- Scalar summary stats (mean, median, P95, P99, max)

W&B is wired into the Trainer so Lightning's built-in test_loss metric also appears in the dashboard.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
analyze_metrics does not create its output directory. The test entrypoint now creates saturation_dir before calling it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
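Creating the directory up front can be done idempotently with pathlib, as in this small sketch. The helper name `ensure_output_dir` is hypothetical, and how the entrypoint then calls analyze_metrics is not shown because its signature is not given here.

```python
from pathlib import Path


def ensure_output_dir(path):
    """Create the directory (and any missing parents) if it does not exist."""
    out = Path(path)
    # exist_ok=True makes repeated test runs safe; parents=True handles
    # the case where the enclosing log directory is also missing.
    out.mkdir(parents=True, exist_ok=True)
    return out
```

Calling this on saturation_dir before the analysis step means a fresh log directory no longer causes a missing-directory failure when output files are written.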
Summary
- Enrich metrics.csv from 3 columns (rank, index, nmae) to 10: adds loss, max_pred, max_target, mean_pred, mean_target, num_electrons, duration_ms, all computed per-sample over spatial dims
- Flexible checkpoint resolution in test.py: checks ckpt_file > last.ckpt > best.ckpt > glob fallback, replacing the hardcoded last.ckpt
- summarize.py module: computes NMAE stats (mean/median/P95/P99/max), threshold counts, generates histogram + CDF plots, and optionally logs to W&B (image, table, histogram, scalar stats)
- After trainer.test(): summary + distribution plots always run; saturation and tail analysis run when applicable

Test plan
- Tests on main pass (uv run pytest)
- metrics.csv has all 10 columns
- summary.txt and nmae_distribution.png are generated in log_dir
- analyze_saturation works on the enriched CSV (no more missing column errors)
- W&B logging with wandb_mode: online

🤖 Generated with Claude Code