Radiation transport example by melo-gonzo · Pull Request #1647 · NVIDIA/physicsnemo

melo-gonzo · 2026-05-14T15:56:02Z

PhysicsNeMo Pull Request

Description

Radiation Transport Surrogate Model with Transolver

This PR adds a new PhysicsNeMo example under examples/nuclear_engineering/radiation_transport/ that trains a Transolver surrogate for the 2-D linear radiation transport equation on two benchmark problems relevant to nuclear reactor assembly design and inertial confinement fusion: the Lattice and Hohlraum benchmarks from Kusch et al. 2025, with data generated by the KiT-RT simulation code. The example is built end-to-end on PhysicsNeMo components: Mesh datapipes, DataLoader / Compose / Normalize transforms, the Transolver model, the CombinedOptimizer (Muon + AdamW), and physicsnemo.utils.checkpoint. It supports distributed training, well-defined quantities of interest, a differentiable physics loss, and reusable modules for extending to other architectures.

Why

As in many other scientific and engineering pipelines, running the simulations is the bottleneck. This workflow demonstrates how PhysicsNeMo, and models traditionally used in CFD, CAE, and other domains, can be reused for radiation transport. Because the KiT-RT code is "benchmark-style," it is a natural interface for validating scientific ML and surrogate models.

Key Changes

New example tree

examples/nuclear_engineering/radiation_transport/
├── README.md                 # walkthrough: science, install, dataset, training, eval
├── DATASET_CARD.md           # dataset card describing the .pmsh layout
└── src/                      # 12 Python modules, 8 YAML configs
    ├── train.py              # Hydra entry — composes case/data/model/train
    ├── inference.py          # Hydra-driven evaluation; writes metrics + figures
    ├── trainer.py            # training loop (DDP, AMP, gradient accumulation, warmup+cosine)
    ├── dataset.py            # `RTEBaseDataset` over a directory of `.pmsh/` stores
    ├── loader.py             # `TransolverAdapter`, `collate_no_padding`, `build_dataloaders`
    ├── transforms.py         # RTE-specific `Transform`s registered with the datapipes registry
    ├── losses.py             # region-weighted MSE + QoI physics loss
    ├── qoi.py                # differentiable QoI evaluators (final-time, T=1)
    ├── evaluation_metrics.py # field + QoI aggregators
    ├── checkpointing.py      # `best_model/` checkpointing, Muon + AdamW combo optimizer
    ├── compute_normalizations.py  # standalone CLI to produce flux / material stats YAMLs
    ├── viz.py                # 3-panel flux plot + per-region QoI scatter
    └── conf/                 # Hydra groups: case/, data/, model/, train/, inference/

Data layout

Each simulation is one <name>.pmsh/ directory (written by physicsnemo.mesh.Mesh.save) next to a <name>.attrs.json sidecar. RTEBaseDataset._load uses physicsnemo.mesh.Mesh.load for the memmap tensors and reads raw_attrs from the sidecar, exposing it as a NonTensorData metadata entry on the returned TensorDict. Splits are basename arrays; the reader appends .pmsh when opening stores.
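
The store-plus-sidecar convention above can be sketched with the standard library alone. This is an illustrative sketch, not the code in dataset.py: the helper names (`resolve_sample`, `read_sidecar`, `demo`), the sample basename, and the sidecar key are all hypothetical, and the tensor loading step (physicsnemo.mesh.Mesh.load in the example) is deliberately left out.

```python
import json
import tempfile
from pathlib import Path


def resolve_sample(root: Path, basename: str) -> tuple[Path, Path]:
    """Map a split basename to its (<name>.pmsh/ store, <name>.attrs.json sidecar) pair."""
    store = root / f"{basename}.pmsh"            # the reader appends .pmsh to the basename
    sidecar = root / f"{basename}.attrs.json"    # sidecar sits next to the store
    if not store.is_dir():
        raise FileNotFoundError(f"missing mesh store: {store}")
    if not sidecar.is_file():
        raise FileNotFoundError(f"missing sidecar: {sidecar}")
    return store, sidecar


def read_sidecar(sidecar: Path) -> dict:
    """Read the raw attributes that travel with a simulation as plain JSON."""
    with sidecar.open() as fh:
        return json.load(fh)


def demo() -> dict:
    """Build a throwaway directory with one fake sample and resolve it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "lattice_0001.pmsh").mkdir()  # hypothetical sample name
        (root / "lattice_0001.attrs.json").write_text(json.dumps({"case": "lattice"}))
        store, sidecar = resolve_sample(root, "lattice_0001")
        return {"store_name": store.name, "attrs": read_sidecar(sidecar)}
```

Failing loudly when either half of the pair is missing keeps a split file from silently pointing at a sample that has tensors but no metadata, or the reverse.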

Training

DDP-ready via torchrun --nproc_per_node=N src/train.py; single-process works via plain python (no DDP-specific code is gated on launch detection beyond what DistributedManager provides).
AMP via torch.amp.autocast + GradScaler (fp16) / direct autocast (bf16).
Optimizer is Adam by default; train.optimizer.type=muon returns a CombinedOptimizer with torch.optim.Muon for 2-D weight matrices + AdamW for everything else.
Scheduler is SequentialLR([LinearLR, CosineAnnealingLR]) for warmup + cosine annealing.
Single best-by-val_loss checkpoint kept at checkpoints/best_model/.
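
For readers unfamiliar with the warmup + cosine schedule, the shape it produces can be sketched in plain Python. This is a rough sketch that follows the closed-form expressions documented for LinearLR and CosineAnnealingLR rather than calling PyTorch; the function name and every hyperparameter value below are assumptions for illustration, not the settings in conf/train.

```python
import math


def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps,
                     start_factor=0.1, eta_min=0.0):
    """Learning rate at `step`: linear warmup, then cosine annealing.

    All argument names and defaults are illustrative assumptions.
    """
    if step < warmup_steps:
        # Linear ramp of the multiplicative factor from start_factor to 1.0.
        frac = step / warmup_steps
        return base_lr * (start_factor + (1.0 - start_factor) * frac)
    # Cosine phase restarts its own step counter at the milestone.
    t_max = max(1, total_steps - warmup_steps)
    t = min(step - warmup_steps, t_max)
    return eta_min + (base_lr - eta_min) * 0.5 * (1.0 + math.cos(math.pi * t / t_max))


# Example: 10 warmup steps followed by 100 cosine steps.
schedule = [warmup_cosine_lr(s, 1e-3, 10, 110) for s in range(111)]
```

In this sketch the rate starts at `start_factor * base_lr`, reaches `base_lr` at the end of warmup, and decays to `eta_min` at the final step.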

Known Limitations

Hohlraum boundary input flux is present in the source data but is not used as a model input in this example.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.
If I am implementing a new model or modifying any existing model, I have followed the Models Implementation Coding Standards.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

mnabian

LGTM! Left a few minor comments.

loliverhennigh · 2026-05-22T21:33:00Z

Wait, I think the split file might not actually be used here.

RTEBaseDataset reads the split into self.filenames, but loading still goes through the reader by integer index, and the reader has its own sorted list of all .pmsh files. So unless I’m missing something, the split file controls the length of the dataset, but not which files get loaded.

Wouldn’t that mean train/val/test can end up reading overlapping files depending on the directory ordering?
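
The concern can be illustrated with a toy sketch. It is not the code from the PR or the fix in 8c9c3e4; the function names and file names are hypothetical, and it only contrasts index-based lookup into the full sorted listing with name-based lookup driven by the split.

```python
def load_by_index(all_stores, split, i):
    """Pattern described in the review: the split only bounds the index."""
    if not 0 <= i < len(split):
        raise IndexError(i)
    # The file actually opened comes from the full sorted directory listing.
    return sorted(all_stores)[i]


def load_by_name(all_stores, split, i):
    """Name-based lookup: the split entry decides which store is opened."""
    name = f"{split[i]}.pmsh"
    if name not in set(all_stores):
        raise FileNotFoundError(name)
    return name


# Hypothetical directory contents and split files.
all_stores = ["run_a.pmsh", "run_b.pmsh", "run_c.pmsh", "run_d.pmsh"]
train_split = ["run_a", "run_c"]
val_split = ["run_b", "run_d"]

buggy_train = [load_by_index(all_stores, train_split, i) for i in range(len(train_split))]
buggy_val = [load_by_index(all_stores, val_split, i) for i in range(len(val_split))]
fixed_train = [load_by_name(all_stores, train_split, i) for i in range(len(train_split))]
fixed_val = [load_by_name(all_stores, val_split, i) for i in range(len(val_split))]

overlap_buggy = set(buggy_train) & set(buggy_val)   # both splits read run_a and run_b
overlap_fixed = set(fixed_train) & set(fixed_val)   # empty: splits stay disjoint
```

With index-based lookup both splits read the first two entries of the directory listing, so train and val fully overlap even though the split files are disjoint.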

loliverhennigh

LGTM

melo-gonzo · 2026-05-22T22:04:41Z

> Wait, I think the split file might not actually be used here.

Great catch! Thanks for the attention to detail, fixed in 8c9c3e4.

melo-gonzo · 2026-05-23T01:25:36Z

/blossom-ci

melo-gonzo · 2026-05-26T14:44:38Z

/blossom-ci

coreyjadams · 2026-05-26T15:49:06Z

/blossom-ci

melo-gonzo · 2026-05-26T15:52:23Z

/ok to test cbb1ddc

melo-gonzo · 2026-05-26T16:50:37Z

/ok to test cbb1ddc

copy-pr-bot · 2026-05-26T16:50:41Z

> /ok to test cbb1ddc

@melo-gonzo, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

melo-gonzo · 2026-05-26T17:07:50Z

/ok to test 37e1413

melo-gonzo · 2026-05-26T19:44:40Z

/blossom-ci

melo-gonzo · 2026-05-27T15:37:57Z

/blossom-ci

melo-gonzo · 2026-05-27T16:26:01Z

/blossom-ci

melo-gonzo and others added 30 commits May 14, 2026 08:09

feat: consolidated radiation transport surrogate model training/eval …

f3530b6

…recipe

feat: update eval plots

5b7f762

feat: update checkpointing logic

fffa6d1

feat: checkpointing updates

9b17987

feat: checkpointing updates

10fd663

feat: checkpoint updates

3ddd5d7

feat: checkpoint logic

ba96ada

feat: skip logging nan/inf checkpoints

d16a857

feat: requiring split file

92ee232

feat: default to top_model for inference

055b80b

fix: eval sampler removing duplicates

4a86307

fix: strict checking for physics loss required keys

e8ec47f

fix: validatio metric accregation and checkpointing updates

f9910d6

fix: explicit padding to flux

e26e68e

feat: cleaning time dependent training references

01b41d7

docs: update readme

5d297ca

feat: purging some unused code

c32817d

feat: purging unused code

ef624d7

feat: purging unused code

63264c8

feat: some more refactoring and purging

619ea5b

feat: some more refactoring and purging

8a1f112

feat: a few fixes and cleaning up nits

7e6661e

fix: forcing single process inference

1d7d3a1

fix: pre-commit

1935ba8

refactor: porting to pmsh

c35d974

docs: refresh readme

efb7126

refactor: read geometry info from sidecar json

42b7ebb

fix: rename zarr to mesh

1c8911b

fix: inference flux stats required or infered from hydra

5309800

fix: purging zarr refs, update some npy->torch, remove legacy comments

b0ebc59

melo-gonzo added 3 commits May 18, 2026 13:20

fix: dist training logging bug

d2c91a8

fix: make muon default

683396d

feat: update dataset to work with updated pmsh

fc04d0e

mnabian reviewed May 21, 2026

Comment thread examples/nuclear_engineering/radiation_transport/README.md

mnabian approved these changes May 21, 2026

melo-gonzo added 3 commits May 21, 2026 08:11

fix: update comment about physics objective definition

bb5d5d8

docs: update readme, mesh store description updates

5bbd56e

docs: adding RTE images

73c6514

melo-gonzo commented May 21, 2026

Comment thread examples/nuclear_engineering/radiation_transport/README.md Outdated

root and others added 3 commits May 21, 2026 13:29

fix: hohlraum parameter wiring, updated to eliminate dependence on pr…

23c889e

…ivate classes/methods

Merge remote-tracking branch 'upstream/main' into radiation-transport…

6b4cf18

…-example # Conflicts: # CHANGELOG.md

fix: norm computation fix

084d54b

loliverhennigh approved these changes May 22, 2026

fix: deterministic loading of samples from file lists

8c9c3e4

Merge branch 'main' into radiation-transport-example

cbb1ddc

Merge branch 'main' into radiation-transport-example

37e1413

Merge branch 'main' into radiation-transport-example

75999cc

4 participants