
Radiation transport example #1647

Open
melo-gonzo wants to merge 72 commits into NVIDIA:main from melo-gonzo:radiation-transport-example


@melo-gonzo
Collaborator

PhysicsNeMo Pull Request

Description

Radiation Transport Surrogate Model with Transolver

This PR adds a new PhysicsNeMo example under examples/nuclear_engineering/radiation_transport/ that trains a Transolver surrogate for the 2-D linear radiation transport equation. It covers two benchmark problems relevant to nuclear reactor assembly design and inertial confinement fusion: the Lattice and Hohlraum benchmarks from Kusch et al. 2025, with data generated by the KiT-RT simulation code. The example is built end-to-end on PhysicsNeMo components: Mesh datapipes, DataLoader / Compose / Normalize transforms, the Transolver model, the CombinedOptimizer (Muon + AdamW), and physicsnemo.utils.checkpoint. It supports distributed training, well-defined quantities of interest (QoIs), a differentiable physics loss, and reusable modules for extending to other architectures.

Why

As with many other scientific and engineering pipelines, running simulations is the bottleneck. This workflow demonstrates how PhysicsNeMo, and models traditionally used in CFD, CAE, and other domains, can be reused for radiation transport. Because the KiT-RT code is "benchmark-style," it is a natural interface for validating scientific ML and surrogate models.

Key Changes

New example tree

examples/nuclear_engineering/radiation_transport/
├── README.md                 # walkthrough: science, install, dataset, training, eval
├── DATASET_CARD.md           # dataset card describing the .pmsh layout
└── src/                      # 12 Python modules, 8 YAML configs
    ├── train.py              # Hydra entry — composes case/data/model/train
    ├── inference.py          # Hydra-driven evaluation; writes metrics + figures
    ├── trainer.py            # training loop (DDP, AMP, gradient accumulation, warmup+cosine)
    ├── dataset.py            # `RTEBaseDataset` over a directory of `.pmsh/` stores
    ├── loader.py             # `TransolverAdapter`, `collate_no_padding`, `build_dataloaders`
    ├── transforms.py         # RTE-specific `Transform`s registered with the datapipes registry
    ├── losses.py             # region-weighted MSE + QoI physics loss
    ├── qoi.py                # differentiable QoI evaluators (final-time, T=1)
    ├── evaluation_metrics.py # field + QoI aggregators
    ├── checkpointing.py      # `best_model/` checkpointing, Muon + AdamW combo optimizer
    ├── compute_normalizations.py  # standalone CLI to produce flux / material stats YAMLs
    ├── viz.py                # 3-panel flux plot + per-region QoI scatter
    └── conf/                 # Hydra groups: case/, data/, model/, train/, inference/

Data layout

Each simulation is one <name>.pmsh/ directory (written by physicsnemo.mesh.Mesh.save) next to a <name>.attrs.json sidecar. RTEBaseDataset._load uses physicsnemo.mesh.Mesh.load for the memmap tensors and reads raw_attrs from the sidecar, exposing it as a NonTensorData metadata entry on the returned TensorDict. Splits are basename arrays; the reader appends .pmsh when opening stores.
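
As a rough illustration of the layout described above, the sketch below resolves one split basename to its store directory and sidecar file. The helper name `resolve_sample` and the file names in the usage note are hypothetical and not taken from the PR. The tensor loading through physicsnemo.mesh.Mesh.load is deliberately left out, so only the path and JSON handling is shown.

```python
import json
from pathlib import Path


def resolve_sample(data_dir: Path, basename: str) -> tuple[Path, dict]:
    """Return the store path and sidecar attributes for one split entry.

    Hypothetical helper: it mirrors the described layout, not the PR's code.
    """
    # Splits hold bare basenames; the suffix is appended when opening a store.
    store = data_dir / f"{basename}.pmsh"
    # The sidecar sits next to the store directory and shares its basename.
    sidecar = data_dir / f"{basename}.attrs.json"
    if not store.is_dir():
        raise FileNotFoundError(f"missing store: {store}")
    raw_attrs = json.loads(sidecar.read_text())
    return store, raw_attrs
```

Calling `resolve_sample(Path("data"), "sim_0001")` would look for `data/sim_0001.pmsh/` and `data/sim_0001.attrs.json`. Keying on the basename rather than on a position in a directory listing keeps the split file authoritative.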

Training

  • DDP-ready via torchrun --nproc_per_node=N src/train.py; single-process
    works via plain python (no DDP-specific code is gated on launch
    detection beyond what DistributedManager provides).
  • AMP via torch.amp.autocast + GradScaler (fp16) / direct autocast (bf16).
  • Optimizer is Adam by default; train.optimizer.type=muon returns a
    CombinedOptimizer with torch.optim.Muon for 2-D weight matrices
    + AdamW for everything else.
  • Scheduler is SequentialLR([LinearLR, CosineAnnealingLR]) for warmup
    + cosine annealing.
  • Single best-by-val_loss checkpoint kept at checkpoints/best_model/.
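
The warmup + cosine schedule in the list above can be pictured with a small closed-form sketch. This is plain Python rather than the PyTorch schedulers the PR uses, and it only approximates the shape that composing LinearLR and CosineAnnealingLR produces. The function name `lr_at_step` and the defaults `start_factor=0.1` and `eta_min=0.0` are assumptions for illustration, not values from the PR's configs.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps,
               start_factor=0.1, eta_min=0.0):
    """Learning rate under linear warmup followed by cosine annealing.

    Illustrative only: parameter names and defaults are assumptions.
    """
    if step < warmup_steps:
        # Linear ramp from start_factor * base_lr up towards base_lr.
        factor = start_factor + (1.0 - start_factor) * step / warmup_steps
        return base_lr * factor
    # Cosine decay from base_lr down to eta_min over the remaining steps.
    cosine_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / cosine_steps)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * progress))
```

With `base_lr=1e-3`, `warmup_steps=10`, and `total_steps=110`, the rate starts at one tenth of the base value, reaches the base value at step 10, and decays to `eta_min` by step 110.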

Known Limitations

  • Hohlraum boundary input flux is present in the source data but is
    not used as a model input in this example.

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

Comment thread examples/nuclear_engineering/radiation_transport/src/conf/train/base.yaml Outdated
Comment thread examples/nuclear_engineering/radiation_transport/src/dataset.py Outdated
Comment thread examples/nuclear_engineering/radiation_transport/README.md Outdated
Comment thread examples/nuclear_engineering/radiation_transport/README.md
Comment thread examples/nuclear_engineering/radiation_transport/README.md
Collaborator

@mnabian mnabian left a comment


LGTM! Left a few minor comments.

Comment thread examples/nuclear_engineering/radiation_transport/README.md Outdated
@loliverhennigh
Collaborator

Wait, I think the split file might not actually be used here.

RTEBaseDataset reads the split into self.filenames, but loading still goes through the reader by integer index, and the reader has its own sorted list of all .pmsh files. So unless I’m missing something, the split file controls the length of the dataset, but not which files get loaded.

Wouldn’t that mean train/val/test can end up reading overlapping files depending on the directory ordering?
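
To make the concern above concrete, here is a toy sketch with made-up file names. It does not reproduce the PR's code: `load_by_position` imitates the pattern described (the split only bounds the index, while the reader indexes its own sorted listing), and `load_by_name` shows one possible fix, which is not necessarily the change that was eventually made.

```python
# Hypothetical directory listing and split contents, for illustration only.
ALL_STORES = sorted(["sim_a.pmsh", "sim_b.pmsh", "sim_c.pmsh", "sim_d.pmsh"])
SPLITS = {"train": ["sim_a", "sim_c"], "val": ["sim_b", "sim_d"]}


def load_by_position(split: str, idx: int) -> str:
    """Split controls only the length; the file comes from the full listing."""
    if idx >= len(SPLITS[split]):
        raise IndexError(idx)
    return ALL_STORES[idx]


def load_by_name(split: str, idx: int) -> str:
    """Split entry is resolved to its own store name."""
    return f"{SPLITS[split][idx]}.pmsh"


def files_seen(loader, split: str) -> set:
    """Collect every store a split would read through the given loader."""
    return {loader(split, i) for i in range(len(SPLITS[split]))}


train_buggy = files_seen(load_by_position, "train")
val_buggy = files_seen(load_by_position, "val")
train_fixed = files_seen(load_by_name, "train")
val_fixed = files_seen(load_by_name, "val")
```

In this toy setup both positional splits read the first two stores of the sorted listing, so they overlap completely, while the name-based splits stay disjoint.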

Collaborator

@loliverhennigh loliverhennigh left a comment


LGTM

@melo-gonzo
Collaborator Author

Wait, I think the split file might not actually be used here.

Great catch! Thanks for the attention to detail, fixed in 8c9c3e4.

@melo-gonzo
Collaborator Author

/blossom-ci

@melo-gonzo
Collaborator Author

/blossom-ci

1 similar comment
@coreyjadams
Collaborator

/blossom-ci

@melo-gonzo
Collaborator Author

/ok to test cbb1ddc

@melo-gonzo
Collaborator Author

/ok to test cbb1ddc

@copy-pr-bot

copy-pr-bot Bot commented May 26, 2026

/ok to test cbb1ddc

@melo-gonzo, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

@melo-gonzo
Collaborator Author

/ok to test 37e1413

@melo-gonzo
Collaborator Author

/blossom-ci

2 similar comments
@melo-gonzo
Collaborator Author

/blossom-ci

@melo-gonzo
Collaborator Author

/blossom-ci

