Add fixes for Delta PrgEnv-nvidia#1258
Conversation
|
CodeAnt AI is reviewing your PR. Thanks for using CodeAnt! 🎉We're free for open-source projects. if you're enjoying it, help us grow by sharing. Share on X · |
📝 WalkthroughWalkthroughModifies NVHPC compiler configuration to disable LTO/IPO for specific versions, updates Python and GPU toolchain module specifications, and adjusts Delta environment setup to conditionally disable GPU-Direct MPI when GPU acceleration is enabled. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Nitpicks 🔍
|
|
CodeAnt AI finished reviewing your PR. |
There was a problem hiding this comment.
Pull request overview
This PR updates MFC's configuration for NCSA Delta's newer PrgEnv-nvidia environment, migrating from manual module loading to a Cray-style programming environment and adding workarounds for NVHPC compiler issues.
Changes:
- Migrate Delta GPU configuration from explicit module loading (nvhpc/24.1, cuda/12.3.0, openmpi/4.1.5+cuda) to PrgEnv-nvidia/8.6.0 meta-module
- Disable GPU-Direct MPI support on Delta to work around stability issues
- Add NVHPC version check to disable broken LTO/IPO in versions 24.11-25.9
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| toolchain/templates/delta.mako | Replace manual LD_LIBRARY_PATH export with MPICH_GPU_SUPPORT_ENABLED=0 for GPU builds |
| toolchain/modules | Switch Delta GPU configuration to PrgEnv-nvidia/8.6.0 and simplify python module loading |
| CMakeLists.txt | Add version check to disable LTO/IPO for NVHPC versions 24.11 through 25.9 |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
toolchain/templates/delta.mako (1)
39-41: Scope GPU-Direct disablement to MPI runs and confirm MPI flavor.Line 39-41 sets an MPICH-specific knob for all GPU jobs. Consider gating this behind
mpi(and/or detected MPI flavor) to avoid no-op or unintended effects in non-MPICH/non-MPI runs, and please confirm this is the correct control for Delta’s PrgEnv-nvidia MPI stack.🔧 Example gating
-% if gpu_enabled: - export MPICH_GPU_SUPPORT_ENABLED=0 # Disable GPU-Direct MPI -% endif +% if gpu_enabled and mpi: + export MPICH_GPU_SUPPORT_ENABLED=0 # Disable GPU-Direct MPI +% endif🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@toolchain/templates/delta.mako` around lines 39 - 41, The template currently unconditionally exports MPICH_GPU_SUPPORT_ENABLED when gpu_enabled is true; change this so the MPICH-specific export is only emitted for MPI runs (e.g., when an mpi flag/variable is set or when detected MPI flavor equals "mpich"), and verify that MPICH_GPU_SUPPORT_ENABLED is the correct knob for Delta’s PrgEnv-nvidia MPI stack; update the delta.mako template to check both gpu_enabled and an mpi/detected_mpi_flavor variable before exporting MPICH_GPU_SUPPORT_ENABLED and add a comment noting the MPI flavor check and that this is specific to MPICH to avoid affecting non-MPICH or non-MPI jobs.toolchain/modules (1)
60-60: Ensure the default Python module meets the 3.10+ requirement.Line 60 now loads an unpinned
pythonmodule. If the cluster default is <3.10, toolchain scripts may break; consider pinning to a known-good module or adding a version check in the load path.Based on learnings: toolchain/**/*.py must be compatible with Python 3.10+; do not use from future imports or other backwards-compatibility shims.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@toolchain/modules` at line 60, The unpinned module load ("d-all python") can pick a system Python older than 3.10; change this to load a pinned interpreter or add a runtime version guard: either replace the unpinned load with a specific module (e.g., "d-all python/3.10" or higher) or keep the unpinned load but add a check in the toolchain bootstrap that inspects sys.version_info and fails with a clear error if Python < (3,10); update any bootstrap/init logic that calls "d-all python" and document the required module version for toolchain/**/*.py compatibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CMakeLists.txt`:
- Around line 241-244: The version check for NVHPC IPO currently uses exclusive
bounds but the message claims an inclusive range; update the elseif condition to
use inclusive comparisons so the logic matches the text: change the condition on
CMAKE_Fortran_COMPILER_VERSION to VERSION_GREATER_EQUAL "24.11" AND
VERSION_LESS_EQUAL "25.9" (leaving the message and set(NVHPC_USE_TWO_PASS_IPO
FALSE) as-is), or alternatively adjust the message text to say "greater than
24.11 and less than 25.9" if you prefer to keep exclusive bounds.
---
Nitpick comments:
In `@toolchain/modules`:
- Line 60: The unpinned module load ("d-all python") can pick a system Python
older than 3.10; change this to load a pinned interpreter or add a runtime
version guard: either replace the unpinned load with a specific module (e.g.,
"d-all python/3.10" or higher) or keep the unpinned load but add a check in the
toolchain bootstrap that inspects sys.version_info and fails with a clear error
if Python < (3,10); update any bootstrap/init logic that calls "d-all python"
and document the required module version for toolchain/**/*.py compatibility.
In `@toolchain/templates/delta.mako`:
- Around line 39-41: The template currently unconditionally exports
MPICH_GPU_SUPPORT_ENABLED when gpu_enabled is true; change this so the
MPICH-specific export is only emitted for MPI runs (e.g., when an mpi
flag/variable is set or when detected MPI flavor equals "mpich"), and verify
that MPICH_GPU_SUPPORT_ENABLED is the correct knob for Delta’s PrgEnv-nvidia MPI
stack; update the delta.mako template to check both gpu_enabled and an
mpi/detected_mpi_flavor variable before exporting MPICH_GPU_SUPPORT_ENABLED and
add a comment noting the MPI flavor check and that this is specific to MPICH to
avoid affecting non-MPICH or non-MPI jobs.
Claude Code ReviewFiles changed: 3
Summary
Findings1. CMakeLists.txt:241 — Version boundary is off by one (lower bound)elseif (CMAKE_Fortran_COMPILER_VERSION VERSION_GREATER "24.11" AND CMAKE_Fortran_COMPILER_VERSION VERSION_LESS "25.9")
message(STATUS "LTO/IPO is not supported in NVHPC Version 24.11 to 25.9. ...")
Similarly, Recommendation: Decide definitively whether 24.11 is broken. If broken: change 2. toolchain/modules —
|
Claude Code Review3 files changed: Summary
FindingsNo blocking issues found. Checked for bugs and CLAUDE.md compliance. Three non-blocking improvement opportunities: 1. Slightly imprecise version-range message — The new Relevant lines: Lines 239 to 241 in afe6058 2. Asymmetric The new block sets % if gpu_enabled:
export MPICH_GPU_SUPPORT_ENABLED=0 # Disable GPU-Direct MPI
% else:
export MPICH_GPU_SUPPORT_ENABLED=0
% endif(Or simply unconditionally Relevant lines: MFC/toolchain/templates/delta.mako Lines 38 to 40 in afe6058 3. Possibly redundant compiler env vars — The Relevant lines: Lines 61 to 62 in afe6058 🤖 Reviewed by Claude Code (claude-sonnet-4-6) |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1258 +/- ##
=======================================
Coverage 44.05% 44.05%
=======================================
Files 70 70
Lines 20496 20496
Branches 1989 1989
=======================================
Hits 9030 9030
Misses 10328 10328
Partials 1138 1138 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Claude Code ReviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
User description
Description
Fix support for Delta's "new" PrgEnv-nvidia environment.
Type of change
Testing
Ran
examples/3D_performance_testwith GPUs using the updated CMake and template.CodeAnt-AI Description
Fix Delta PrgEnv-nvidia support and NVHPC LTO/IPO handling
What Changed
Impact
✅ Fewer runtime MPI failures on Delta GPU nodes✅ Fewer build failures with NVHPC 24.x compilers✅ Correct module selection for Delta GPU workflows💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.
Summary by CodeRabbit