MIBM Performance Optimizations #1157
danieljvickers wants to merge 59 commits into MFlowCode:master
Conversation
Claude Code Review

Head SHA: …

Summary

The PR makes significant and valuable performance improvements by moving IBM computations to the GPU with proper Fypp macro usage. The new projection-based distance computation is mathematically cleaner than the interpolation approach. GPU macro usage is correct throughout — all GPU parallelism uses …

However, there are several issues that should be addressed before merging.

Critical Issues

1. Loop allocation crash: …

   Fix: Allocate …

2. Memory inflation: …

   Fix: Introduce a separate …

3. Division by zero in …:

   ```fortran
   dist = sqrt((point(1) - v1(1))**2 + (point(2) - v1(2))**2)
   norm = norm/dist ! Division by zero if dist == 0
   ```

   The 3D counterpart (…) has the same issue.

   Fix: Add …

4. Missing deallocations: … None are deallocated anywhere, violating the …

   Fix: Add a …

Important Issues

5. …

   ```fortran
   error stop "Ghost Point and Image Point on Different Processors"
   ```

   Per project rules, use …

6. Stale error message in …:

   ```python
   self.prohibit(ib and (num_ibs <= 0 or num_ibs > 1000),
                 "num_ibs must be between 1 and num_patches_max (10)")
   ```

   The limit changed to 1000 but the message still says "(10)".

7. Unbounded … The bounds check that prevents …

   Fix: Add a bounded iteration limit and exit if exceeded.

Minor Notes

…

Verdict

The GPU offloading approach is correct and the Fypp macro usage follows project conventions. The four critical issues (loop allocation crash, memory inflation, division by zero, missing deallocations) should be fixed before merging. The important issues are lower priority but worth addressing.
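The division-by-zero item above can be illustrated with a language-neutral sketch (Python here; the function and parameter names are hypothetical and not from the MFC codebase) of a projection-based point-to-segment distance that guards the degenerate zero-length edge:

```python
import math

def segment_distance_2d(point, v1, v2, eps=1e-12):
    """Distance from `point` to the segment v1-v2, guarding the
    degenerate case where v1 == v2 (a zero-length edge)."""
    ex, ey = v2[0] - v1[0], v2[1] - v1[1]
    length2 = ex * ex + ey * ey
    if length2 < eps:
        # Degenerate edge: fall back to plain point-to-vertex distance
        # instead of dividing by a zero segment length.
        return math.hypot(point[0] - v1[0], point[1] - v1[1])
    # Clamp the projection parameter to [0, 1] so the closest point
    # stays on the segment rather than its infinite extension.
    t = ((point[0] - v1[0]) * ex + (point[1] - v1[1]) * ey) / length2
    t = max(0.0, min(1.0, t))
    cx, cy = v1[0] + t * ex, v1[1] + t * ey
    return math.hypot(point[0] - cx, point[1] - cy)
```

The same guard pattern applies to the 3D variant: check the squared edge length against a small epsilon before normalizing.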
User description
Description
Following the refactor of the levelset, several performance optimizations remained to be made to the code. This PR introduces optimizations that make multi-particle MIBM code viable, and raises the upper bound on the number of allowed immersed boundaries to 1000. Performance was measured on 1-4 ranks of ACC GPU compute using A100 GPUs.
This PR also extends the optimizations to STL IBs, which should significantly improve accuracy, performance, and code cleanliness. The primary optimizations are as follows:
Type of change
Testing
All changes pass the IBM section of the test suite on GPUs with the NVHPC compiler. Performance was measured with a case of 1000 particles with viscosity enabled. The particles are all resolved 3D spheres given random non-overlapping positions generated by the following case file:
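The case file itself is not reproduced above; the random non-overlapping placement step it describes can be sketched as simple rejection sampling (a minimal illustration only; `place_spheres` and its parameters are hypothetical names, not part of the actual case file):

```python
import random

def place_spheres(n, radius, box, max_tries=100000, seed=0):
    """Rejection-sample n sphere centers inside `box` (a list of per-axis
    (lo, hi) bounds) such that no two spheres overlap, i.e. all pairwise
    center distances are at least 2*radius."""
    rng = random.Random(seed)
    centers = []
    tries = 0
    while len(centers) < n:
        tries += 1
        if tries > max_tries:
            # Bounded retry loop: give up rather than spin forever
            # when the box is too crowded for the requested count.
            raise RuntimeError("could not place all spheres")
        # Keep each candidate at least one radius away from the walls.
        c = tuple(rng.uniform(lo + radius, hi - radius) for lo, hi in box)
        if all(sum((a - b) ** 2 for a, b in zip(c, p)) >= (2 * radius) ** 2
               for p in centers):
            centers.append(c)
    return centers
```

For dilute cases like 1000 small spheres in a large domain, rejection sampling converges quickly; denser packings would need a smarter placement scheme.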
These optimizations yield nearly a 1000x speedup in the moving IBM propagation and generation code. Prior to these optimizations, the NVIDIA Nsight profiler showed the benchmark case taking 45 seconds to run a single RK substep:
Following these optimizations, the same profile shows roughly 50 ms per RK substep:

For STLs, the optimizations were tested on an 822,000-vertex mesh of a corgi at Mach 0.4, given by this STL:
https://www.thingiverse.com/thing:4721563/files
The final simulation finished in a total of 25 minutes on a 200^3 grid for 4k time steps on a single A100 GPU. All of the code related to the STL model (file reading, preprocessing, IB marker generation, and levelset compute) took only 20 seconds of the run time. The result of that simulation can be viewed here:
https://www.youtube.com/watch?v=h44BNCKo0Hs
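The IB marker generation step mentioned above amounts to classifying grid points as inside or outside the STL surface. A minimal CPU-side sketch of such an inside/outside test via ray casting (using Möller–Trumbore ray-triangle intersection; this is an illustration with hypothetical names, not MFC's actual implementation) looks like:

```python
def ray_hits_triangle(orig, direc, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore test: does a ray from `orig` along `direc`
    hit triangle (v0, v1, v2) in the forward direction?"""
    def cross(a, b):
        return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])
    def dot(a, b):
        return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
    e1 = tuple(v1[i] - v0[i] for i in range(3))
    e2 = tuple(v2[i] - v0[i] for i in range(3))
    p = cross(direc, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return False                      # ray parallel to triangle plane
    t_vec = tuple(orig[i] - v0[i] for i in range(3))
    u = dot(t_vec, p) / det
    if u < 0.0 or u > 1.0:
        return False                      # outside first barycentric bound
    q = cross(t_vec, e1)
    v = dot(direc, q) / det
    if v < 0.0 or u + v > 1.0:
        return False                      # outside second barycentric bound
    return dot(e2, q) / det > eps         # hit must lie in front of origin

def point_inside_mesh(point, triangles, direc=(0.123, 0.456, 0.789)):
    """Parity test: an odd number of ray crossings means `point`
    lies inside the closed triangulated surface."""
    hits = sum(ray_hits_triangle(point, direc, *tri) for tri in triangles)
    return hits % 2 == 1
```

In practice this per-point test is embarrassingly parallel, which is what makes offloading marker generation to the GPU effective; an arbitrary ray direction is chosen to avoid grazing mesh edges exactly.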
Checklist
See the developer guide for full coding standards.
GPU changes (expand if you modified src/simulation/)

Summary by CodeRabbit
New Features
Enhancements
Behavior Changes
Tests/Validation
CodeAnt-AI Description
GPU-accelerate STL immersed-boundary compute and support up to 1000 IBs
What Changed
Impact
✅ Faster IB marker generation
✅ Lower CPU usage during IB setup and levelset evaluation
✅ Support for up to 1000 immersed boundaries