Add support for nonuniform data structures #2464
Merged
dennisYatunin merged 1 commit into main on Apr 3, 2026
Conversation
nefrathenrici (Member) approved these changes on Mar 20, 2026 and left a comment:
Maybe I missed them, but it would be good to exercise bitcast_struct for non-uniform structs before merging.
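One possible shape for the kind of test the reviewer is asking for: before exercising `bitcast_struct` (a ClimaCore internal, not shown here), a test could pin down the layout facts a non-uniform struct forces the implementation to handle, namely mixed base-type sizes and padding. All names below are illustrative, not ClimaCore internals.

```julia
# Hypothetical layout checks for a non-uniform struct (illustrative only).
using Test

struct NonUniform
    i::Int64   # 8-byte base type
    f::Float32 # 4-byte base type
end

@test isbitstype(NonUniform)           # required for flat storage in a Field
@test fieldoffset(NonUniform, 1) == 0
@test fieldoffset(NonUniform, 2) == 8
@test sizeof(NonUniform) == 16          # 12 bytes of data, padded to 8-byte alignment
```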
Purpose
This PR refactors `src/DataLayouts/struct.jl` to support nonuniform data structures. This will allow us to store combinations of 64-bit integers and 32-bit floats within a single `Field`, so that we can proceed with CliMA/ClimaAtmos.jl#4336 (which addresses CliMA/ClimaParams.jl#280). More generally, this will simplify the process of generating `Field`s of boolean masks, integer indices, or any other data structure needed in ClimaAtmos, ClimaLand, etc.

Content
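The central new primitive, `bitcast_struct`, can be sketched roughly as follows. This is a minimal illustration of "a recursively unrolled extension of `Core.bitcast` for composite types", not ClimaCore's actual implementation; `bitcast_sketch` and the structs are illustrative names.

```julia
# Sketch: extend Core.bitcast (primitives only) to composite isbits types
# by bitcasting primitive leaves and rebuilding composite fields recursively.
struct MixedSigned;   i::Int64;  f::Float32; end
struct MixedUnsigned; u::UInt64; f::Float32; end  # same layout, different leaf type

function bitcast_sketch(::Type{T}, x) where {T}
    isprimitivetype(T) && return Core.bitcast(T, x)  # primitive leaf: direct bitcast
    fields = ntuple(i -> bitcast_sketch(fieldtype(T, i), getfield(x, i)), fieldcount(T))
    return T(fields...)                              # rebuild the composite value
end

bitcast_sketch(MixedUnsigned, MixedSigned(-1, 2.5f0))
# MixedUnsigned(0xffffffffffffffff, 2.5f0)
```

Unlike `reinterpret` on arrays, nothing here touches heap memory: the value is taken apart and reassembled field by field, which is what makes a register-level bitcast of composite types possible.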
- Add `default_basetype` and `checked_valid_basetype`
- Add `bitcast_struct` as a recursively unrolled extension of `Core.bitcast` for composite types, replacing `reinterpret`, since the built-in form only works with heap memory
- Merge the `IndexLinear` and `IndexCartesian` forms of `get_struct` and `set_struct!` into a unified implementation
- Simplify `get_struct` and `set_struct!` by separating unrolled type manipulations from memory operations in `bitcast_struct`
- Simplify `getindex` and `setindex!` for `DataLayouts`, dropping the unused `F = 1` from each `CartesianIndex`
- Simplify `getproperty` for `DataLayouts`, reducing the amount of duplicate code and generated functions
- Simplify `parent_array_type` and `promote_parent_array_type` by avoiding unused `TypeVar`s
- Rename `typesize` to `num_basetypes`, making it consistent with the other internals in `struct.jl`
- Rename `replace_basetype` to `replace_type_parameter`, and move it to `Utilities`
- Add `@inbounds` annotations in `getindex`/`setindex!`
- Fix an `@inbounds` annotation in the 2D weak curl operator

Performance Notes
After exploring many different ways of implementing `get_struct`/`set_struct!`/`bitcast_struct`, I have kept the version that came closest to matching our current performance:

- Prognostic EDMF TRMM in a column (implicit) has worsened from 2.064 to 1.47
- held suarez has improved from 0.383 to 0.403
- GPU: Schar Mountain Test (2D, Float32) has improved from 0.178 to 0.195
- GPU: Prognostic EDMF aquaplane has improved from 0.262 to 0.274
- Unit: operator matrices (GPU) unit tests have improved by around 10%
- Unit: operator matrices (CPU) unit tests have improved by 10--20%, while the walltimes of a few others have worsened by nearly a factor of 2
- Schar Mountain total energy (topography mesh interface) has worsened from 221 to 369
- 3D sphere dry Held-Suarez (ρe) has improved from 625 to 586

The implementation I've chosen is limited to a single generated function (excluding compile-time string interpolations), `bitcast_struct`, which is defined in a way that forces the LLVM compiler to inline array reads without too much of a latency penalty. This seems to be a key feature of our original implementation, which was full of recursive generated functions that attempt to force inlining. For the specific case of `get_struct` and `set_struct!`, inlining of array read instructions is needed to avoid unreasonably large stack memory allocations, since we rely on the LLVM compiler to eliminate read instructions and corresponding allocations for unused values (e.g., most of the `LocalGeometry` struct fields).

There is probably a better way to force inlining than the generated function I used here, but I think we'll need to implement more rigorous unit tests before we can figure out what that is. We could draw inspiration from Julia's own unit tests for inlining, or from the analogous tests in CUDA.jl. I compared around a dozen different alternatives for this PR, and I chose the one that happened to produce the best overall results without an excessive latency penalty. But if we want to guarantee optimal performance across all examples, we'll need to get a better understanding of the compiler's inlining heuristics.
Checklist