Dynamic side metadata base address #1449
Use a runtime-mapped base for side metadata and make offsets relative. Replace OnceLock with a fast static base and shared init, update address math, and adjust tests/spec layout and docs accordingly. Also add mmap-noreserve-anywhere support and guard MSRV pin in CI scripts.
Avoid 32-bit overflow in mmapper range limit, adjust side metadata sanity expectations, and use small 32-bit test addresses for contiguous conversion tests. Also constrain mmap annotation handling on 32-bit.
Verify the runtime side metadata base is initialized, aligned, and that global metadata addresses fall within the reserved range. Check 64-bit local base offset consistency.
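The base-plus-relative-offset scheme described above can be sketched as follows. This is a minimal illustration, not MMTk's actual implementation: the `SideMetadataContext` struct, the alignment check, and the address math are hypothetical stand-ins, and `usize` stands in for mmtk's `Address` type.

```rust
/// Metadata offsets now start from zero; the runtime base is added
/// when computing actual metadata addresses.
pub const GLOBAL_SIDE_METADATA_BASE_OFFSET: usize = 0;

/// Hypothetical holder for the runtime-mapped base, chosen at startup
/// (e.g. by mmap) instead of being a compile-time constant.
pub struct SideMetadataContext {
    base: usize,
}

impl SideMetadataContext {
    pub fn new(base: usize) -> Self {
        // Assume the base must be page-aligned (4 KiB here).
        assert_eq!(base % 4096, 0, "side metadata base must be page-aligned");
        Self { base }
    }

    /// Translate a data address to a metadata address: add the runtime
    /// base to the spec's relative offset, then index by the data
    /// address. `log_ratio` is log2 of data-bytes-per-metadata-byte
    /// (illustrative math only).
    pub fn address_to_meta_address(
        &self,
        spec_offset: usize,
        data_addr: usize,
        log_ratio: u32,
    ) -> usize {
        self.base + spec_offset + (data_addr >> log_ratio)
    }
}
```

With this layout, moving the base only requires re-initializing the context; all per-spec offsets stay valid because they are relative.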
I used
I will run DaCapo benchmarks.
into consideration when quarantining side metadata
The following is the performance for Immix using 3x G1 min heap. Immix only uses side metadata during GC, and the mean STW time slowdown is 3.2%. The worst case is around 11% STW time slowdown. (I probably should measure a generational plan which uses log bits side metadata during mutator time.)
Move verify_side_metadata_sanity() from Plan::new() to MMTK::new(), after mmapping side metadata.
This is the result for GenImmix (also using 3x G1 min heap). Generally it showed no slowdown for mutator time. The reason is that there is little change to the JIT'd code: the only difference is that at JIT time we need to load the side metadata address from a variable instead of a constant, but this only affects JIT time, not run time. See https://github.com/mmtk/mmtk-openjdk/pull/343/changes#diff-27141e7f6636a2ef36ac24f81d7025e231f1b091660a28026edba98de5c1156a. It also showed no slowdown for STW time. I think the reason is that most GCs are nursery GCs (
I will further look into the microbenchmarks and the Immix performance.
/// Performs the translation of data address (`data_addr`) to metadata address for the specified metadata (`metadata_spec`).
#[inline(always)]
This #[inline(always)] directive is necessary. With this inlined, the microbenchmark shows no slowdown on loading side metadata.
```
side_metadata_load  time:   [2.6933 µs 2.6940 µs 2.6948 µs]
                    change: [−68.453% −68.423% −68.397%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low severe
  1 (1.00%) high mild
```
I am running Immix again to see if the inline fixes the slowdown for Immix.
Note that I use PGO when building OpenJDK. If the manual inline directive fixes the slowdown for Immix, that means somehow PGO does not inline the key function for side metadata address calculation.
> ... With this inlined, the microbenchmark shows no slowdown...
That's right. The compiler can usually figure out what to inline in real-world VM bindings, but for microbenchmarks (`cargo bench`), manual inlining matters. I have observed this before, and added some annotations in `src/util/test_private/mod.rs`. You could try adding wrappers in the `test_private` mod that have `#[inline(always)]` and keep `side_metadata/helpers.rs` free of inlining annotations. If manual inlining is still necessary for the OpenJDK binding, we can keep `#[inline(always)]` in `helpers.rs`.
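The wrapper idea can be sketched like this. Function names and the address math are illustrative (not the actual helpers in mmtk-core); the point is only that the forced-inline hint lives on a thin test-only shim rather than on the library function itself:

```rust
// Library code (e.g. what would live in side_metadata/helpers.rs):
// no inlining annotation, so real-world callers let the compiler
// (or PGO) decide whether to inline it.
pub fn address_to_meta_address(base: usize, data_addr: usize) -> usize {
    // Illustrative translation only: one metadata byte per 64 data bytes.
    base + (data_addr >> 6)
}

// Test-only wrapper (e.g. what would live in a test_private mod):
// a thin shim with a forced inline hint so that microbenchmarks
// (cargo bench) measure the inlined fast path rather than the
// overhead of an outlined call.
#[inline(always)]
pub fn test_private_address_to_meta_address(base: usize, data_addr: usize) -> usize {
    address_to_meta_address(base, data_addr)
}
```

Benchmarks then call the wrapper, while production bindings keep calling the unannotated helper directly.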
The manual inline does not fix the performance for Immix on DaCapo: https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-02-24-Tue-031617&benchmark^build^invocation^iteration&time^time.other^time.stw&|10&iteration^1^4|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata|41&Histogram%20(with%20CI)^build^benchmark&
// (starting from zero) and add the runtime base address when computing actual addresses.
pub(crate) const GLOBAL_SIDE_METADATA_BASE_OFFSET: usize = 0;

static mut SIDE_METADATA_BASE_ADDRESS: Address = Address::ZERO;
I replaced this `static mut` with a `OnceLock`, and got the following results on biojava, which showed no slowdown for this PR.
https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-02-26-Thu-071955&benchmark^build^invocation^iteration^stickyix&time^time.other^time.stw&|10&iteration^1^4&stickyix^1^null|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata
If we use `static mut` for the base address, Rust needs to reload the variable every time we access side metadata. I inspected the assembly code of ImmixSpace::trace_object: in around 100 instructions, this PR adds 3 extra loads to get the side metadata address.
Intuitively, `OnceLock` should be slower, as `get()` checks whether the value is initialized before every load. However, if we use `oncelock.get().unwrap_unchecked()`, the compiler can assume there is no need to check or branch, and since the value will not change after initialization, there is no need to reload it repeatedly.
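The `OnceLock` pattern being discussed can be sketched as follows. This is a minimal stand-in, not the PR's actual code: `usize` replaces mmtk's `Address` type, and the function names are hypothetical. The hot-path accessor relies on `unwrap_unchecked`, which removes the "is it initialized?" check and seems to let the compiler hoist the load out of hot loops, unlike a `static mut` read.

```rust
use std::sync::OnceLock;

// Process-wide side metadata base, set exactly once at startup.
static SIDE_METADATA_BASE: OnceLock<usize> = OnceLock::new();

/// Called once during initialization, after mmapping the metadata region.
pub fn initialize_side_metadata_base(base: usize) {
    SIDE_METADATA_BASE
        .set(base)
        .expect("side metadata base already initialized");
}

/// Hot-path accessor.
/// SAFETY contract: must only be called after initialization;
/// `unwrap_unchecked` skips the initialization check and branch.
#[inline(always)]
pub fn side_metadata_base() -> usize {
    unsafe { *SIDE_METADATA_BASE.get().unwrap_unchecked() }
}
```

By contrast, a `static mut` read cannot legally be cached across other memory operations, which matches the extra loads observed in the trace_object assembly.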
Using OnceLock seems to fix the slowdown on biojava, and I am running the experiments again for all the benchmarks.
These are the results for all the benchmarks. With OnceLock, the slowdown for STW time is 3.5% (compared to 4.3% with static mut).
https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-02-26-Thu-225410&benchmark^build^invocation^iteration^stickyix&time^time.other^time.stw&|10&iteration^1^4|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata|41&Histogram%20(with%20CI)^build^benchmark&
Some benchmarks showed large error bars. The following is a rerun (which hasn't finished yet):
https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-03-01-Sun-070849&benchmark^build^invocation^iteration^stickyix&time^time.other^time.stw&|10&iteration^1^4|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata|41&Histogram%20(with%20CI)^build^benchmark&
Both seem to suggest that with OnceLock we see a slowdown for batik, but for the other benchmarks OnceLock is a performance improvement.
I will still need to investigate the slowdown.
I can't reproduce the same slowdown for luindex after turning on work_packet_stats and perf_counter: http://squirrel.anu.edu.au/plotty/yilin/mmtk/p/zPrqcq
I don't plan to spend more time on it.
work_packet_stats slows down GC by about 40%, as I recall. So it's not surprising that the slowdown gets masked.
work_packet_stats alone should only slow down GC by a few percent (~5% from memory). The 40% or so happens only when you read a perf counter per packet, which I was not doing. I just enabled perf_counter, but didn't set any perf event to read.
The previous run that showed a slowdown for luindex measured luindex's STW time as:
| | constant side metadata | dynamic side metadata (static mut) | dynamic side metadata (OnceLock) |
|---|---|---|---|
| STW time (ms) | 62.259 | 63.421 | 66.629 |
My new run (and new builds) with work_packet_stats and perf_counter measured STW time as:
| | constant side metadata | dynamic side metadata (static mut) | dynamic side metadata (OnceLock) |
|---|---|---|---|
| STW time (ms) | 67.688 | 67.154 | 68.134 |
So everything got slower (which is expected), but the build for dynamic side metadata with OnceLock is less affected. This seems to suggest the issue with OnceLock is concurrency: work_packet_stats requires every packet to acquire locks to update the stats counters.
// Initialize side metadata sanity first
plan.verify_side_metadata_sanity();
// Then initialize SFT because it may use side metadata
plan.initialize_sft();
These two lines use side metadata, so they have to happen after side metadata is initialized. They used to happen in Plan::new(). I extracted them because, at some point, I called initialize_side_metadata() after Plan::new(), so I had to move these two lines after both Plan::new() and initialize_side_metadata().
Now initialize_side_metadata() is called before Plan::new(), so it is no longer necessary to have these two lines here, but I think keeping them here is still clearer.
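The initialization ordering described above can be sketched as follows. All names here are hypothetical stand-ins modeled on the discussion (a toy `Plan` with flags replaces the real one), showing only the constraint that both calls must run after the side metadata region is mapped:

```rust
// Toy stand-in for the real Plan; the flags just record what ran.
struct Plan {
    sanity_verified: bool,
    sft_initialized: bool,
}

impl Plan {
    fn new() -> Self {
        Plan { sanity_verified: false, sft_initialized: false }
    }
    fn verify_side_metadata_sanity(&mut self) {
        self.sanity_verified = true;
    }
    fn initialize_sft(&mut self) {
        self.sft_initialized = true;
    }
}

fn initialize_side_metadata() {
    // Stand-in for reserving/mmapping the side metadata range.
}

// Sketch of the ordering inside something like MMTK::new().
fn mmtk_new() -> Plan {
    initialize_side_metadata(); // must come first: maps the region
    let mut plan = Plan::new();
    // Both of these may touch side metadata, so they must run after
    // initialize_side_metadata(); keeping them here makes that explicit
    // even though initialize_side_metadata() now precedes Plan::new().
    plan.verify_side_metadata_sanity();
    plan.initialize_sft();
    plan
}
```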
wks left a comment:
I think when quarantining memory, the mmap strategy is always the same. As I suggested in the comments, we may remove some strategy arguments and use a fixed strategy for quarantining.
A high-level comment:
This is a good point. And according to rust-lang/libs-team#654,
wks left a comment:
I have walked through the PR, and there are only a few minor problems left. See the comments.
This addresses part of the issues with #1351. This PR allows side metadata to be mmapped dynamically, and also allows side metadata to use a fixed address specified in the option `side_metadata_base_address`.