Dynamic side metadata base address #1449
Use a runtime-mapped base for side metadata and make offsets relative. Replace OnceLock with a fast static base and shared init, update address math, and adjust tests/spec layout and docs accordingly. Also add mmap-noreserve-anywhere support and guard MSRV pin in CI scripts.
Avoid 32-bit overflow in mmapper range limit, adjust side metadata sanity expectations, and use small 32-bit test addresses for contiguous conversion tests. Also constrain mmap annotation handling on 32-bit.
Verify the runtime side metadata base is initialized, aligned, and that global metadata addresses fall within the reserved range. Check 64-bit local base offset consistency.
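The base-plus-relative-offset scheme described above can be sketched as follows. This is a minimal illustration, not MMTk's actual implementation: the `SideMetadataContext` struct, the alignment check, and the address math are hypothetical stand-ins, and `usize` stands in for mmtk's `Address` type.

```rust
/// Metadata offsets now start from zero; the runtime base is added
/// when computing actual metadata addresses.
pub const GLOBAL_SIDE_METADATA_BASE_OFFSET: usize = 0;

/// Hypothetical holder for the runtime-mapped base, chosen at startup
/// (e.g. by mmap) instead of being a compile-time constant.
pub struct SideMetadataContext {
    base: usize,
}

impl SideMetadataContext {
    pub fn new(base: usize) -> Self {
        // Assume the base must be page-aligned (4 KiB here).
        assert_eq!(base % 4096, 0, "side metadata base must be page-aligned");
        Self { base }
    }

    /// Translate a data address to a metadata address: add the runtime
    /// base to the spec's relative offset, then index by the data
    /// address. `log_ratio` is log2 of data-bytes-per-metadata-byte
    /// (illustrative math only).
    pub fn address_to_meta_address(
        &self,
        spec_offset: usize,
        data_addr: usize,
        log_ratio: u32,
    ) -> usize {
        self.base + spec_offset + (data_addr >> log_ratio)
    }
}
```

With this layout, moving the base only requires re-initializing the context; all per-spec offsets stay valid because they are relative.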
I used
I will run DaCapo benchmarks.
into consideration when quarantining side metadata
The following is the performance for Immix using 3x G1 min heap. Immix only uses side metadata during GC, and the mean STW time slowdown is 3.2%. The worst case is around 11% STW time slowdown. (I probably should measure a generational plan which uses log bits side metadata during mutator time.)
Move verify_side_metadata_sanity() from Plan::new() to MMTK::new(), after mmapping side metadata.
This is the result for GenImmix (also using 3x G1 min heap). Generally it showed no slowdown for mutator time. The reason is that there is little change to the JIT'd code: the only difference is that at JIT time we need to load the side metadata address from a variable instead of a constant, but this only affects JIT time, not run time. See https://github.com/mmtk/mmtk-openjdk/pull/343/changes#diff-27141e7f6636a2ef36ac24f81d7025e231f1b091660a28026edba98de5c1156a. It also showed no slowdown for STW time. I think the reason is that most GCs are nursery GCs (
I will further look into the microbenchmarks and the Immix performance.
/// Performs the translation of data address (`data_addr`) to metadata address for the specified metadata (`metadata_spec`).
#[inline(always)]
This #[inline(always)] directive is necessary. With this inlined, the microbenchmark shows no slowdown on loading side metadata.
```
side_metadata_load  time:   [2.6933 µs 2.6940 µs 2.6948 µs]
                    change: [−68.453% −68.423% −68.397%] (p = 0.00 < 0.05)
                    Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low severe
  1 (1.00%) high mild
```
I am running Immix again to see if the inline fixes the slowdown for Immix.
Note that I use PGO when building OpenJDK. If the manual inline directive fixes the slowdown for Immix, that means somehow PGO does not inline the key function for side metadata address calculation.
> ... With this inlined, the microbenchmark shows no slowdown...
That's right. The compiler can usually figure out what to inline in real-world VM bindings, but for microbenchmarks (`cargo bench`), manual inlining matters. I have observed this before, and added some annotations in `src/util/test_private/mod.rs`. You could try adding wrappers in the `test_private` mod that have `#[inline(always)]` and keep `side_metadata/helpers.rs` free of inlining annotations. If manual inlining is still necessary for the OpenJDK binding, we can keep `#[inline(always)]` in `helpers.rs`.
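The wrapper idea can be sketched like this. Function names and the address math are illustrative (not the actual helpers in mmtk-core); the point is only that the forced-inline hint lives on a thin test-only shim rather than on the library function itself:

```rust
// Library code (e.g. what would live in side_metadata/helpers.rs):
// no inlining annotation, so real-world callers let the compiler
// (or PGO) decide whether to inline it.
pub fn address_to_meta_address(base: usize, data_addr: usize) -> usize {
    // Illustrative translation only: one metadata byte per 64 data bytes.
    base + (data_addr >> 6)
}

// Test-only wrapper (e.g. what would live in a test_private mod):
// a thin shim with a forced inline hint so that microbenchmarks
// (cargo bench) measure the inlined fast path rather than the
// overhead of an outlined call.
#[inline(always)]
pub fn test_private_address_to_meta_address(base: usize, data_addr: usize) -> usize {
    address_to_meta_address(base, data_addr)
}
```

Benchmarks then call the wrapper, while production bindings keep calling the unannotated helper directly.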
The manual inline does not fix the performance for Immix on DaCapo: https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-02-24-Tue-031617&benchmark^build^invocation^iteration&time^time.other^time.stw&|10&iteration^1^4|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata|41&Histogram%20(with%20CI)^build^benchmark&
// (starting from zero) and add the runtime base address when computing actual addresses.
pub(crate) const GLOBAL_SIDE_METADATA_BASE_OFFSET: usize = 0;

static mut SIDE_METADATA_BASE_ADDRESS: Address = Address::ZERO;
I replaced this `static mut` with a `OnceLock`, and got the following results on biojava, which showed no slowdown for this PR.
https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-02-26-Thu-071955&benchmark^build^invocation^iteration^stickyix&time^time.other^time.stw&|10&iteration^1^4&stickyix^1^null|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata
If we use `static mut` for the base address, Rust needs to reload the variable every time we access side metadata. I inspected the assembly code of ImmixSpace::trace_object: in around 100 instructions, this PR adds 3 extra loads to get the side metadata address.
Intuitively, `OnceLock` should be slower, as `get()` checks whether the value is initialized before every load. However, if we use `oncelock.get().unwrap_unchecked()`, the compiler can assume there is no need to check or branch, and since the value will not change after initialization, there is no need to reload it repeatedly.
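The `OnceLock` pattern being discussed can be sketched as follows. This is a minimal stand-in, not the PR's actual code: `usize` replaces mmtk's `Address` type, and the function names are hypothetical. The hot-path accessor relies on `unwrap_unchecked`, which removes the "is it initialized?" check and seems to let the compiler hoist the load out of hot loops, unlike a `static mut` read.

```rust
use std::sync::OnceLock;

// Process-wide side metadata base, set exactly once at startup.
static SIDE_METADATA_BASE: OnceLock<usize> = OnceLock::new();

/// Called once during initialization, after mmapping the metadata region.
pub fn initialize_side_metadata_base(base: usize) {
    SIDE_METADATA_BASE
        .set(base)
        .expect("side metadata base already initialized");
}

/// Hot-path accessor.
/// SAFETY contract: must only be called after initialization;
/// `unwrap_unchecked` skips the initialization check and branch.
#[inline(always)]
pub fn side_metadata_base() -> usize {
    unsafe { *SIDE_METADATA_BASE.get().unwrap_unchecked() }
}
```

By contrast, a `static mut` read cannot legally be cached across other memory operations, which matches the extra loads observed in the trace_object assembly.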
Using OnceLock seems to fix the slowdown on biojava, and I am running the experiments again for all the benchmarks.
These are the results for all the benchmarks. With OnceLock, the slowdown for STW time is 3.5% (compared to 4.3% with static mut).
https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-02-26-Thu-225410&benchmark^build^invocation^iteration^stickyix&time^time.other^time.stw&|10&iteration^1^4|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata|41&Histogram%20(with%20CI)^build^benchmark&
Some benchmarks showed large error bars. The following is a rerun (which hasn't finished yet):
https://squirrel.anu.edu.au/plotty/yilin/mmtk/#0|shrew-2026-03-01-Sun-070849&benchmark^build^invocation^iteration^stickyix&time^time.other^time.stw&|10&iteration^1^4|20&1^invocation|30&1&benchmark&build;jdk-21-constant-side-metadata|41&Histogram%20(with%20CI)^build^benchmark&
Both seem to suggest that with OnceLock we see a slowdown for batik, but for the other benchmarks OnceLock is a performance improvement.
I will still need to investigate the slowdown.
I can't reproduce the same slowdown for luindex after turning on work_packet_stats and perf_counter: http://squirrel.anu.edu.au/plotty/yilin/mmtk/p/zPrqcq
I don't plan to spend more time on it.
work_packet_stats slows down GC by about 40%, as I recall. So it's not surprising that the slowdown gets masked.
work_packet_stats alone should only slow down GC by a few percent (~5% from memory). The 40% or so happens only when you read a perf counter per packet, which I was not doing. I just enabled perf_counter, but didn't set any perf event to read.
The previous run that showed a slowdown for luindex measured luindex's STW time as:
| | constant side metadata | dynamic side metadata (static mut) | dynamic side metadata (OnceLock) |
|---|---|---|---|
| STW time (ms) | 62.259 | 63.421 | 66.629 |
My new run (and new builds) with work_packet_stats and perf_counter measured STW time as:
| | constant side metadata | dynamic side metadata (static mut) | dynamic side metadata (OnceLock) |
|---|---|---|---|
| STW time (ms) | 67.688 | 67.154 | 68.134 |
So everything got slower (which is expected), but the build for dynamic side metadata with OnceLock is less affected. This seems to suggest the issue with OnceLock is concurrency: work_packet_stats requires every packet to acquire locks to update the stats counters.
// Initialize side metadata sanity first
plan.verify_side_metadata_sanity();
// Then initialize SFT because it may use side metadata
plan.initialize_sft();
These two lines use side metadata, so they have to happen after side metadata is initialized. They used to happen in Plan::new(). I extracted them because, at some point, I called initialize_side_metadata() after Plan::new(), so I had to move these two lines after both Plan::new() and initialize_side_metadata().
Now initialize_side_metadata() is called before Plan::new(), so it is no longer necessary to have these two lines here, but I think keeping them here is still clearer.
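The initialization ordering described above can be sketched as follows. All names here are hypothetical stand-ins modeled on the discussion (a toy `Plan` with flags replaces the real one), showing only the constraint that both calls must run after the side metadata region is mapped:

```rust
// Toy stand-in for the real Plan; the flags just record what ran.
struct Plan {
    sanity_verified: bool,
    sft_initialized: bool,
}

impl Plan {
    fn new() -> Self {
        Plan { sanity_verified: false, sft_initialized: false }
    }
    fn verify_side_metadata_sanity(&mut self) {
        self.sanity_verified = true;
    }
    fn initialize_sft(&mut self) {
        self.sft_initialized = true;
    }
}

fn initialize_side_metadata() {
    // Stand-in for reserving/mmapping the side metadata range.
}

// Sketch of the ordering inside something like MMTK::new().
fn mmtk_new() -> Plan {
    initialize_side_metadata(); // must come first: maps the region
    let mut plan = Plan::new();
    // Both of these may touch side metadata, so they must run after
    // initialize_side_metadata(); keeping them here makes that explicit
    // even though initialize_side_metadata() now precedes Plan::new().
    plan.verify_side_metadata_sanity();
    plan.initialize_sft();
    plan
}
```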
wks left a comment:
I think when quarantining memory, the mmap strategy is always the same. As I suggested in the comments, we may remove some strategy arguments and use a fixed strategy for quarantining.
A high-level comment:
This is a good point. And according to rust-lang/libs-team#654,
wks left a comment:
I have walked through the PR, and there are only a few minor problems left. See the comments.
This addresses part of the issues with #1351. This PR allows side metadata to be mmapped dynamically, and also allows side metadata to use a fixed address specified in the option `side_metadata_base_address`.