
Fix GPU lod_view for contiguous-mode multi-LOD#136

Open
nclack wants to merge 1 commit into acquire-project:main from nclack:fix-gpu-lod-view-contiguous

Conversation


@nclack nclack commented May 14, 2026

Closes #135

In contiguous mode (page_size == 0) the GPU bias kernel is skipped, so chunks pack from position 0 across all LODs in d_aggregated. The pre-fix code did two things in lockstep that only worked due to undefined behavior:

  • lod_view pointed result.data at h_aggregated + seg->data_segment_offset.
  • The rebase loop subtracted the same data_segment_offset from h_offsets.

For LOD >= 1 in contiguous mode, data_segment_offset is page-aligned past the cumulative LOD-0 bytes, so the rebased offsets underflowed size_t and only landed on the correct address via 64-bit pointer-arithmetic wrap-around. This is technically UB and was an accident waiting to break under different memory layouts (guard pages, ASan, etc.).

The fix computes a per-LOD data_base array up front and feeds the same value into both the rebase subtraction and the lod_view data pointer:

  • Carry-over mode (page_size > 0): data_base[lv] = seg->data_segment_offset (matches the bias kernel).
  • Contiguous mode (page_size == 0): data_base[lv] = h_offsets[batch_covering_offset + lv] — the cumulative actual bytes of prior LODs, read via the zero-sized per-LOD sentinel slot.

This matches the CPU pipeline's view shape (src/cpu/aggregate.c:330-338) and removes the underflow without changing the produced bytes.

Test changes:

  • test_cross_validate_lod now compares ALL levels byte-exact (previously L0 only). This exercises the contiguous-mode LOD >= 1 delivery path against the CPU reference.
  • test_batch_multiscale_unaligned_K adds an l1_nonzero_bytes > 0 assertion on the delivered L1 shard buffers as a regression guard against future variants of this bug that lose chunk data entirely.

Full ctest passes (48/48).

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
See 1 file with indirect coverage changes.



Development

Successfully merging this pull request may close these issues.

GPU lod_view returns garbage for LOD≥1 in contiguous mode (page_size=0)
