
Fix GPU lod_view for contiguous-mode multi-LOD#136

Open
nclack wants to merge 1 commit into acquire-project:main from nclack:fix-gpu-lod-view-contiguous

Conversation


@nclack nclack commented May 14, 2026

Closes #135

In contiguous mode (page_size == 0) the GPU bias kernel is skipped, so chunks pack from position 0 across all LODs in d_aggregated. The pre-fix code did two things in lockstep that only worked due to undefined behavior:

  • lod_view pointed result.data at h_aggregated + seg->data_segment_offset.
  • The rebase loop subtracted the same data_segment_offset from h_offsets.

For LOD >= 1 in contiguous mode, data_segment_offset is page-aligned past the cumulative LOD-0 bytes, so the rebased offsets underflowed size_t and only landed on the correct address via 64-bit pointer-arithmetic wrap-around. This is technically UB and was an accident waiting to break under different memory layouts (guard pages, ASan, etc.).

The fix computes a per-LOD data_base array up front and feeds the same value into both the rebase subtraction and the lod_view data pointer:

  • Carry-over mode (page_size > 0): data_base[lv] = seg->data_segment_offset (matches the bias kernel).
  • Contiguous mode (page_size == 0): data_base[lv] = h_offsets[batch_covering_offset + lv] — the cumulative actual bytes of prior LODs, read via the zero-sized per-LOD sentinel slot.

This matches the CPU pipeline's view shape (src/cpu/aggregate.c:330-338) and removes the underflow without changing the produced bytes.

Test changes:

  • test_cross_validate_lod now compares ALL levels byte-exact (previously L0 only). This exercises the contiguous-mode LOD >= 1 delivery path against the CPU reference.
  • test_batch_multiscale_unaligned_K adds an l1_nonzero_bytes > 0 assertion on the delivered L1 shard buffers as a regression guard against future variants of this bug that lose chunk data entirely.

Full ctest passes (48/48).

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
See 1 file with indirect coverage changes.



Development

Successfully merging this pull request may close these issues.

GPU lod_view returns garbage for LOD≥1 in contiguous mode (page_size=0)
