Fix GPU lod_view for contiguous-mode multi-LOD#136
Open
nclack wants to merge 1 commit into
Conversation
In contiguous mode (page_size == 0) the bias kernel is skipped, so chunks pack from position 0 across all LODs in d_aggregated. lod_view pointed result.data at h_aggregated + seg->data_segment_offset and the rebase loop subtracted the same value from h_offsets; for LOD >= 1 the data_segment_offset exceeds the actual cumulative LOD-0 bytes, so the resulting offsets underflow size_t and only happen to land on the right address via pointer-arith wrap-around (UB).

Compute the per-LOD data base once before the rebase loop. In carry-over mode use seg->data_segment_offset (matches the bias kernel). In contiguous mode use h_offsets[batch_covering_offset + lv] (the cumulative prior-LOD bytes via the zero-sized per-LOD sentinel). Feed the same value into both the rebase subtraction and the lod_view data pointer so addresses match without UB.

Extend test_cross_validate_lod to compare ALL levels byte-exact (was L0 only) and add a non-zero chunk-data assertion to test_batch_multiscale_unaligned_K.

Closes acquire-project#135
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Closes #135
In contiguous mode (`page_size == 0`) the GPU bias kernel is skipped, so chunks pack from position 0 across all LODs in `d_aggregated`. The pre-fix code did two things in lockstep that only worked due to undefined behavior:

- `lod_view` pointed `result.data` at `h_aggregated + seg->data_segment_offset`.
- The rebase loop subtracted the same `data_segment_offset` from `h_offsets`.

For LOD >= 1 in contiguous mode, `data_segment_offset` is page-aligned past the cumulative LOD-0 bytes, so the rebased offsets underflowed `size_t` and only landed on the correct address via 64-bit pointer-arithmetic wrap-around. This is technically UB and was an accident waiting to break under different memory layouts (guard pages, ASan, etc.).

The fix computes a per-LOD `data_base` array up front and feeds the same value into both the rebase subtraction and the `lod_view` data pointer:

- Carry-over mode (`page_size > 0`): `data_base[lv] = seg->data_segment_offset` (matches the bias kernel).
- Contiguous mode (`page_size == 0`): `data_base[lv] = h_offsets[batch_covering_offset + lv]`, the cumulative actual bytes of prior LODs, read via the zero-sized per-LOD sentinel slot.

This matches the CPU pipeline's view shape (`src/cpu/aggregate.c:330-338`) and removes the underflow without changing the produced bytes.

Test changes:

- `test_cross_validate_lod` now compares ALL levels byte-exact (previously L0 only). This exercises the contiguous-mode LOD >= 1 delivery path against the CPU reference.
- `test_batch_multiscale_unaligned_K` adds an `l1_nonzero_bytes > 0` assertion on the delivered L1 shard buffers as a regression guard against future variants of this bug that lose chunk data entirely.

Full `ctest` passes (48/48).