Skip to content

Fix MutableArrayData::extend for ListView#9562

Open
vegarsti wants to merge 4 commits intoapache:mainfrom
vegarsti:list-view-mutable-array-data-bug-repro
Open

Fix MutableArrayData::extend for ListView#9562
vegarsti wants to merge 4 commits intoapache:mainfrom
vegarsti:list-view-mutable-array-data-bug-repro

Conversation

@vegarsti
Copy link
Contributor

@vegarsti vegarsti commented Mar 16, 2026

Which issue does this PR close?

The existing implementation in list_view.rs copies offsets and sizes but skips copying child data. The original code copies offsets and sizes into a new array that has an empty child values array, so the offsets may point into nothing if the child data doesn't exist anymore.

MutableArrayData's build_extend for ListView/LargeListView copies
offsets and sizes from the source array but never extends the child
data. This produces invalid arrays where offsets reference
nonexistent child elements, causing panics on access.
@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 16, 2026
Copy link
Contributor

@brancz brancz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach

// Copy the child range
let child_base = if child_max > child_min {
let base = mutable.child_data[0].len();
mutable.child_data[0].extend(index, child_min, child_max);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not slice?

Copy link
Contributor Author

@vegarsti vegarsti Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to extend_with_slice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, probably the slice operation!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vegarsti and @brancz -- I left some comments, let me know what you think

}

#[test]
fn test_list_view_mutable_array_data_extend() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can. you please also add tests with a ListView with overlapping and non contigous data?

for &offset in &offsets[start..start + len] {
offset_buffer.push(offset);
}
// MutableArrayData builds a new independent array, so we must copy the child values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reading of this code is that it will copy some contiguous range of values

One tradeoff with this approach is that it will reduce the number of memcpys (which is good) at the expense of potentially copying a bunch of garbage

So for (an extreme) example, if the list view is coping two elements:

  • [0] (with offset 0, length 1)
  • [1] (offset 1000000, length 1)

I think this code is going to copy 1M entries when it could have just copied 2.

On the other hand, copying the entire chunk of underlying list values will preserve any aliasing that may exist (e.g. a ListView whose elements point at overlapping subsets) 🤔

What do you think about using a strategy to copy the lists one by one (that is to say, copy the lists one by one). The output ListView would have no "garbage" but would also be uncollected

I think this approach would be more aligned with what other types to. For example, StringView copies the elements one by one as well (though it is more complicated given it has multiple buffers):

move |mutable: &mut _MutableArrayData, _, start: usize, len: usize| {
let offset_buffer = &mut mutable.buffer1;
let values_buffer = &mut mutable.buffer2;
// this is safe due to how offset is built. See details on `get_last_offset`
let last_offset = unsafe { get_last_offset(offset_buffer) };
extend_offsets::<T>(offset_buffer, last_offset, &offsets[start..start + len + 1]);
// values
extend_offset_values::<T>(values_buffer, offsets, values, start, len);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good point, that's not good. I think your strategy sounds better.

@alamb
Copy link
Contributor

alamb commented Mar 17, 2026

I see now there is an alternate proposal from @asubiotto here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MutableArrayData::extend does not copy child values for ListView arrays

3 participants