Replace ArrayDataBuilder with GenericListArray in ListArrayDecoder in json-reader#9497
Replace ArrayDataBuilder with GenericListArray in ListArrayDecoder in json-reader#9497liamzwbao wants to merge 3 commits intoapache:mainfrom
ArrayDataBuilder with GenericListArray in ListArrayDecoder in json-reader#9497Conversation
|
Any particular benchmark you want me to run? |
alamb
left a comment
There was a problem hiding this comment.
Thanks @liamzwbao -- this looks good to me -- I have just one question
| // Safety | ||
| // Validated lengths above | ||
| Ok(unsafe { data.build_unchecked() }) | ||
| let array = GenericListArray::<O>::try_new(field, offsets, values, nulls)?; |
There was a problem hiding this comment.
Does try_new validate the offsets ? That could be a significant performance hit
Basically as long as this doesn't do additional validation I think it looks good to me
There was a problem hiding this comment.
Looks like it will do validation, but I think it's cheap and the local benchmark does not cause a regression. BTW, we don't have an unchecked API for ListArray, the new API simply unwrap the try_new.
arrow-rs/arrow-array/src/array/list_array.rs
Lines 205 to 251 in e4b68e6
There was a problem hiding this comment.
The benchmarks on our runner seem to suggest that this PR is slower for some reason 🤔
| // Validated lengths above | ||
| Ok(unsafe { data.build_unchecked() }) | ||
| let array = GenericListArray::<O>::try_new(field, offsets, values, nulls)?; | ||
| Ok(array.into_data()) |
There was a problem hiding this comment.
I think this call simply creates an ArrayData (which is necessary given the API) so I am not sure it actually avoids any ArrayDatas
In order to avoid ArrayData we would probably need to change the signature of decode to return an ArrayRef directly (rather than ArrayData)
There was a problem hiding this comment.
Makes sense. I can change the signature of all the decoders if you think it's worth a try
This one should be good: https://github.com/apache/arrow-rs/blob/main/arrow-json/benches/json-reader.rs |
|
Hi @alamb, just noticed that we don't have benchmark for List yet. I will create one and get back to you later |
# Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. --> - Relates to #9497. # Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Add benchmark for `ListArray` in `json_reader` to support the performance evaluation of #9497 # What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> - Benchmarks for decoding and serialize json list to `ListArray`. - Benchmarks for `ListArray` and `FixedSizeListArray` json writer # Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Benchmarks only # Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. If there are any breaking changes to public APIs, please call them out. --> No
6155c94 to
7daefe6
Compare
|
Hi @alamb, this PR is ready for the benchmark, could you please run |
|
run benchmark json_reader |
This comment was marked as outdated.
This comment was marked as outdated.
|
🤖 |
|
🤖: Benchmark completed Details
|
|
run benchmark json_reader |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
run benchmark json_reader |
|
🤖 Arrow criterion benchmark running (GKE) | trigger |
ArrayDataBuilder with GenericListArray in ListArrayDecoderArrayDataBuilder with GenericListArray in ListArrayDecoder in json-reader
alamb
left a comment
There was a problem hiding this comment.
Thank you for this PR @liamzwbao
the benchmarks seem to show that this PR is slower for some reason. Could you look into that and figure out why?
| // Safety | ||
| // Validated lengths above | ||
| Ok(unsafe { data.build_unchecked() }) | ||
| let array = GenericListArray::<O>::try_new(field, offsets, values, nulls)?; |
There was a problem hiding this comment.
The benchmarks on our runner seem to suggest that this PR is slower for some reason 🤔
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Details
Resource Usagebase (merge-base)
branch
|
Which issue does this PR close?
ArrayDatawith direct Array construction, when possible #9298.Rationale for this change
While implementing
ListViewArrayDecoderin arrow-json, I noticed we could potentially retireArrayDataBuilderinsideListArrayDecoder. Therefore, I'd like to use a small PR here to make sure there's no regressionWhat changes are included in this PR?
Replace
ArrayDataBuilderwithGenericListArrayinListArrayDecoderAre these changes tested?
Covered by existing tests
Are there any user-facing changes?
No