Decompress / decoding in parquet reader improvements#9577

Open
Dandandan wants to merge 9 commits into apache:main from Dandandan:optimize-snappy-vlq-decoding

@Dandandan
Contributor

@Dandandan Dandandan commented Mar 19, 2026

Summary

Optimize dictionary-encoded column reading in the parquet reader, with a focus on both primitive (Int32) and StringView types.

Changes

RLE decoder: branchless index clamping (rle.rs)

  • Replace if/else checked/unchecked branching with a single branchless .min(max_idx) clamp
  • Prevents UB on corrupt parquet files while avoiding per-element bounds checks
  • Add get_batch_direct method that exposes RLE vs bit-packed batches via callback
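
As an illustration of the clamping idea in the bullets above, here is a minimal sketch. It is not the code from rle.rs: the function name `gather_clamped` and its signature are assumptions made for the example. Each index is clamped with `.min(max_idx)`, which makes the unchecked read sound for any input; whether the compiler emits a conditional select rather than a branch depends on the target and optimization level.

```rust
/// Gather dictionary values for a batch of decoded indices.
/// `gather_clamped` is a hypothetical helper, not part of the parquet crate.
fn gather_clamped<T: Copy>(dict: &[T], indices: &[u32], out: &mut Vec<T>) {
    if dict.is_empty() {
        // Nothing to gather from; also avoids underflow in `len() - 1`.
        return;
    }
    let max_idx = dict.len() - 1;
    out.reserve(indices.len());
    for &idx in indices {
        // A corrupt index is clamped to the last valid slot instead of
        // taking a separate checked/unchecked branch per element.
        let i = (idx as usize).min(max_idx);
        // SAFETY: `i <= max_idx < dict.len()`.
        out.push(unsafe { *dict.get_unchecked(i) });
    }
}
```

With a corrupt index the output is wrong but memory-safe: the out-of-range index 7 below would map to the last dictionary entry rather than reading past the end of `dict`.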

StringView dictionary decoding (byte_view_array.rs, dictionary_index.rs)

  • Fuse RLE decoding and view gathering: RLE runs use repeat_n to fill views directly, skipping the index buffer entirely
  • Pre-reserve output views capacity before the decode loop, eliminating per-chunk reallocation
  • Skip buffer management when all dictionary views are inlined (≤12 bytes)
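
The fused decode described above can be sketched roughly as follows. This is a simplified model, not the PR's implementation: `IndexBatch`, `lookup`, and `gather_views` are hypothetical names, views are modeled as plain `u128` values, and out-of-range indices return an error here instead of being clamped.

```rust
use std::iter::repeat_n;

/// One decoded batch from a hybrid RLE / bit-packed index stream.
/// Hypothetical type used only for this sketch.
enum IndexBatch<'a> {
    Rle { index: u32, count: usize },
    BitPacked(&'a [u32]),
}

/// Checked dictionary lookup (the sketch does not clamp).
fn lookup(dict_views: &[u128], index: u32) -> Result<u128, String> {
    dict_views
        .get(index as usize)
        .copied()
        .ok_or_else(|| format!("dictionary index {index} out of range"))
}

fn gather_views(
    dict_views: &[u128],
    batches: &[IndexBatch<'_>],
    out: &mut Vec<u128>,
) -> Result<(), String> {
    // Reserve the full output once, before the decode loop.
    let total: usize = batches
        .iter()
        .map(|b| match b {
            IndexBatch::Rle { count, .. } => *count,
            IndexBatch::BitPacked(indices) => indices.len(),
        })
        .sum();
    out.reserve(total);

    for batch in batches {
        match batch {
            // An RLE run repeats one view: no index buffer is materialized.
            IndexBatch::Rle { index, count } => {
                let view = lookup(dict_views, *index)?;
                out.extend(repeat_n(view, *count));
            }
            // Bit-packed indices still need one lookup per element.
            IndexBatch::BitPacked(indices) => {
                for &index in indices.iter() {
                    out.push(lookup(dict_views, index)?);
                }
            }
        }
    }
    Ok(())
}
```

`std::iter::repeat_n` needs Rust 1.82 or newer. A run of length `count` costs one dictionary lookup plus a fill, rather than `count` lookups through an intermediate index buffer.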

ByteArray dictionary decoding (byte_array.rs)

  • Pre-reserve offsets capacity before the decode loop
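
A rough sketch of what pre-reserving offsets looks like for an offset-based byte array output. The `ByteArrayOutput` type and `append_dict_values` function are invented for this example and do not mirror the types in byte_array.rs.

```rust
/// Hypothetical offset-based output: `offsets` always holds one more
/// entry than the number of values written so far.
struct ByteArrayOutput {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl ByteArrayOutput {
    fn new() -> Self {
        Self { offsets: vec![0], values: Vec::new() }
    }
}

fn append_dict_values(
    out: &mut ByteArrayOutput,
    dict_offsets: &[i32],
    dict_values: &[u8],
    keys: &[u32],
) -> Result<(), String> {
    // One reservation for the whole batch, so `push` below does not
    // have to grow `offsets` inside the loop.
    out.offsets.reserve(keys.len());
    for &key in keys {
        let k = key as usize;
        let (start, end) = match (dict_offsets.get(k), dict_offsets.get(k + 1)) {
            (Some(&s), Some(&e)) => (s as usize, e as usize),
            _ => return Err(format!("dictionary key {key} out of range")),
        };
        let bytes = dict_values
            .get(start..end)
            .ok_or_else(|| "invalid dictionary offsets".to_string())?;
        out.values.extend_from_slice(bytes);
        out.offsets.push(out.values.len() as i32);
    }
    Ok(())
}
```

Only the offsets are reserved here, since their count is known from `keys.len()`; the total byte length of the values is not known up front in this sketch.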

Snappy

  • Avoid unnecessary buffer zero-fill in Snappy decompression

VLQ

  • Reduce per-byte overhead in VLQ integer decoding
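
For reference, decoding a VLQ integer directly from a byte slice can look like the sketch below. The function name `read_vlq` and its return type are assumptions for illustration; the reader's actual method and its handling of its internal bit offset are not shown in this PR description.

```rust
/// Decode one unsigned VLQ (LEB128-style) integer from the start of `buf`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or longer than a 64-bit value allows.
fn read_vlq(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    let mut shift = 0u32;
    for (i, &byte) in buf.iter().enumerate() {
        // The low 7 bits carry payload; the high bit marks continuation.
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
        shift += 7;
        if shift > 63 {
            // More than 10 bytes: not a valid 64-bit VLQ.
            return None;
        }
    }
    // Ran out of input while the continuation bit was still set.
    None
}
```

Iterating over the slice lets the loop work on plain bytes, instead of going through a general-purpose aligned read for every byte.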

Benchmarks

Benchmark                         Before   After    Speedup
---------                         ------   -----    -------
Int32Array dict, mandatory        60 µs    52 µs    13%
StringViewArray dict, mandatory   137 µs   75 µs    45%
StringArray dict, mandatory       537 µs   519 µs   3.5%

Test plan

  • All 84 arrow_reader integration tests pass (including bad_data tests for corrupt files)
  • All 14 RLE unit tests pass
  • Clippy clean with and without arrow feature

🤖 Generated with Claude Code

Avoid unnecessary buffer zero-fill in Snappy decompression by writing
directly into spare capacity, and reduce per-byte overhead in VLQ
integer decoding by reading directly from the buffer slice instead of
calling get_aligned for each byte.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
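
The "write directly into spare capacity" idea from the commit message above can be sketched with the standard library alone. This is an assumption-laden illustration, not the PR's code: `append_uninit` is a hypothetical helper, and how the PR hands the uninitialized region to the Snappy decoder is not shown here.

```rust
use std::mem::MaybeUninit;

/// Append up to `len` bytes to `out` by letting `fill` write into the
/// vector's uninitialized spare capacity, instead of first resizing the
/// vector with zeros and then overwriting them.
///
/// # Safety
/// `fill` must initialize the first `n` bytes of the slice it is given,
/// where `n` is the count it returns.
unsafe fn append_uninit<F>(out: &mut Vec<u8>, len: usize, fill: F) -> usize
where
    F: FnOnce(&mut [MaybeUninit<u8>]) -> usize,
{
    // Guarantees at least `len` bytes of spare capacity.
    out.reserve(len);
    let spare = &mut out.spare_capacity_mut()[..len];
    let written = fill(spare).min(len);
    // SAFETY: the caller promises the first `written` spare bytes are
    // initialized, and `written <= len <= spare capacity`.
    unsafe { out.set_len(out.len() + written) };
    written
}
```

The saving comes from skipping the zero-fill pass over a buffer that is about to be overwritten in full; the cost is an `unsafe` contract that the writer initializes every byte it claims to have written.
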
@github-actions github-actions bot added the parquet label (Changes to the parquet crate) Mar 19, 2026
@Dandandan Dandandan changed the title from "Optimize Snappy decompression and VLQ decoding in parquet reader" to "[ClickBench] Optimize Snappy decompression and VLQ decoding in parquet reader" Mar 19, 2026
@Dandandan
Contributor Author

run benchmark arrow_reader_clickbench

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4088297533-449-hwhj7 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing optimize-snappy-vlq-decoding (69764d3) to 3931179 (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

Benchmark for this request failed.

Last 20 lines of output:

    Updating crates.io index
     Locking 417 packages to latest compatible versions
      Adding generic-array v0.14.7 (available: v0.14.9)
      Adding lz4_flex v0.12.1 (available: v0.13.0)
      Adding matchit v0.8.4 (available: v0.8.6)
      Adding rand v0.9.2 (available: v0.10.0)
rustc 1.94.0 (4a4ef493e 2026-03-02)
69764d3ae7c7b493e9aa255be21e968f62042c0f
393117979882e97a15125edd142c70a5e2c16386
Looking for ClickBench files starting in current_dir and all parent directories: "/workspace/arrow-rs-base/parquet"
    Finished `bench` profile [optimized] target(s) in 0.14s
     Running benches/arrow_reader_clickbench.rs (target/release/deps/arrow_reader_clickbench-783851a0a179a92e)
Could not find hits_1.parquet in directory or parents: "/workspace/arrow-rs-base/parquet". Download it via

wget --continue https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet

thread 'main' (15416) panicked at parquet/benches/arrow_reader_clickbench.rs:618:9:
Stopping
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: bench failed, to rerun pass `-p parquet --bench arrow_reader_clickbench`

@Dandandan
Contributor Author

run benchmark arrow_reader_clickbench

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4088349681-450-sflql 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing optimize-snappy-vlq-decoding (04fc413) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

@Dandandan Dandandan changed the title from "[ClickBench] Optimize Snappy decompression and VLQ decoding in parquet reader" to "[ClickBench] Avoid zeroing in Snappy and micro-optimize VLQ decoding in parquet reader" Mar 19, 2026
@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Details

group                                             main                                   optimize-snappy-vlq-decoding
-----                                             ----                                   ----------------------------
arrow_reader_clickbench/async/Q1                  1.00   1082.9±5.64µs        ? ?/sec    1.01   1092.6±4.67µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.03      6.9±0.05ms        ? ?/sec    1.00      6.7±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.02      8.0±0.11ms        ? ?/sec    1.00      7.8±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.01     14.6±0.07ms        ? ?/sec    1.00     14.4±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.01     17.3±0.14ms        ? ?/sec    1.00     17.1±0.13ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.00     16.2±0.09ms        ? ?/sec    1.00     16.2±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.01      3.1±0.03ms        ? ?/sec    1.00      3.0±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.26     91.0±0.53ms        ? ?/sec    1.00     72.1±0.16ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.30    103.8±1.45ms        ? ?/sec    1.00     79.7±0.38ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.17    132.0±7.50ms        ? ?/sec    1.00    112.6±1.23ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.04    251.3±0.67ms        ? ?/sec    1.00    242.1±1.09ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.03     20.1±0.16ms        ? ?/sec    1.00     19.6±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.03     59.3±0.43ms        ? ?/sec    1.00     57.6±0.22ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.05     60.7±7.59ms        ? ?/sec    1.00     57.6±0.21ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.02     19.0±0.10ms        ? ?/sec    1.00     18.6±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.02     15.9±0.32ms        ? ?/sec    1.00     15.6±0.31ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.00      5.5±0.03ms        ? ?/sec    1.02      5.6±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.02     14.2±0.31ms        ? ?/sec    1.00     14.0±0.33ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.00     25.6±0.54ms        ? ?/sec    1.00     25.6±0.49ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.03      5.9±0.06ms        ? ?/sec    1.00      5.7±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.03      5.1±0.04ms        ? ?/sec    1.00      5.0±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.02      3.6±0.02ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1065.8±4.03µs        ? ?/sec    1.01   1072.6±4.96µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.02      6.7±0.07ms        ? ?/sec    1.00      6.6±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.01      7.7±0.06ms        ? ?/sec    1.00      7.6±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.03     14.8±0.09ms        ? ?/sec    1.00     14.4±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.04     17.6±0.15ms        ? ?/sec    1.00     16.9±0.08ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.03     16.2±0.10ms        ? ?/sec    1.00     15.8±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.03      3.0±0.03ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.02     72.4±0.60ms        ? ?/sec    1.00     71.1±0.45ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.02     81.0±0.75ms        ? ?/sec    1.00     79.4±0.46ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.03     99.8±0.89ms        ? ?/sec    1.00     96.8±0.99ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.05    238.2±1.37ms        ? ?/sec    1.00    227.9±0.35ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.02     19.6±0.13ms        ? ?/sec    1.00     19.2±0.08ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.03     57.6±0.71ms        ? ?/sec    1.00     56.1±0.47ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.03     57.9±0.60ms        ? ?/sec    1.00     56.4±0.43ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.00     18.7±0.08ms        ? ?/sec    1.00     18.7±1.43ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.02     15.3±0.26ms        ? ?/sec    1.00     15.0±0.28ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.01      5.5±0.04ms        ? ?/sec    1.00      5.4±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.01     13.3±0.30ms        ? ?/sec    1.00     13.2±0.23ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.01     23.9±0.45ms        ? ?/sec    1.00     23.7±0.47ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.00      5.6±0.05ms        ? ?/sec    1.01      5.7±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.00      4.9±0.04ms        ? ?/sec    1.00      4.9±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.01      3.5±0.02ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    869.3±4.29µs        ? ?/sec    1.00    872.7±2.77µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.02      5.2±0.06ms        ? ?/sec    1.00      5.1±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.02      6.3±0.07ms        ? ?/sec    1.00      6.1±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.03     22.2±0.29ms        ? ?/sec    1.00     21.5±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.03     25.0±0.07ms        ? ?/sec    1.00     24.2±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.02     23.6±0.09ms        ? ?/sec    1.00     23.0±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.00      2.7±0.02ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.03    124.7±0.31ms        ? ?/sec    1.00    120.6±0.19ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.03     99.5±0.22ms        ? ?/sec    1.00     96.9±0.45ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.03    145.4±0.32ms        ? ?/sec    1.00    140.7±0.51ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.04   282.9±13.21ms        ? ?/sec    1.00   271.3±12.13ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.02     27.8±0.05ms        ? ?/sec    1.00     27.2±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.03    110.5±0.28ms        ? ?/sec    1.00    106.8±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.03    107.6±0.14ms        ? ?/sec    1.00    104.2±0.13ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.02     19.2±0.04ms        ? ?/sec    1.00     18.9±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.01     22.6±0.06ms        ? ?/sec    1.00     22.3±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.9±0.01ms        ? ?/sec    1.00      6.9±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.01     11.6±0.03ms        ? ?/sec    1.00     11.5±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.02     21.4±0.04ms        ? ?/sec    1.00     21.0±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.02      5.3±0.02ms        ? ?/sec    1.00      5.2±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.01      5.7±0.03ms        ? ?/sec    1.00      5.7±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.02      4.4±0.02ms        ? ?/sec    1.00      4.3±0.02ms        ? ?/sec

Resource Usage

base (merge-base)

Metric        Value
Wall time     788.3s
Peak memory   3.1 GiB
Avg memory    2.9 GiB
CPU user      700.7s
CPU sys       87.5s
Disk read     0 B
Disk write    2.0 GiB

branch

Metric        Value
Wall time     777.8s
Peak memory   3.2 GiB
Avg memory    3.1 GiB
CPU user      707.3s
CPU sys       70.4s
Disk read     0 B
Disk write    171.2 MiB

@Dandandan Dandandan marked this pull request as ready for review March 19, 2026 08:09
@Dandandan
Contributor Author

I guess 1-3% sounds about right

@Dandandan
Contributor Author

run benchmark arrow_reader_clickbench

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4089201972-452-rnvpm 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing optimize-snappy-vlq-decoding (04fc413) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Details

group                                             main                                   optimize-snappy-vlq-decoding
-----                                             ----                                   ----------------------------
arrow_reader_clickbench/async/Q1                  1.00   1088.5±3.42µs        ? ?/sec    1.00   1089.2±5.70µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.01      6.7±0.06ms        ? ?/sec    1.00      6.6±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.02      7.8±0.06ms        ? ?/sec    1.00      7.6±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.02     14.4±0.09ms        ? ?/sec    1.00     14.2±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.03     17.2±0.16ms        ? ?/sec    1.00     16.7±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.02     16.2±0.13ms        ? ?/sec    1.00     15.8±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.00      3.0±0.03ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.10     77.6±4.20ms        ? ?/sec    1.00     70.9±0.18ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.09     86.9±0.52ms        ? ?/sec    1.00     79.5±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.15    128.4±5.08ms        ? ?/sec    1.00    111.9±2.11ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.04    248.1±1.14ms        ? ?/sec    1.00    237.9±1.19ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.02     19.5±0.15ms        ? ?/sec    1.00     19.1±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.05     59.1±0.39ms        ? ?/sec    1.00     56.3±0.19ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.04     58.0±0.46ms        ? ?/sec    1.00     55.9±0.12ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.01     18.6±0.09ms        ? ?/sec    1.00     18.4±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.03     15.1±0.23ms        ? ?/sec    1.00     14.6±0.14ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.00      5.4±0.02ms        ? ?/sec    1.00      5.4±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.04     13.4±0.24ms        ? ?/sec    1.00     12.8±0.13ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.06     24.7±0.48ms        ? ?/sec    1.00     23.3±0.24ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.02      5.8±0.05ms        ? ?/sec    1.00      5.7±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.01      5.0±0.03ms        ? ?/sec    1.00      5.0±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.02      3.6±0.03ms        ? ?/sec    1.00      3.5±0.01ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1075.7±5.32µs        ? ?/sec    1.00   1071.2±5.62µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.01      6.6±0.03ms        ? ?/sec    1.00      6.5±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.01      7.5±0.06ms        ? ?/sec    1.00      7.4±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.01     14.2±0.04ms        ? ?/sec    1.00     14.1±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.02     16.8±0.08ms        ? ?/sec    1.00     16.5±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.01     15.7±0.07ms        ? ?/sec    1.00     15.6±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.01      2.9±0.03ms        ? ?/sec    1.00      2.9±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.03     72.3±0.69ms        ? ?/sec    1.00     69.9±0.22ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.03     80.8±0.61ms        ? ?/sec    1.00     78.4±0.25ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.02     98.8±0.62ms        ? ?/sec    1.00     97.1±3.19ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.08    228.8±6.75ms        ? ?/sec    1.00    212.2±2.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.03     19.3±0.14ms        ? ?/sec    1.00     18.7±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.04     57.0±0.73ms        ? ?/sec    1.00     55.1±0.24ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.04     57.2±0.45ms        ? ?/sec    1.00     54.8±0.13ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.01     18.2±0.06ms        ? ?/sec    1.00     18.0±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.03     14.3±0.18ms        ? ?/sec    1.00     13.8±0.17ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.01      5.4±0.01ms        ? ?/sec    1.00      5.3±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.08     13.0±0.35ms        ? ?/sec    1.00     12.1±0.14ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.04     22.9±0.39ms        ? ?/sec    1.00     22.0±0.22ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.01      5.5±0.04ms        ? ?/sec    1.00      5.4±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.00      4.8±0.03ms        ? ?/sec    1.00      4.8±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.00      3.4±0.01ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    867.2±2.15µs        ? ?/sec    1.01    873.7±5.99µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.01      5.1±0.03ms        ? ?/sec    1.00      5.1±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.00      6.1±0.02ms        ? ?/sec    1.00      6.1±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.03     22.0±0.64ms        ? ?/sec    1.00     21.3±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.18     28.2±0.84ms        ? ?/sec    1.00     23.9±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.02     23.0±0.17ms        ? ?/sec    1.00     22.7±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.02      2.7±0.03ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.03    124.3±0.44ms        ? ?/sec    1.00    120.2±0.26ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.04     98.9±0.27ms        ? ?/sec    1.00     95.5±0.22ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.03    145.7±1.70ms        ? ?/sec    1.00    141.1±0.50ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.06   282.4±13.97ms        ? ?/sec    1.00   267.3±12.66ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.02     27.2±0.32ms        ? ?/sec    1.00     26.8±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.04    109.2±0.97ms        ? ?/sec    1.00    105.1±0.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.04    105.5±0.60ms        ? ?/sec    1.00    101.9±0.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.01     18.9±0.16ms        ? ?/sec    1.00     18.7±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.01     22.0±0.05ms        ? ?/sec    1.00     21.8±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.9±0.03ms        ? ?/sec    1.01      6.9±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.02     11.3±0.07ms        ? ?/sec    1.00     11.1±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.03     20.7±0.13ms        ? ?/sec    1.00     20.2±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.02      5.2±0.06ms        ? ?/sec    1.00      5.1±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.01      5.7±0.05ms        ? ?/sec    1.00      5.6±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.01      4.3±0.02ms        ? ?/sec    1.00      4.3±0.01ms        ? ?/sec

Resource Usage

base (merge-base)

Metric        Value
Wall time     787.3s
Peak memory   3.1 GiB
Avg memory    2.9 GiB
CPU user      710.7s
CPU sys       76.4s
Disk read     0 B
Disk write    2.1 GiB

branch

Metric        Value
Wall time     775.0s
Peak memory   3.2 GiB
Avg memory    3.1 GiB
CPU user      714.8s
CPU sys       58.4s
Disk read     0 B
Disk write    172.2 MiB

@Dandandan
Contributor Author

Nice, this seems to give a low single-digit improvement across the board (I think mostly from the non-zeroing).

When bit_width guarantees all possible indices fit within the dictionary,
use unchecked indexing to allow LLVM to unroll the dict gather loop 4x
with paired loads/stores instead of scalar with per-element bounds checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
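
The bit_width condition in the commit message above can be illustrated as follows. This is a sketch under stated assumptions, not the PR's code: `indices_always_in_bounds` and `gather` are hypothetical names, and the mask in the fast path is added only so the example is sound on its own (a decoder that already yields indices below 2^bit_width would not need it). Whether LLVM unrolls the loop as described is not something this sketch demonstrates.

```rust
/// True when every value representable in `bit_width` bits is a valid
/// index into a dictionary with `dict_len` entries.
fn indices_always_in_bounds(bit_width: u8, dict_len: usize) -> bool {
    match 1usize.checked_shl(u32::from(bit_width)) {
        Some(max_plus_one) => max_plus_one <= dict_len,
        None => false,
    }
}

fn gather<T: Copy>(
    dict: &[T],
    indices: &[u32],
    bit_width: u8,
    out: &mut Vec<T>,
) -> Result<(), String> {
    out.reserve(indices.len());
    if bit_width < 32 && indices_always_in_bounds(bit_width, dict.len()) {
        // Fast path: no per-element bounds check in the loop body.
        let mask = (1u32 << bit_width) - 1;
        for &idx in indices {
            let i = (idx & mask) as usize;
            // SAFETY: `i < 2^bit_width <= dict.len()`.
            out.push(unsafe { *dict.get_unchecked(i) });
        }
    } else {
        // Fallback: checked lookup per element.
        for &idx in indices {
            let v = dict
                .get(idx as usize)
                .ok_or_else(|| format!("dictionary index {idx} out of range"))?;
            out.push(*v);
        }
    }
    Ok(())
}
```

For example, a 2-bit width can encode indices 0 through 3, so a dictionary of 4 entries qualifies for the fast path while a dictionary of 3 entries does not.
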
@Dandandan
Contributor Author

run benchmark arrow_reader arrow_reader_clickbench

@Dandandan Dandandan changed the title from "[ClickBench] Avoid zeroing in Snappy and micro-optimize VLQ decoding in parquet reader" to "[ClickBench] Avoid zeroing in Snappy and micro-optimize parquet reader" Mar 19, 2026
@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4089547785-456-w89gr 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing optimize-snappy-vlq-decoding (9b7aeea) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4089547785-457-gkxvl 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing optimize-snappy-vlq-decoding (9b7aeea) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Details

group                                             main                                   optimize-snappy-vlq-decoding
-----                                             ----                                   ----------------------------
arrow_reader_clickbench/async/Q1                  1.00   1082.1±4.16µs        ? ?/sec    1.01   1092.5±4.20µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.01      6.8±0.05ms        ? ?/sec    1.00      6.7±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.01      7.9±0.08ms        ? ?/sec    1.00      7.8±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.01     14.7±0.05ms        ? ?/sec    1.00     14.5±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.03     17.5±0.09ms        ? ?/sec    1.00     17.0±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.01     16.2±0.06ms        ? ?/sec    1.00     16.0±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.03      3.1±0.03ms        ? ?/sec    1.00      3.0±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.09     77.5±4.28ms        ? ?/sec    1.00     70.9±0.19ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.02     97.5±0.38ms        ? ?/sec    1.00     96.0±9.79ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.00    130.2±7.92ms        ? ?/sec    1.02    132.3±6.95ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.03    251.8±6.97ms        ? ?/sec    1.00    243.3±6.11ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.01     19.7±0.12ms        ? ?/sec    1.00     19.5±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.04     58.6±0.33ms        ? ?/sec    1.00     56.1±0.21ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.02     58.1±0.28ms        ? ?/sec    1.00     56.7±0.16ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.01     18.7±0.07ms        ? ?/sec    1.00     18.5±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.05     15.4±0.22ms        ? ?/sec    1.00     14.7±0.09ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.01      5.5±0.03ms        ? ?/sec    1.00      5.4±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.07     13.9±0.20ms        ? ?/sec    1.00     13.0±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.06     24.8±0.38ms        ? ?/sec    1.00     23.3±0.15ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.06      6.0±0.08ms        ? ?/sec    1.00      5.6±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.04      5.2±0.05ms        ? ?/sec    1.00      5.0±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.02      3.6±0.02ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1065.6±4.89µs        ? ?/sec    1.00   1068.7±2.80µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.01      6.7±0.05ms        ? ?/sec    1.00      6.7±0.19ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.03      7.9±0.13ms        ? ?/sec    1.00      7.7±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.02     14.7±0.09ms        ? ?/sec    1.00     14.4±0.14ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.03     17.4±0.11ms        ? ?/sec    1.00     17.0±0.18ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.01     16.2±0.07ms        ? ?/sec    1.00     15.9±0.17ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.04      3.0±0.03ms        ? ?/sec    1.00      2.9±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.02     71.9±0.54ms        ? ?/sec    1.00     70.7±4.55ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.03     81.0±0.50ms        ? ?/sec    1.00     78.2±0.19ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.04     99.8±0.66ms        ? ?/sec    1.00     95.7±0.41ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.09    236.4±4.30ms        ? ?/sec    1.00    216.1±1.16ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.02     19.6±0.13ms        ? ?/sec    1.00     19.3±0.16ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.03     57.3±0.54ms        ? ?/sec    1.00     55.4±0.22ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.02     57.3±0.38ms        ? ?/sec    1.00     55.9±0.27ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.01     18.5±0.09ms        ? ?/sec    1.00     18.3±0.19ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.03     14.8±0.16ms        ? ?/sec    1.00     14.3±0.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.00      5.4±0.03ms        ? ?/sec    1.00      5.4±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.03     13.0±0.13ms        ? ?/sec    1.00     12.7±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.04     23.5±0.31ms        ? ?/sec    1.00     22.6±0.10ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.02      5.7±0.06ms        ? ?/sec    1.00      5.5±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.02      4.9±0.04ms        ? ?/sec    1.00      4.8±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.01      3.5±0.02ms        ? ?/sec    1.00      3.4±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00   870.9±12.89µs        ? ?/sec    1.01    878.0±2.22µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.02      5.2±0.03ms        ? ?/sec    1.00      5.1±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.00      6.2±0.03ms        ? ?/sec    1.00      6.2±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.04     22.3±0.68ms        ? ?/sec    1.00     21.5±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.21     29.1±0.84ms        ? ?/sec    1.00     24.1±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.02     23.4±0.14ms        ? ?/sec    1.00     22.9±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.04      2.7±0.03ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.04    125.7±0.32ms        ? ?/sec    1.00    120.7±0.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.04     99.5±0.15ms        ? ?/sec    1.00     95.8±0.21ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.01    146.4±0.37ms        ? ?/sec    1.00    144.7±0.51ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.02   283.3±14.55ms        ? ?/sec    1.00   277.2±14.93ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.01     27.5±0.07ms        ? ?/sec    1.00     27.2±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.05    110.0±0.82ms        ? ?/sec    1.00    104.8±0.16ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.03    105.9±0.21ms        ? ?/sec    1.00    102.5±0.64ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.02     19.1±0.04ms        ? ?/sec    1.00     18.6±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.03     22.5±0.08ms        ? ?/sec    1.00     22.0±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.9±0.02ms        ? ?/sec    1.00      6.9±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.02     11.6±0.04ms        ? ?/sec    1.00     11.3±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.03     21.1±0.05ms        ? ?/sec    1.00     20.6±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.03      5.3±0.02ms        ? ?/sec    1.00      5.1±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.01      5.7±0.02ms        ? ?/sec    1.00      5.6±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.01      4.4±0.02ms        ? ?/sec    1.00      4.3±0.03ms        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 792.9s
Peak memory 3.1 GiB
Avg memory 2.9 GiB
CPU user 706.9s
CPU sys 85.7s
Disk read 0 B
Disk write 2.0 GiB

branch

Metric Value
Wall time 787.9s
Peak memory 3.2 GiB
Avg memory 3.1 GiB
CPU user 716.6s
CPU sys 71.2s
Disk read 0 B
Disk write 171.4 MiB

When bit_width guarantees all possible indices fit within the dictionary,
use unchecked access to eliminate per-element bounds checks. Also skip
buffer management when all dictionary views are inlined (<=12 bytes).

Generates a clean 8-instruction gather loop for the common case
(all_indices_valid + base_buffer_idx=0) and a branchless 14-instruction
loop for the non-zero buffer offset case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
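
For readers following the commit above, here is a minimal sketch of the idea, not the code in this PR. The function name `gather_unchecked_if_safe`, the `u128` stand-in for a 16-byte view, and the explicit mask are all assumptions made for illustration. A real bit-unpacker already yields values below `2^bit_width`; the sketch masks explicitly only so that the function stays sound for arbitrary input slices.

```rust
/// Hypothetical helper: gathers dictionary entries for a batch of indices.
/// If every value representable in `bit_width` bits is a valid dictionary
/// position, the per-element bounds check is redundant and is skipped.
fn gather_unchecked_if_safe(
    dict: &[u128],
    indices: &[u32],
    bit_width: u8,
    out: &mut Vec<u128>,
) -> Result<(), String> {
    // Number of distinct values representable in `bit_width` bits.
    let index_space = 1u64 << bit_width.min(32);
    let all_indices_valid = index_space <= dict.len() as u64;
    let mask = index_space - 1;

    out.reserve(indices.len());
    if all_indices_valid {
        out.extend(indices.iter().map(|&i| {
            let idx = (i as u64 & mask) as usize;
            // SAFETY: idx <= mask < index_space <= dict.len().
            unsafe { *dict.get_unchecked(idx) }
        }));
    } else {
        // Fallback: some representable index could be out of range,
        // so each lookup is checked.
        for &i in indices {
            let v = dict
                .get(i as usize)
                .ok_or_else(|| format!("index {i} out of range"))?;
            out.push(*v);
        }
    }
    Ok(())
}
```

With a 4-entry dictionary and `bit_width = 2` the fast path is taken; with `bit_width = 3` the checked path is used because indices 4..=7 would be representable but invalid.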
@Dandandan
Contributor Author

run benchmark arrow_reader arrow_reader_clickbench

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4089945834-458-ppv47 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing optimize-snappy-vlq-decoding (d9398fc) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4089945834-459-4j7pj 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing optimize-snappy-vlq-decoding (d9398fc) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Details

group                                             main                                   optimize-snappy-vlq-decoding
-----                                             ----                                   ----------------------------
arrow_reader_clickbench/async/Q1                  1.00   1089.6±4.45µs        ? ?/sec    1.00   1090.8±5.80µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.00      7.0±0.28ms        ? ?/sec    1.00      6.9±0.28ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.01      8.0±0.31ms        ? ?/sec    1.00      7.9±0.32ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.00     14.8±0.34ms        ? ?/sec    1.28     19.0±0.62ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.00     17.6±0.37ms        ? ?/sec    1.33     23.4±0.22ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.00     16.7±0.12ms        ? ?/sec    1.26     21.0±0.66ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.02      3.1±0.06ms        ? ?/sec    1.00      3.0±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.00     92.0±1.20ms        ? ?/sec    1.14    104.7±2.69ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.00    102.5±9.75ms        ? ?/sec    1.12    114.4±2.01ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.00   132.1±11.04ms        ? ?/sec    1.05    138.2±7.01ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.01    246.4±3.12ms        ? ?/sec    1.00    243.6±3.06ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.02     19.9±0.53ms        ? ?/sec    1.00     19.6±0.29ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.03     58.5±0.76ms        ? ?/sec    1.00     56.8±0.19ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.04     59.7±0.70ms        ? ?/sec    1.00     57.6±0.70ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.02     19.0±0.11ms        ? ?/sec    1.00     18.5±0.17ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.06     15.9±0.47ms        ? ?/sec    1.00     15.0±0.51ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.00      5.5±0.12ms        ? ?/sec    1.00      5.5±0.13ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.05     13.9±0.49ms        ? ?/sec    1.00     13.2±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.07     25.8±0.89ms        ? ?/sec    1.00     24.2±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.00      5.8±0.09ms        ? ?/sec    1.02      5.9±0.15ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.00      5.0±0.07ms        ? ?/sec    1.02      5.1±0.09ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.01      3.6±0.06ms        ? ?/sec    1.00      3.6±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1067.1±4.82µs        ? ?/sec    1.00   1066.6±3.21µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.00      6.7±0.29ms        ? ?/sec    1.01      6.8±0.22ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.00      7.7±0.33ms        ? ?/sec    1.05      8.1±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.02     14.9±0.31ms        ? ?/sec    1.00     14.6±0.26ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.04     17.7±0.19ms        ? ?/sec    1.00     17.1±0.38ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.02     16.2±0.37ms        ? ?/sec    1.00     15.9±0.36ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.02      3.0±0.06ms        ? ?/sec    1.00      2.9±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.03     72.2±0.84ms        ? ?/sec    1.00     70.0±0.39ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.03     81.1±0.80ms        ? ?/sec    1.00     78.9±0.24ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.04     99.5±1.87ms        ? ?/sec    1.00     95.8±1.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.12    234.3±2.78ms        ? ?/sec    1.00    210.0±2.64ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.02     19.6±0.54ms        ? ?/sec    1.00     19.3±0.33ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.03     57.3±0.68ms        ? ?/sec    1.00     55.8±0.18ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.03     58.9±0.85ms        ? ?/sec    1.00     57.2±0.47ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.02     18.6±0.19ms        ? ?/sec    1.00     18.3±0.22ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.05     15.2±0.52ms        ? ?/sec    1.00     14.5±0.58ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.01      5.5±0.13ms        ? ?/sec    1.00      5.4±0.12ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.01     13.1±0.47ms        ? ?/sec    1.00     12.9±0.24ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.03     23.9±0.62ms        ? ?/sec    1.00     23.2±0.32ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.00      5.7±0.03ms        ? ?/sec    1.00      5.7±0.16ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.00      4.9±0.09ms        ? ?/sec    1.01      4.9±0.12ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.01      3.5±0.05ms        ? ?/sec    1.00      3.4±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00   870.7±10.33µs        ? ?/sec    1.01    877.5±3.87µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.00      5.2±0.15ms        ? ?/sec    1.00      5.2±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.00      6.2±0.08ms        ? ?/sec    1.03      6.3±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.05     22.7±0.66ms        ? ?/sec    1.00     21.6±0.17ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.20     29.1±0.91ms        ? ?/sec    1.00     24.3±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.03     23.6±0.34ms        ? ?/sec    1.00     22.9±0.27ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.00      2.7±0.07ms        ? ?/sec    1.00      2.7±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.02    123.8±1.50ms        ? ?/sec    1.00    121.0±0.92ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.02     98.7±0.67ms        ? ?/sec    1.00     96.5±0.84ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.00    143.1±1.48ms        ? ?/sec    1.00    143.1±1.99ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.00   284.9±15.56ms        ? ?/sec    1.00   283.8±16.92ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.03     28.1±0.06ms        ? ?/sec    1.00     27.2±0.36ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.05    109.9±0.92ms        ? ?/sec    1.00    105.2±0.74ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.04    108.7±1.09ms        ? ?/sec    1.00    104.5±1.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.02     19.3±0.23ms        ? ?/sec    1.00     18.9±0.18ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.02     22.8±0.25ms        ? ?/sec    1.00     22.4±0.16ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.02      7.1±0.01ms        ? ?/sec    1.00      7.0±0.09ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.02     11.6±0.14ms        ? ?/sec    1.00     11.3±0.22ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.03     21.4±0.31ms        ? ?/sec    1.00     20.7±0.44ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.00      5.2±0.11ms        ? ?/sec    1.03      5.3±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.00      5.7±0.07ms        ? ?/sec    1.01      5.8±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.03      4.5±0.02ms        ? ?/sec    1.00      4.4±0.06ms        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 789.4s
Peak memory 3.1 GiB
Avg memory 2.9 GiB
CPU user 703.9s
CPU sys 85.3s
Disk read 0 B
Disk write 2.0 GiB

branch

Metric Value
Wall time 797.1s
Peak memory 3.2 GiB
Avg memory 3.1 GiB
CPU user 708.0s
CPU sys 88.4s
Disk read 0 B
Disk write 172.6 MiB

Dandandan and others added 2 commits March 19, 2026 14:50
Reserve the full output capacity upfront before the decode loop,
eliminating per-chunk reallocation checks inside extend. This gives
a ~25% speedup for dictionary-encoded StringView reads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
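
The upfront reservation described in the commit above can be sketched roughly as follows. This is an illustration under assumptions, not the reader's actual loop: `decode_all`, the `u64` value type, and the chunked slice input are hypothetical stand-ins for the decoder state.

```rust
/// Hypothetical decode loop: looks up `indices` in `dict` chunk by chunk and
/// appends the results to `out`. Returns the capacity right after the single
/// upfront reservation so callers can observe that it does not grow again.
fn decode_all(dict: &[u64], indices: &[usize], chunk: usize, out: &mut Vec<u64>) -> usize {
    // Reserve room for the whole read once, before the loop.
    out.reserve(indices.len());
    let reserved = out.capacity();

    // `chunks` panics on a chunk size of 0, so clamp it to at least 1.
    for block in indices.chunks(chunk.max(1)) {
        // Each `extend` now fits inside the reserved capacity.
        out.extend(block.iter().map(|&i| dict[i]));
    }
    reserved
}
```

The design choice is simply to pay for one allocation decision instead of one per chunk. The output length must be known before the loop starts, which is the case when the number of values to read is given.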
Add RleDecoder::get_batch_direct which exposes RLE vs bit-packed batches
via a callback, allowing callers to handle each case optimally. For RLE
runs, the dict view is looked up once and repeated directly with
repeat_n, skipping the index buffer entirely. For bit-packed runs,
indices are decoded to a stack-local buffer and gathered immediately.

This eliminates the intermediate index buffer roundtrip for the common
RLE case and reduces StringView dictionary decoding time by ~49%
(137µs → 70µs in benchmarks).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
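
A rough sketch of the callback shape described in the commit above, under stated assumptions: the `Run` enum, `for_each_run`, and `decode_views` are hypothetical names, and the real `RleDecoder::get_batch_direct` signature may differ. The sketch uses `repeat(..).take(..)`, which behaves like `repeat_n` for `Copy` values.

```rust
/// Hypothetical description of one run in a hybrid RLE / bit-packed stream.
enum Run<'a> {
    /// `count` repetitions of a single dictionary index.
    Rle { index: u32, count: usize },
    /// A block of indices that were already unpacked into a small buffer.
    BitPacked(&'a [u32]),
}

/// Stand-in for the decoder: hands each run to the callback rather than
/// writing every index into an intermediate buffer.
fn for_each_run<'a>(runs: &[Run<'a>], mut f: impl FnMut(&Run<'a>)) {
    for run in runs {
        f(run);
    }
}

/// Fills `out` with dictionary views, handling each run kind separately.
fn decode_views(dict: &[u128], runs: &[Run<'_>], out: &mut Vec<u128>) {
    for_each_run(runs, |run| match run {
        Run::Rle { index, count } => {
            // One dictionary lookup, then repeat the view `count` times.
            let view = dict[*index as usize];
            out.extend(std::iter::repeat(view).take(*count));
        }
        Run::BitPacked(indices) => {
            // Gather immediately from the unpacked indices.
            out.extend(indices.iter().map(|&i| dict[i as usize]));
        }
    });
}
```

For long RLE runs this does one lookup per run, not one per value, which is where the commit message locates the saving.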
@Dandandan
Contributor Author

run benchmark arrow_reader arrow_reader_clickbench

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4090723808-462-hrpgs 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing optimize-snappy-vlq-decoding (d9b3d30) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4090723808-463-97ntt 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing optimize-snappy-vlq-decoding (d9b3d30) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Details

group                                             main                                   optimize-snappy-vlq-decoding
-----                                             ----                                   ----------------------------
arrow_reader_clickbench/async/Q1                  1.00   1085.7±7.85µs        ? ?/sec    1.00   1086.8±5.12µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.13      6.8±0.12ms        ? ?/sec    1.00      6.0±0.11ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.11      7.9±0.20ms        ? ?/sec    1.00      7.1±0.14ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.05     14.7±0.18ms        ? ?/sec    1.00     14.0±0.19ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.04     17.2±0.28ms        ? ?/sec    1.00     16.5±0.29ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.02     16.2±0.25ms        ? ?/sec    1.00     15.8±0.21ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.03      3.1±0.04ms        ? ?/sec    1.00      3.0±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.00     77.6±4.01ms        ? ?/sec    1.06     81.9±0.53ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.00     95.1±5.00ms        ? ?/sec    1.03    97.7±12.12ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.00    131.7±4.42ms        ? ?/sec    1.01    132.5±0.96ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.05    251.6±3.85ms        ? ?/sec    1.00    239.3±4.20ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.04     20.0±0.20ms        ? ?/sec    1.00     19.3±0.21ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.05     59.1±0.38ms        ? ?/sec    1.00     56.1±0.40ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.06     59.2±0.61ms        ? ?/sec    1.00     56.1±0.39ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.04     18.7±0.18ms        ? ?/sec    1.00     18.1±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.04     15.4±0.36ms        ? ?/sec    1.00     14.8±0.40ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.03      5.5±0.04ms        ? ?/sec    1.00      5.3±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.05     13.9±0.40ms        ? ?/sec    1.00     13.2±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.10     25.7±0.48ms        ? ?/sec    1.00     23.4±0.35ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.03      5.8±0.10ms        ? ?/sec    1.00      5.6±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.02      5.1±0.05ms        ? ?/sec    1.00      5.0±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.01      3.5±0.03ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1058.8±6.97µs        ? ?/sec    1.02   1076.7±5.03µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.13      6.8±0.13ms        ? ?/sec    1.00      6.0±0.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.09      7.7±0.21ms        ? ?/sec    1.00      7.1±0.20ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.04     14.4±0.23ms        ? ?/sec    1.00     13.9±0.23ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.04     17.3±0.27ms        ? ?/sec    1.00     16.6±0.28ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.05     16.2±0.29ms        ? ?/sec    1.00     15.3±0.24ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.03      3.0±0.04ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.03     72.3±0.67ms        ? ?/sec    1.00     69.9±0.23ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.04     81.5±0.66ms        ? ?/sec    1.00     78.1±0.37ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.03    100.0±0.64ms        ? ?/sec    1.00     96.7±2.93ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.08    226.3±5.03ms        ? ?/sec    1.00    208.9±4.08ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.03     19.8±0.25ms        ? ?/sec    1.00     19.1±0.21ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.03     57.0±0.73ms        ? ?/sec    1.00     55.3±0.45ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.04     57.8±0.70ms        ? ?/sec    1.00     55.3±0.52ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.04     18.5±0.11ms        ? ?/sec    1.00     17.8±0.13ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.02     14.8±0.51ms        ? ?/sec    1.00     14.4±0.35ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.04      5.5±0.06ms        ? ?/sec    1.00      5.2±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.03     13.2±0.29ms        ? ?/sec    1.00     12.8±0.12ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.04     23.6±0.65ms        ? ?/sec    1.00     22.7±0.48ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.03      5.6±0.13ms        ? ?/sec    1.00      5.5±0.10ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.00      4.8±0.07ms        ? ?/sec    1.00      4.8±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.00      3.4±0.02ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    868.0±1.91µs        ? ?/sec    1.00    870.8±2.98µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.11      5.2±0.03ms        ? ?/sec    1.00      4.7±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.10      6.2±0.06ms        ? ?/sec    1.00      5.6±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.05     22.2±0.72ms        ? ?/sec    1.00     21.2±0.22ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.00     29.0±0.96ms        ? ?/sec    1.01     29.2±0.87ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.04     23.2±0.20ms        ? ?/sec    1.00     22.4±0.22ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.04      2.8±0.05ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.03    123.7±0.64ms        ? ?/sec    1.00    120.6±0.92ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.10     99.7±0.88ms        ? ?/sec    1.00     90.7±0.19ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.07    146.0±0.99ms        ? ?/sec    1.00    135.8±0.72ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.03   286.0±14.82ms        ? ?/sec    1.00   277.1±15.23ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.00     27.6±0.28ms        ? ?/sec    1.05     29.1±0.27ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.05    110.1±0.72ms        ? ?/sec    1.00    104.6±0.24ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.06    108.5±0.51ms        ? ?/sec    1.00    102.6±0.31ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.04     19.0±0.17ms        ? ?/sec    1.00     18.3±0.20ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.00     22.3±0.35ms        ? ?/sec    1.00     22.3±0.30ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.9±0.02ms        ? ?/sec    1.11      7.7±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.02     11.5±0.15ms        ? ?/sec    1.00     11.3±0.16ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.01     21.2±0.27ms        ? ?/sec    1.00     21.0±0.21ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.03      5.3±0.09ms        ? ?/sec    1.00      5.2±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.01      5.6±0.06ms        ? ?/sec    1.00      5.6±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.01      4.4±0.04ms        ? ?/sec    1.00      4.4±0.02ms        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 788.7s
Peak memory 3.1 GiB
Avg memory 2.9 GiB
CPU user 705.5s
CPU sys 82.9s
Disk read 0 B
Disk write 2.1 GiB

branch

Metric Value
Wall time 779.2s
Peak memory 3.2 GiB
Avg memory 3.1 GiB
CPU user 707.3s
CPU sys 71.7s
Disk read 0 B
Disk write 171.3 MiB

@Dandandan Dandandan force-pushed the optimize-snappy-vlq-decoding branch 4 times, most recently from a0815cd to 4b9a13b Compare March 19, 2026 16:20
Replace the if/else checked/unchecked branching in get_batch_with_dict
with a single branchless .min(max_idx) clamp. This:
- Prevents UB on corrupt parquet files (indices clamped to valid range)
- Removes the if/else branch, simplifying codegen
- Improves i32 dict perf by ~13% (60µs → 52µs) due to simpler code
- StringView dict remains at 75µs (45% faster than 137µs baseline)

Remove unused bit_width field from DictIndexDecoder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
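
The branchless clamp from the commit above might look roughly like this. It is a sketch under assumptions: `gather_clamped` is a hypothetical name and the body is not copied from `get_batch_with_dict`. Note the trade-off it illustrates: an out-of-range index from a corrupt file resolves to the last dictionary entry without raising an error, so memory safety is preserved but the decoded value is arbitrary.

```rust
/// Hypothetical helper: gathers `dict[index]` for every index, clamping each
/// index to the last valid position so the unchecked read is always in range.
fn gather_clamped(dict: &[i32], indices: &[u32], out: &mut Vec<i32>) -> Result<(), &'static str> {
    // An empty dictionary has no valid position to clamp to.
    let Some(max_idx) = dict.len().checked_sub(1) else {
        return Err("empty dictionary");
    };

    out.extend(indices.iter().map(|&i| {
        // `min` typically compiles to a compare-and-select, with no branch.
        let idx = (i as usize).min(max_idx);
        // SAFETY: idx <= max_idx == dict.len() - 1.
        unsafe { *dict.get_unchecked(idx) }
    }));
    Ok(())
}
```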
@Dandandan Dandandan force-pushed the optimize-snappy-vlq-decoding branch from 4b9a13b to 0fcda30 Compare March 19, 2026 16:26
Dandandan and others added 2 commits March 19, 2026 18:21
Reserve offsets capacity upfront before the decode loop to avoid
per-chunk reallocation. ~3.5% improvement for StringArray dict reads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These are only used by the arrow dictionary_index decoder. Without
the arrow feature, they appear as dead code to clippy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Dandandan
Contributor Author

run benchmark arrow_reader_clickbench

@Dandandan Dandandan changed the title from "[ClickBench] Avoid zeroing in Snappy and micro-optimize parquet reader" to "Optimize dictionary decoding in parquet reader" Mar 19, 2026
@Dandandan Dandandan changed the title from "Optimize dictionary decoding in parquet reader" to "Decompress / decoding in parquet reader improvements" Mar 19, 2026
@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4091992583-464-g86f9 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing optimize-snappy-vlq-decoding (dffd627) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Details

group                                             main                                   optimize-snappy-vlq-decoding
-----                                             ----                                   ----------------------------
arrow_reader_clickbench/async/Q1                  1.00   1088.7±6.44µs        ? ?/sec    1.00   1091.6±7.65µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.12      6.9±0.06ms        ? ?/sec    1.00      6.1±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.08      7.9±0.04ms        ? ?/sec    1.00      7.3±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.02     14.7±0.05ms        ? ?/sec    1.00     14.4±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.02     17.4±0.09ms        ? ?/sec    1.00     17.0±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.02     16.2±0.07ms        ? ?/sec    1.00     15.8±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.03      3.1±0.03ms        ? ?/sec    1.00      3.0±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.00     77.5±4.29ms        ? ?/sec    1.25     97.2±0.97ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.00     86.8±0.40ms        ? ?/sec    1.27    110.2±7.66ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.09    130.0±5.38ms        ? ?/sec    1.00    119.2±2.09ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.04    247.9±2.02ms        ? ?/sec    1.00    237.7±0.72ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.03     19.8±0.11ms        ? ?/sec    1.00     19.2±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.04     58.5±0.27ms        ? ?/sec    1.00     56.1±0.18ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.03     58.7±0.36ms        ? ?/sec    1.00     56.8±0.19ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.05     19.0±0.07ms        ? ?/sec    1.00     18.1±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.05     15.7±0.23ms        ? ?/sec    1.00     15.0±0.13ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.02      5.5±0.05ms        ? ?/sec    1.00      5.4±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.06     13.9±0.25ms        ? ?/sec    1.00     13.2±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.06     25.3±0.45ms        ? ?/sec    1.00     23.9±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.05      5.9±0.07ms        ? ?/sec    1.00      5.6±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.03      5.1±0.05ms        ? ?/sec    1.00      4.9±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.02      3.6±0.02ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1071.5±7.66µs        ? ?/sec    1.01   1079.5±6.24µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.16      6.9±0.06ms        ? ?/sec    1.00      5.9±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.12      7.8±0.04ms        ? ?/sec    1.00      6.9±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.04     14.7±0.08ms        ? ?/sec    1.00     14.1±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.04     17.3±0.09ms        ? ?/sec    1.00     16.7±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.02     16.2±0.08ms        ? ?/sec    1.00     15.8±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.02      3.0±0.03ms        ? ?/sec    1.00      3.0±0.04ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.03     72.1±0.44ms        ? ?/sec    1.00     70.1±0.33ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.03     80.4±0.49ms        ? ?/sec    1.00     78.1±0.30ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.04     98.0±0.64ms        ? ?/sec    1.00     94.7±0.20ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.00    218.9±5.12ms        ? ?/sec    1.02    224.3±0.45ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.05     19.6±0.22ms        ? ?/sec    1.00     18.7±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.04     57.1±0.46ms        ? ?/sec    1.00     55.1±0.18ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.04     57.8±0.34ms        ? ?/sec    1.00     55.4±0.16ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.04     18.4±0.10ms        ? ?/sec    1.00     17.7±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.05     15.1±0.25ms        ? ?/sec    1.00     14.3±0.10ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.00      5.4±0.02ms        ? ?/sec    1.00      5.4±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.02     13.2±0.20ms        ? ?/sec    1.00     12.9±0.11ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.04     23.9±0.40ms        ? ?/sec    1.00     22.9±0.14ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.04      5.6±0.04ms        ? ?/sec    1.00      5.4±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.02      4.9±0.03ms        ? ?/sec    1.00      4.8±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.03      3.5±0.03ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    868.1±3.71µs        ? ?/sec    1.01    881.1±3.25µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.10      5.1±0.03ms        ? ?/sec    1.00      4.7±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.09      6.1±0.04ms        ? ?/sec    1.00      5.6±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.05     22.3±0.65ms        ? ?/sec    1.00     21.3±0.12ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.00     28.9±0.83ms        ? ?/sec    1.02     29.4±0.81ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.04     23.4±0.09ms        ? ?/sec    1.00     22.4±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.05      2.8±0.03ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.01    122.5±0.27ms        ? ?/sec    1.00    121.2±0.17ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.09     98.6±0.30ms        ? ?/sec    1.00     90.6±0.12ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.07    145.7±0.45ms        ? ?/sec    1.00    136.1±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.02   285.3±14.88ms        ? ?/sec    1.00   279.1±17.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.05     27.7±0.13ms        ? ?/sec    1.00     26.4±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.04    108.7±0.28ms        ? ?/sec    1.00    105.1±0.18ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.04    106.9±0.16ms        ? ?/sec    1.00    102.5±0.15ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.04     19.0±0.04ms        ? ?/sec    1.00     18.3±0.09ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.01     22.6±0.05ms        ? ?/sec    1.00     22.4±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.9±0.01ms        ? ?/sec    1.11      7.7±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.02     11.6±0.02ms        ? ?/sec    1.00     11.4±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.02     21.2±0.05ms        ? ?/sec    1.00     20.8±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.04      5.3±0.04ms        ? ?/sec    1.00      5.1±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.02      5.7±0.02ms        ? ?/sec    1.00      5.7±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.01      4.4±0.03ms        ? ?/sec    1.00      4.4±0.02ms        ? ?/sec

Resource Usage

Metric        base (merge-base)    branch
Wall time     786.5s               780.7s
Peak memory   3.1 GiB              3.2 GiB
Avg memory    2.9 GiB              3.1 GiB
CPU user      706.7s               705.4s
CPU sys       79.6s                75.4s
Disk read     0 B                  0 B
Disk write    844.9 MiB            171.4 MiB

@Dandandan
Contributor Author

Nice, up to 10% faster with the combined changes
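For reading the table above: the ratio columns appear to be each side's time divided by the faster of the two, so 1.10 on the base side of sync/Q10 means the branch took roughly 10% less time. A minimal sketch of that arithmetic, assuming the rounded means printed in the table (the helper name `relative_ratios` is hypothetical and not part of any benchmark tool):

```rust
/// Returns (base_ratio, branch_ratio): each time divided by the faster of the two.
/// Hypothetical helper for illustration only; not part of the benchmark tooling.
fn relative_ratios(base_ms: f64, branch_ms: f64) -> (f64, f64) {
    let fastest = base_ms.min(branch_ms);
    (base_ms / fastest, branch_ms / fastest)
}

fn main() {
    // Rounded means copied from the table: (name, base ms, branch ms).
    let rows = [
        ("sync/Q10", 5.1, 4.7),
        ("sync/Q21", 98.6, 90.6),
        ("sync/Q37", 6.9, 7.7),
    ];
    for (name, base, branch) in rows {
        let (base_ratio, branch_ratio) = relative_ratios(base, branch);
        println!("{name}: base {base_ratio:.2}, branch {branch_ratio:.2}");
    }
}
```

Because the inputs here are the rounded means, the last digit can differ from the ratio printed in the table. Note that sync/Q37 goes the other way, with the branch side at 1.11.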

Labels

parquet Changes to the parquet crate

2 participants