Use chunks_exact for has_true/has_false to enable compiler unrolling #9570

Merged
Dandandan merged 1 commit into apache:main from pydantic:improve-has-true-false on Mar 18, 2026

@adriangb
Contributor

Summary

  • Replace .chunks(64) with .chunks_exact(16) in has_true() and has_false(), as suggested in a comment on Add has_true() and has_false() to BooleanArray #9511
  • Because chunks_exact guarantees that every block has exactly the chunk size, the compiler can fully unroll the inner fold (no inner loop or branch). That makes a smaller block size affordable: the scan checks for a short-circuit exit more often without regressing the full-scan path
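A minimal Rust sketch of the pattern described above, for illustration only: it is not the arrow-rs source, and the function names `any_bit_set` and `any_bit_unset` are hypothetical. It assumes the bitmap is a plain `&[u64]` slice in which every bit is a valid, non-null value; a real `BooleanArray` also has a bit offset, padding bits in its last word and an optional null mask, all of which this sketch ignores.

```rust
/// Hypothetical stand-in for `has_true`: true if any bit in `words` is set.
/// Assumes every bit of every word is a valid, non-null value.
fn any_bit_set(words: &[u64]) -> bool {
    const BLOCK: usize = 16; // words per block: 16 * 64 = 1024 bits
    let chunks = words.chunks_exact(BLOCK);
    // Words that do not fill a whole block are checked after the loop.
    let remainder = chunks.remainder();
    for block in chunks {
        // `block.len()` is always BLOCK here, so the compiler can typically
        // unroll this fold into straight-line ORs with no inner loop branch.
        if block.iter().fold(0u64, |acc, &w| acc | w) != 0 {
            return true; // short-circuit: one exit check per 1024 bits
        }
    }
    remainder.iter().any(|&w| w != 0)
}

/// Hypothetical stand-in for `has_false`: true if any bit in `words` is clear.
/// A word has a clear bit exactly when it differs from u64::MAX.
fn any_bit_unset(words: &[u64]) -> bool {
    const BLOCK: usize = 16;
    let chunks = words.chunks_exact(BLOCK);
    let remainder = chunks.remainder();
    for block in chunks {
        // AND-fold the block: the result stays u64::MAX only if all bits are set.
        if block.iter().fold(u64::MAX, |acc, &w| acc & w) != u64::MAX {
            return true;
        }
    }
    remainder.iter().any(|&w| w != u64::MAX)
}
```

The separate remainder check is the price of `chunks_exact`: `ChunksExact` yields only full 16-word blocks and hands back the leftover words through `remainder()`, whereas with `.chunks(64)` the final chunk may be shorter, so the compiler cannot treat the inner fold as a fixed-length sequence.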

Benchmark results (block size 16 vs. the baseline block size of 64)

  • Full-scan worst case (65536): no regression (~49ns in both)
  • Early-exit cases (65536): ~27% less time (6.0ns → 4.4ns)
  • Small arrays (64, 1024): unchanged
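To make the block-size trade-off concrete: a 65536-bit array is 1024 words of 64 bits, and a block-wise scan only tests for an exit once per block. The hypothetical helper below counts how many words such a scan reads before it can stop, given the index of the first word holding a matching bit (a set bit for has_true, a clear bit for has_false). It measures memory touched, not nanoseconds, so it illustrates the direction of the effect rather than predicting the timings above.

```rust
/// Hypothetical helper: number of 64-bit words a block-wise scan reads before
/// it can return. `first_hit` is the index of the first word containing a
/// matching bit (None if there is none); `block` is the number of words per block.
fn words_scanned(first_hit: Option<usize>, block: usize, total_words: usize) -> usize {
    assert!(block > 0, "block size must be non-zero");
    match first_hit {
        // The exit check runs once per block, so the scan always finishes the
        // block that contains the hit (capped at the array length for the tail).
        Some(i) => ((i / block + 1) * block).min(total_words),
        // No hit: every word is read, whatever the block size.
        None => total_words,
    }
}
```

For a hit in the first word of a 1024-word array, `words_scanned(Some(0), 64, 1024)` is 64 while `words_scanned(Some(0), 16, 1024)` is 16; with no hit, both block sizes read all 1024 words. A smaller block therefore only helps the early-exit cases, and it is the unrolled fold that keeps the extra per-block checks from slowing the full scan.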

Test plan

  • All 13 existing test_has tests pass

run benchmarks boolean_array

@Dandandan Would appreciate your review!

🤖 Generated with Claude Code

Replace `.chunks(64)` with `.chunks_exact(16)` in `has_true()` and
`has_false()`. With `chunks_exact`, the compiler can fully unroll the
inner fold (guaranteed size, no inner branch/loop), allowing a smaller
block size for more frequent short-circuit exits without regressing the
full-scan path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-actions bot added the arrow label (Changes to the arrow crate) on Mar 18, 2026
@adriangb
Contributor Author

run benchmark boolean_array

@adriangb
Contributor Author

cc @Dandandan

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4080068357-397-rwfw5 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing improve-has-true-false (e96e0e6) to d426107 (merge-base) diff
BENCH_NAME=boolean_array
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_array
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

group                                improve-has-true-false  main
                                     ratio           time    ratio           time
-----                                ----------------------  ----
has_false(all_false, 1024)           1.00      7.8±0.03ns    1.08      8.5±0.09ns
has_false(all_false, 64)             1.00      8.2±0.08ns    1.00      8.1±0.09ns
has_false(all_false, 65536)          1.00      7.8±0.03ns    1.72     13.5±0.02ns
has_false(all_true, 1024)            1.00      7.8±0.05ns    1.11      8.7±0.16ns
has_false(all_true, 64)              1.00      8.3±0.14ns    1.02      8.5±0.07ns
has_false(all_true, 65536)           1.00     87.8±0.15ns    1.40    123.3±0.88ns
has_false(mixed_early, 1024)         1.00      8.2±0.04ns    1.07      8.7±0.12ns
has_false(mixed_early, 64)           1.00      8.3±0.08ns    1.03      8.5±0.12ns
has_false(mixed_early, 65536)        1.00      8.2±0.03ns    1.67     13.7±0.03ns
has_false(nulls_all_true, 1024)      1.02     19.2±0.06ns    1.00     18.7±0.14ns
has_false(nulls_all_true, 64)        1.15      7.7±0.04ns    1.00      6.7±0.06ns
has_false(nulls_all_true, 65536)     1.00    939.2±4.09ns    1.00    942.3±2.69ns
has_true(all_false, 1024)            1.00      7.7±0.05ns    1.09      8.4±0.03ns
has_true(all_false, 64)              1.01      8.2±0.09ns    1.00      8.2±0.08ns
has_true(all_false, 65536)           1.00     88.2±0.08ns    1.37    121.1±0.35ns
has_true(all_true, 1024)             1.00      7.5±0.03ns    1.16      8.6±0.02ns
has_true(all_true, 64)               1.01      7.9±0.07ns    1.00      7.8±0.07ns
has_true(all_true, 65536)            1.00      7.5±0.03ns    1.84     13.8±0.03ns
has_true(nulls_all_true, 1024)       1.03      6.5±0.11ns    1.00      6.3±0.10ns
has_true(nulls_all_true, 64)         1.04      6.5±0.11ns    1.00      6.3±0.10ns
has_true(nulls_all_true, 65536)      1.03      6.5±0.10ns    1.00      6.3±0.10ns
true_count(all_false, 1024)          1.01     10.0±0.04ns    1.00      9.9±0.02ns
true_count(all_false, 64)            1.00      8.7±0.34ns    1.01      8.8±0.30ns
true_count(all_false, 65536)         1.00    228.3±0.09ns    1.00    228.7±0.57ns
true_count(all_true, 1024)           1.00     10.0±0.03ns    1.02     10.2±0.02ns
true_count(all_true, 64)             1.00      8.7±0.31ns    1.05      9.2±0.29ns
true_count(all_true, 65536)          1.00    228.3±0.07ns    1.00    228.6±0.12ns
true_count(mixed_early, 1024)        1.01     10.3±0.03ns    1.00     10.2±0.02ns
true_count(mixed_early, 64)          1.00      9.1±0.32ns    1.02      9.2±0.30ns
true_count(mixed_early, 65536)       1.00    229.0±0.10ns    1.00    228.2±0.08ns
true_count(nulls_all_true, 1024)     1.03     18.9±0.03ns    1.00     18.4±0.03ns
true_count(nulls_all_true, 64)       1.02      7.3±0.06ns    1.00      7.2±0.19ns
true_count(nulls_all_true, 65536)    1.00    719.2±1.60ns    1.00    717.5±1.76ns
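The ratio column appears to be each side's time divided by the faster side's time for that row, so the faster side reads 1.00. A small sketch of that arithmetic follows (the function name `ratios` is hypothetical). Because the printed times are rounded, recomputing from them can differ from the printed ratio in the last digit.

```rust
/// Hypothetical helper: relative times as shown in the comparison table,
/// i.e. each side's time divided by the faster of the two.
fn ratios(branch_ns: f64, main_ns: f64) -> (f64, f64) {
    let fastest = branch_ns.min(main_ns);
    (branch_ns / fastest, main_ns / fastest)
}
```

For `has_false(all_true, 65536)`, `ratios(87.8, 123.3)` gives roughly (1.00, 1.40), matching the table: on that full scan the branch takes about 29% less time than main.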

Resource Usage

base (merge-base)

Metric Value
Wall time 315.5s
Peak memory 1.9 GiB
Avg memory 1.9 GiB
CPU user 314.6s
CPU sys 0.7s
Disk read 0 B
Disk write 1.1 GiB

branch

Metric Value
Wall time 318.2s
Peak memory 1.9 GiB
Avg memory 1.9 GiB
CPU user 318.0s
CPU sys 0.2s
Disk read 0 B
Disk write 2.8 MiB

Dandandan merged commit 19889a3 into apache:main on Mar 18, 2026
25 of 26 checks passed
@Dandandan
Contributor

Wooohoo! 🚀

3 participants