Use chunks_exact for has_true/has_false to enable compiler unrolling by adriangb · Pull Request #9570 · apache/arrow-rs

adriangb · 2026-03-18T06:26:58Z

Summary

Replace `.chunks(64)` with `.chunks_exact(16)` in `has_true()` and `has_false()`, as suggested in a review comment on #9511 (Add has_true() and has_false() to BooleanArray).
With `chunks_exact`, every chunk has a guaranteed size, so the compiler can fully unroll the inner fold (no inner branch or loop). That allows a smaller block size, and therefore more frequent short-circuit exits, without regressing the full-scan path.
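The PR text does not include the diff, so the following is only a hypothetical sketch of the technique, not the arrow-rs code. It assumes the bitmap is a packed, LSB-first `&[u64]` slice with `len` valid bits, ignores the null buffer, and uses illustrative free functions (`split_words`, `has_true`, `has_false`, `BLOCK`) rather than the real `BooleanArray` methods. Each block of exactly 16 words is OR-folded (or AND-folded) with no per-word branch, and the early-exit check runs once per block; leftover words and the partial last word are handled separately.

```rust
/// Words folded per block before the early-exit check. 16 mirrors the PR's
/// `chunks_exact(16)`; everything else here is an illustrative sketch.
const BLOCK: usize = 16;

/// Hypothetical helper: splits a packed, LSB-first bitmap of `len` bits into
/// whole 64-bit words plus an optional trailing partial word, returned as
/// `(masked_word, valid_mask)`. Assumes `words.len() >= len.div_ceil(64)`.
fn split_words(words: &[u64], len: usize) -> (&[u64], Option<(u64, u64)>) {
    let full = len / 64;
    let rem = len % 64;
    let tail = if rem == 0 {
        None
    } else {
        let mask = (1u64 << rem) - 1;
        Some((words[full] & mask, mask))
    };
    (&words[..full], tail)
}

/// Returns true if any of the first `len` bits is set.
fn has_true(words: &[u64], len: usize) -> bool {
    let (full, tail) = split_words(words, len);
    let blocks = full.chunks_exact(BLOCK);
    // Fewer than BLOCK whole words that did not fill a complete block.
    let rest = blocks.remainder();
    for block in blocks {
        // Every `block` has exactly BLOCK words, so this OR-fold has a fixed
        // trip count the compiler can unroll; we branch once per block.
        if block.iter().fold(0u64, |acc, &w| acc | w) != 0 {
            return true;
        }
    }
    rest.iter().any(|&w| w != 0) || tail.map_or(false, |(w, _)| w != 0)
}

/// Returns true if any of the first `len` bits is unset.
fn has_false(words: &[u64], len: usize) -> bool {
    let (full, tail) = split_words(words, len);
    let blocks = full.chunks_exact(BLOCK);
    let rest = blocks.remainder();
    for block in blocks {
        // The AND of a block is all-ones only if every bit in it is set.
        if block.iter().fold(u64::MAX, |acc, &w| acc & w) != u64::MAX {
            return true;
        }
    }
    // Padding bits beyond `len` were masked off, so compare the tail to its mask.
    rest.iter().any(|&w| w != u64::MAX) || tail.map_or(false, |(w, mask)| w != mask)
}
```

With `.chunks(64)`, by contrast, the last chunk may be shorter than 64 words, so the inner loop length is not a compile-time constant; `chunks_exact` moves that irregular part into `remainder()`, which is why a small block size can be used without slowing the full scan.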

Benchmark results (block size 16 vs baseline)

Full-scan worst case (65536): No regression (~49ns both)
Early-exit cases (65536): ~27% faster (6.0ns → 4.4ns)
Small arrays (64, 1024): Unchanged

Test plan

All 13 existing test_has tests pass

run benchmarks boolean_array

@Dandandan Would appreciate your review!

🤖 Generated with Claude Code

Replace `.chunks(64)` with `.chunks_exact(16)` in `has_true()` and `has_false()`. With `chunks_exact`, the compiler can fully unroll the inner fold (guaranteed size, no inner branch/loop), allowing a smaller block size for more frequent short-circuit exits without regressing the full-scan path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

adriangb · 2026-03-18T06:27:37Z

run benchmark boolean_array

adriangb · 2026-03-18T06:27:45Z

cc @Dandandan

adriangbot · 2026-03-18T06:30:36Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4080068357-397-rwfw5 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing improve-has-true-false (e96e0e6) to d426107 (merge-base) diff
BENCH_NAME=boolean_array
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_array
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-18T06:41:22Z

🤖 Arrow criterion benchmark completed (GKE) | trigger


group                                improve-has-true-false                 main
-----                                ----------------------                 ----
has_false(all_false, 1024)           1.00      7.8±0.03ns        ? ?/sec    1.08      8.5±0.09ns        ? ?/sec
has_false(all_false, 64)             1.00      8.2±0.08ns        ? ?/sec    1.00      8.1±0.09ns        ? ?/sec
has_false(all_false, 65536)          1.00      7.8±0.03ns        ? ?/sec    1.72     13.5±0.02ns        ? ?/sec
has_false(all_true, 1024)            1.00      7.8±0.05ns        ? ?/sec    1.11      8.7±0.16ns        ? ?/sec
has_false(all_true, 64)              1.00      8.3±0.14ns        ? ?/sec    1.02      8.5±0.07ns        ? ?/sec
has_false(all_true, 65536)           1.00     87.8±0.15ns        ? ?/sec    1.40    123.3±0.88ns        ? ?/sec
has_false(mixed_early, 1024)         1.00      8.2±0.04ns        ? ?/sec    1.07      8.7±0.12ns        ? ?/sec
has_false(mixed_early, 64)           1.00      8.3±0.08ns        ? ?/sec    1.03      8.5±0.12ns        ? ?/sec
has_false(mixed_early, 65536)        1.00      8.2±0.03ns        ? ?/sec    1.67     13.7±0.03ns        ? ?/sec
has_false(nulls_all_true, 1024)      1.02     19.2±0.06ns        ? ?/sec    1.00     18.7±0.14ns        ? ?/sec
has_false(nulls_all_true, 64)        1.15      7.7±0.04ns        ? ?/sec    1.00      6.7±0.06ns        ? ?/sec
has_false(nulls_all_true, 65536)     1.00    939.2±4.09ns        ? ?/sec    1.00    942.3±2.69ns        ? ?/sec
has_true(all_false, 1024)            1.00      7.7±0.05ns        ? ?/sec    1.09      8.4±0.03ns        ? ?/sec
has_true(all_false, 64)              1.01      8.2±0.09ns        ? ?/sec    1.00      8.2±0.08ns        ? ?/sec
has_true(all_false, 65536)           1.00     88.2±0.08ns        ? ?/sec    1.37    121.1±0.35ns        ? ?/sec
has_true(all_true, 1024)             1.00      7.5±0.03ns        ? ?/sec    1.16      8.6±0.02ns        ? ?/sec
has_true(all_true, 64)               1.01      7.9±0.07ns        ? ?/sec    1.00      7.8±0.07ns        ? ?/sec
has_true(all_true, 65536)            1.00      7.5±0.03ns        ? ?/sec    1.84     13.8±0.03ns        ? ?/sec
has_true(nulls_all_true, 1024)       1.03      6.5±0.11ns        ? ?/sec    1.00      6.3±0.10ns        ? ?/sec
has_true(nulls_all_true, 64)         1.04      6.5±0.11ns        ? ?/sec    1.00      6.3±0.10ns        ? ?/sec
has_true(nulls_all_true, 65536)      1.03      6.5±0.10ns        ? ?/sec    1.00      6.3±0.10ns        ? ?/sec
true_count(all_false, 1024)          1.01     10.0±0.04ns        ? ?/sec    1.00      9.9±0.02ns        ? ?/sec
true_count(all_false, 64)            1.00      8.7±0.34ns        ? ?/sec    1.01      8.8±0.30ns        ? ?/sec
true_count(all_false, 65536)         1.00    228.3±0.09ns        ? ?/sec    1.00    228.7±0.57ns        ? ?/sec
true_count(all_true, 1024)           1.00     10.0±0.03ns        ? ?/sec    1.02     10.2±0.02ns        ? ?/sec
true_count(all_true, 64)             1.00      8.7±0.31ns        ? ?/sec    1.05      9.2±0.29ns        ? ?/sec
true_count(all_true, 65536)          1.00    228.3±0.07ns        ? ?/sec    1.00    228.6±0.12ns        ? ?/sec
true_count(mixed_early, 1024)        1.01     10.3±0.03ns        ? ?/sec    1.00     10.2±0.02ns        ? ?/sec
true_count(mixed_early, 64)          1.00      9.1±0.32ns        ? ?/sec    1.02      9.2±0.30ns        ? ?/sec
true_count(mixed_early, 65536)       1.00    229.0±0.10ns        ? ?/sec    1.00    228.2±0.08ns        ? ?/sec
true_count(nulls_all_true, 1024)     1.03     18.9±0.03ns        ? ?/sec    1.00     18.4±0.03ns        ? ?/sec
true_count(nulls_all_true, 64)       1.02      7.3±0.06ns        ? ?/sec    1.00      7.2±0.19ns        ? ?/sec
true_count(nulls_all_true, 65536)    1.00    719.2±1.60ns        ? ?/sec    1.00    717.5±1.76ns        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	315.5s
Peak memory	1.9 GiB
Avg memory	1.9 GiB
CPU user	314.6s
CPU sys	0.7s
Disk read	0 B
Disk write	1.1 GiB

branch

Metric	Value
Wall time	318.2s
Peak memory	1.9 GiB
Avg memory	1.9 GiB
CPU user	318.0s
CPU sys	0.2s
Disk read	0 B
Disk write	2.8 MiB

Dandandan · 2026-03-18T07:58:09Z

Wooohoo! 🚀

github-actions bot added the arrow Changes to the arrow crate label Mar 18, 2026

adriangb mentioned this pull request Mar 18, 2026

Add has_true() and has_false() to BooleanArray #9511

Merged

Dandandan approved these changes Mar 18, 2026


Dandandan merged commit 19889a3 into apache:main Mar 18, 2026
25 of 26 checks passed

Dandandan merged 1 commit into apache:main from pydantic:improve-has-true-false