Add has_true() and has_false() to BooleanArray by adriangb · Pull Request #9511 · apache/arrow-rs

adriangb · 2026-03-05T10:26:39Z

Motivation

When working with BooleanArray, a common pattern is checking whether any true or false value exists, e.g. arr.true_count() > 0 or arr.false_count() == 0. This currently requires true_count() / false_count(), which scan the entire bitmap to count every set bit (via popcount), even though we only need to know whether at least one exists.

This PR adds has_true() and has_false() methods that short-circuit as soon as they find a matching value, providing both:

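Better performance: in the best case (early short-circuit), up to ~16x faster than true_count() on 65,536-element arrays, and ~129x faster when a null bitmap is present
More ergonomic API: arr.has_true() expresses intent more clearly than arr.true_count() > 0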

Callsites in DataFusion

There are several places in DataFusion that would benefit from these methods:

datafusion/functions-nested/src/array_has.rs — eq_array.true_count() > 0 → eq_array.has_true()
datafusion/physical-plan/src/topk/mod.rs — filter.true_count() == 0 check → !filter.has_true()
datafusion/datasource-parquet/src/metadata.rs — exactness.true_count() == 0 and combined_mask.true_count() > 0
datafusion/physical-plan/src/joins/nested_loop_join.rs — bitmap.true_count() == 0 checks
datafusion/physical-expr-common/src/physical_expr.rs — selection_count == 0 from selection.true_count()
datafusion/physical-expr/src/expressions/binary.rs — short-circuit checks for AND/OR
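
The rewrites listed above rely on a few simple equivalences. As a minimal sketch, the model below uses a plain slice of `Option<bool>` (with `None` standing in for a null slot) rather than the real BooleanArray, so all four functions are hypothetical stand-ins and not the arrow-rs API:

```rust
/// Hypothetical stand-in for `true_count()`: counts valid `true` slots.
fn true_count(values: &[Option<bool>]) -> usize {
    values.iter().filter(|v| **v == Some(true)).count()
}

/// Hypothetical stand-in for `false_count()`: counts valid `false` slots.
fn false_count(values: &[Option<bool>]) -> usize {
    values.iter().filter(|v| **v == Some(false)).count()
}

/// Stops at the first valid `true`; nulls never count as `true`.
fn has_true(values: &[Option<bool>]) -> bool {
    values.iter().any(|v| *v == Some(true))
}

/// Stops at the first valid `false`; nulls never count as `false`.
fn has_false(values: &[Option<bool>]) -> bool {
    values.iter().any(|v| *v == Some(false))
}

/// Checks the equivalences that the call-site rewrites depend on.
fn rewrites_agree(values: &[Option<bool>]) -> bool {
    (true_count(values) > 0) == has_true(values)
        && (true_count(values) == 0) == !has_true(values)
        && (false_count(values) == 0) == !has_false(values)
}
```

In this model an empty slice and an all-null slice both give `has_true == false` and `has_false == false`, which matches the "nulls are not counted" behaviour described in the docstring under review.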

Benchmark Results

Scenario                          true_count     has_true       has_false      Speedup (best)
─────────────────────────────────────────────────────────────────────────────────────────────
all_true, 64                      4.32 ns        4.08 ns        4.76 ns        ~1.1x
all_false, 64                     4.30 ns        4.25 ns        4.52 ns        ~1.0x
all_true, 1024                    5.15 ns        4.52 ns        4.99 ns        ~1.1x
all_false, 1024                   5.17 ns        4.55 ns        5.00 ns        ~1.1x
mixed_early, 1024                 5.22 ns        —              5.04 ns        ~1.0x
nulls_all_true, 1024              12.84 ns       4.10 ns        12.92 ns       ~3.1x
all_true, 65536                   100.06 ns      5.96 ns        49.70 ns       ~16.8x (has_true)
all_false, 65536                  99.33 ns       49.30 ns       6.19 ns        ~16.0x (has_false)
mixed_early, 65536                100.10 ns      —              6.20 ns        ~16.1x (has_false)
nulls_all_true, 65536             522.94 ns      4.05 ns        521.82 ns      ~129x (has_true)

The key wins are on larger arrays (65,536 elements), where has_true/has_false are up to 16-129x faster than
true_count() in best-case scenarios (early short-circuit). Even in worst case (must scan entire array), performance is
comparable to true_count.

Implementation

The implementation processes bits in 64-bit chunks using UnalignedBitChunk, which handles arbitrary bit offsets and aligns
data for SIMD-friendly processing.

has_true (no nulls): OR-folds 64-bit chunks, short-circuits when any bit is set
has_false (no nulls): AND-folds 64-bit chunks, short-circuits when any bit is unset (with padding bits masked to 1)
With nulls: Iterates paired (null, value) chunks, checking null & value != 0 (has_true) or null & !value != 0
(has_false)
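
The three rules above can be sketched on raw 64-bit words. This is an illustration only: it assumes a bitmap stored as `&[u64]` with bit `i` of the array in bit `i % 64` of word `i / 64`, and it uses a hypothetical `valid_mask` helper in place of UnalignedBitChunk's prefix/suffix handling.

```rust
/// Hypothetical helper: 1 for every bit of word `word_index` that lies
/// inside a bitmap of `len` bits, 0 for padding bits.
fn valid_mask(word_index: usize, len: usize) -> u64 {
    let remaining = len.saturating_sub(word_index * 64);
    if remaining >= 64 {
        u64::MAX
    } else {
        (1u64 << remaining) - 1
    }
}

/// True if any valid slot is `true`.
fn has_true(values: &[u64], validity: Option<&[u64]>, len: usize) -> bool {
    match validity {
        // No nulls: any set bit inside the valid range is a hit.
        None => values
            .iter()
            .enumerate()
            .any(|(i, &v)| (v & valid_mask(i, len)) != 0),
        // With nulls: a hit needs the validity bit AND the value bit.
        Some(valid) => valid
            .iter()
            .zip(values)
            .enumerate()
            .any(|(i, (&n, &v))| (n & v & valid_mask(i, len)) != 0),
    }
}

/// True if any valid slot is `false`.
fn has_false(values: &[u64], validity: Option<&[u64]>, len: usize) -> bool {
    match validity {
        // No nulls: force padding bits to 1, then look for any 0 bit.
        None => values
            .iter()
            .enumerate()
            .any(|(i, &v)| (v | !valid_mask(i, len)) != u64::MAX),
        // With nulls: a hit needs the validity bit AND a cleared value bit.
        Some(valid) => valid
            .iter()
            .zip(values)
            .enumerate()
            .any(|(i, (&n, &v))| (n & !v & valid_mask(i, len)) != 0),
    }
}
```

For example, with values `0b0101`, validity `0b0010` and `len = 3`, the only valid slot holds `false`, so `has_true` is false and `has_false` is true.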

Alternatives considered

Fully vectorized (no early stopping): Would process the entire bitmap like true_count() but with simpler bitwise ops
instead of popcount. Marginally faster than true_count() but misses the main optimization opportunity.
Per-element iteration with early stopping: self.iter().any(|v| v == Some(true)). Simple but processes one bit at a
time, missing SIMD vectorization of the inner loop. Our approach processes 64 bits at a time while still supporting early
exit.

The chosen approach balances SIMD-friendly bulk processing (64 bits per iteration) with early termination, giving the best of
both worlds.
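
To make the trade-off concrete, here is a rough sketch of the two early-exit strategies on a plain `&[u64]` bitmap. It assumes the length is a multiple of 64 (so there is no padding to mask), and `BLOCK` is a hypothetical name for the number of words folded between exit checks; neither function is the code from this PR.

```rust
/// Hypothetical block size: 64-bit words OR-folded between exit checks.
const BLOCK: usize = 64;

/// Bulk approach: OR-fold each block without branching inside it,
/// then test once per block whether any bit was set.
fn has_true_blocked(words: &[u64]) -> bool {
    words
        .chunks(BLOCK)
        .any(|block| block.iter().fold(0u64, |acc, &w| acc | w) != 0)
}

/// Per-element approach: tests one bit at a time and exits on the first hit.
fn has_true_per_bit(words: &[u64], len: usize) -> bool {
    (0..len).any(|i| ((words[i / 64] >> (i % 64)) & 1) == 1)
}
```

Both return the same answer; the blocked version does at most one branch per `BLOCK` words, which is what leaves the inner fold open to vectorization.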

Test Plan

Unit tests covering: all-true, all-false, mixed, empty, nullable (all-valid-true, all-valid-false, all-null), non-aligned
lengths (65 elements, 64+1 with trailing false)
Criterion benchmarks comparing has_true/has_false vs true_count across sizes (64, 1024, 65536) and data distributions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

adriangb · 2026-03-05T10:32:30Z

run benchmark boolean_array

alamb-ghbot · 2026-03-05T10:32:39Z

🤖 Hi @adriangb, thanks for the request (#9511 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

Standard: (none)
Criterion: array_from, array_iter, arrow_reader, arrow_reader_clickbench, arrow_reader_row_filter, arrow_statistics, arrow_writer, bitwise_kernel, boolean_kernels, buffer_bit_ops, builder, cast_kernels, coalesce_kernels, comparison_kernels, concatenate_kernel, csv_writer, encoding, filter_kernels, interleave_kernels, ipc_reader, json-reader, metadata, row_format, sort_kernel, take_kernels, union_array, variant_builder, variant_kernels, variant_validation, view_types, zip_kernels

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

You can also set environment variables on subsequent lines:

run benchmark tpch_mem
DATAFUSION_RUNTIME_MEMORY_LIMIT=1G

Unsupported benchmarks: boolean_array.

adriangb · 2026-03-05T11:07:59Z

cc @Dandandan

adriangb · 2026-03-12T22:06:23Z

run benchmark boolean_array

alamb-ghbot · 2026-03-12T22:06:31Z

🤖 Hi @adriangb, thanks for the request (#9511 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

Standard: (none)
Criterion: array_from, array_iter, arrow_reader, arrow_reader_clickbench, arrow_reader_row_filter, arrow_statistics, arrow_writer, bitwise_kernel, boolean_kernels, buffer_bit_ops, builder, cast_kernels, coalesce_kernels, comparison_kernels, concatenate_kernel, csv_writer, encoding, filter_kernels, interleave_kernels, ipc_reader, json-reader, json_reader, metadata, row_format, sort_kernel, take_kernels, union_array, variant_builder, variant_kernels, variant_validation, view_types, zip_kernels

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

You can also set environment variables on subsequent lines:

run benchmark tpch_mem
DATAFUSION_RUNTIME_MEMORY_LIMIT=1G

Unsupported benchmarks: boolean_array.

adriangb · 2026-03-12T22:09:59Z

run benchmark boolean_array

alamb-ghbot · 2026-03-12T22:10:07Z

🤖 Hi @adriangb, thanks for the request (#9511 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

Standard: (none)
Criterion: array_from, array_iter, arrow_reader, arrow_reader_clickbench, arrow_reader_row_filter, arrow_statistics, arrow_writer, bitwise_kernel, boolean_kernels, buffer_bit_ops, builder, cast_kernels, coalesce_kernels, comparison_kernels, concatenate_kernel, csv_writer, encoding, filter_kernels, interleave_kernels, ipc_reader, json-reader, json_reader, metadata, row_format, sort_kernel, take_kernels, union_array, variant_builder, variant_kernels, variant_validation, view_types, zip_kernels

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

You can also set environment variables on subsequent lines:

run benchmark tpch_mem
DATAFUSION_RUNTIME_MEMORY_LIMIT=1G

Unsupported benchmarks: boolean_array.

adriangbot · 2026-03-12T22:18:17Z

Benchmark job started for this request (job bench-c4050518673-174). Results will be posted here when complete.

adriangbot · 2026-03-12T22:18:18Z

Benchmark job started for this request (job bench-c4050498266-175). Results will be posted here when complete.

adriangbot · 2026-03-12T22:20:27Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4050498266-175-6hrsh 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing add-has-true-has-false (1551756) to 5ba4515 (merge-base) diff
BENCH_NAME=boolean_array
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_array
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-12T22:20:47Z

Benchmark for this request failed.

Last 20 lines of output:

    project_record
    record_batch
    regexp_kernels
    row_format
    row_group_index_reader
    row_selection_cursor
    row_selector
    serde
    sort_kernel
    string_dictionary_builder
    string_run_builder
    string_run_iterator
    substring_kernels
    take_kernels
    union_array
    variant_builder
    variant_kernels
    variant_validation
    view_types
    zip_kernels

adriangbot · 2026-03-12T22:21:08Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4050518673-174-fxsf5 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing add-has-true-has-false (1551756) to 5ba4515 (merge-base) diff
BENCH_NAME=boolean_array
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_array
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-12T22:21:15Z

Benchmark for this request failed.

Last 20 lines of output:

    project_record
    record_batch
    regexp_kernels
    row_format
    row_group_index_reader
    row_selection_cursor
    row_selector
    serde
    sort_kernel
    string_dictionary_builder
    string_run_builder
    string_run_iterator
    substring_kernels
    take_kernels
    union_array
    variant_builder
    variant_kernels
    variant_validation
    view_types
    zip_kernels

adriangbot · 2026-03-12T22:21:22Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4050498266-175-lwtcv 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing add-has-true-has-false (1551756) to 5ba4515 (merge-base) diff
BENCH_NAME=boolean_array
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_array
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-12T22:21:28Z

Benchmark for this request failed.

Last 20 lines of output:

    project_record
    record_batch
    regexp_kernels
    row_format
    row_group_index_reader
    row_selection_cursor
    row_selector
    serde
    sort_kernel
    string_dictionary_builder
    string_run_builder
    string_run_iterator
    substring_kernels
    take_kernels
    union_array
    variant_builder
    variant_kernels
    variant_validation
    view_types
    zip_kernels

adriangbot · 2026-03-12T22:21:49Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4050518673-174-wkwzs 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing add-has-true-has-false (1551756) to 5ba4515 (merge-base) diff
BENCH_NAME=boolean_array
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_array
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-12T22:21:55Z

Benchmark for this request failed.

Last 20 lines of output:

    project_record
    record_batch
    regexp_kernels
    row_format
    row_group_index_reader
    row_selection_cursor
    row_selector
    serde
    sort_kernel
    string_dictionary_builder
    string_run_builder
    string_run_iterator
    substring_kernels
    take_kernels
    union_array
    variant_builder
    variant_kernels
    variant_validation
    view_types
    zip_kernels

adriangbot · 2026-03-12T22:22:12Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4050498266-175-n8f8t 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing add-has-true-has-false (1551756) to 5ba4515 (merge-base) diff
BENCH_NAME=boolean_array
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_array
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-12T22:22:18Z

Benchmark for this request failed.

Last 20 lines of output:

    project_record
    record_batch
    regexp_kernels
    row_format
    row_group_index_reader
    row_selection_cursor
    row_selector
    serde
    sort_kernel
    string_dictionary_builder
    string_run_builder
    string_run_iterator
    substring_kernels
    take_kernels
    union_array
    variant_builder
    variant_kernels
    variant_validation
    view_types
    zip_kernels

adriangbot · 2026-03-12T22:22:40Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4050518673-174-wtbsf 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing add-has-true-has-false (1551756) to 5ba4515 (merge-base) diff
BENCH_NAME=boolean_array
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_array
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-12T22:22:46Z

Benchmark for this request failed.

Last 20 lines of output:

    project_record
    record_batch
    regexp_kernels
    row_format
    row_group_index_reader
    row_selection_cursor
    row_selector
    serde
    sort_kernel
    string_dictionary_builder
    string_run_builder
    string_run_iterator
    substring_kernels
    take_kernels
    union_array
    variant_builder
    variant_kernels
    variant_validation
    view_types
    zip_kernels

adriangbot · 2026-03-12T22:23:21Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4050498266-175-zqtls 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing add-has-true-has-false (1551756) to 5ba4515 (merge-base) diff
BENCH_NAME=boolean_array
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_array
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-12T22:23:27Z

Benchmark for this request failed.

Last 20 lines of output:

    project_record
    record_batch
    regexp_kernels
    row_format
    row_group_index_reader
    row_selection_cursor
    row_selector
    serde
    sort_kernel
    string_dictionary_builder
    string_run_builder
    string_run_iterator
    substring_kernels
    take_kernels
    union_array
    variant_builder
    variant_kernels
    variant_validation
    view_types
    zip_kernels

adriangbot · 2026-03-12T22:23:48Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4050518673-174-zvqnn 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing add-has-true-has-false (1551756) to 5ba4515 (merge-base) diff
BENCH_NAME=boolean_array
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_array
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-12T22:23:54Z

Benchmark for this request failed.

Last 20 lines of output:

    project_record
    record_batch
    regexp_kernels
    row_format
    row_group_index_reader
    row_selection_cursor
    row_selector
    serde
    sort_kernel
    string_dictionary_builder
    string_run_builder
    string_run_iterator
    substring_kernels
    take_kernels
    union_array
    variant_builder
    variant_kernels
    variant_validation
    view_types
    zip_kernels

adriangb · 2026-03-12T23:59:22Z

run benchmark record_batch

adriangbot · 2026-03-12T23:59:24Z

Benchmark job started for this request (job bench-c4051194027-185). Results will be posted here when complete.

adriangbot · 2026-03-13T00:02:16Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4051194027-185-9kzpt 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing add-has-true-has-false (1551756) to 5ba4515 (merge-base) diff
BENCH_NAME=record_batch
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench record_batch
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-03-13T00:05:28Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

Details

group                            add-has-true-has-false                 main
-----                            ----------------------                 ----
project/1000x8192 -> 1x8192      1.00    155.6±1.37ns        ? ?/sec    1.00    155.0±1.39ns        ? ?/sec
project/1000x8192 -> 500x8192    1.00     13.7±0.04µs        ? ?/sec    1.00     13.6±0.04µs        ? ?/sec
project/1000x8192 -> 999x8192    1.00     27.0±0.09µs        ? ?/sec    1.00     27.0±0.09µs        ? ?/sec
project/100x8192 -> 1x8192       1.01    156.0±1.40ns        ? ?/sec    1.00    154.9±1.41ns        ? ?/sec
project/100x8192 -> 50x8192      1.00  1754.6±11.12ns        ? ?/sec    1.00  1757.2±13.68ns        ? ?/sec
project/100x8192 -> 99x8192      1.00      3.1±0.01µs        ? ?/sec    1.01      3.1±0.02µs        ? ?/sec
project/10x8192 -> 1x8192        1.03    166.4±5.97ns        ? ?/sec    1.00    160.8±1.20ns        ? ?/sec
project/10x8192 -> 5x8192        1.02    344.8±4.25ns        ? ?/sec    1.00    338.9±3.90ns        ? ?/sec
project/10x8192 -> 9x8192        1.01    501.4±5.25ns        ? ?/sec    1.00    497.0±2.77ns        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	88.5s
Peak memory	2.0 GiB
Avg memory	1.9 GiB
CPU user	87.5s
CPU sys	0.8s
Disk read	0 B
Disk write	1.2 GiB

branch

Metric	Value
Wall time	87.4s
Peak memory	2.0 GiB
Avg memory	1.9 GiB
CPU user	87.2s
CPU sys	0.2s
Disk read	0 B
Disk write	29.5 MiB

Dandandan · 2026-03-17T07:16:06Z

arrow-array/src/array/boolean_array.rs

+                null_chunks.zip(value_chunks).any(|(n, v)| (n & v) != 0)
+            }
+            None => {
+                let bit_chunks = UnalignedBitChunk::new(


Shouldn't you be able to use BitChunkIterator here?

Dandandan · 2026-03-17T07:17:42Z

This is really nice @adriangb I think as a next step let's just apply them on all of the various .null_count() == 0 patterns in arrow-rs as well (and DataFusion).

I think in DataFusion this has some non-trivial impact on Case / filter eval :) .

I added one comment about BitChunkIterator - I think that might even make it a tiny bit faster (avoiding the shifting) - also for the 64 element case?

adriangb · 2026-03-17T07:18:09Z

I added one comment about BitChunkIterator - I think that might even make it a tiny bit faster (avoiding the shifting) - also for the 64 element case?

Will try tomorrow 😄

alamb · 2026-03-17T18:20:04Z

I took a look at this PR and its performance, and it seems to me like it is a good new API. Thank you @adriangb and @Dandandan. Let's plan on adding @Dandandan's suggestions as follow-on PRs. Perhaps we can open issues to track them so they don't get lost and others can help.

It is interesting that putting in a control flow / branch in the loop is actually faster than powering through

alamb

🚀

alamb · 2026-03-17T18:21:26Z

arrow-array/src/array/boolean_array.rs

+    /// as soon as a `true` value is found, without counting all set bits.
+    ///
+    /// Null values are not counted as `true`. Returns `false` for empty arrays.
+    pub fn has_true(&self) -> bool {


We may want to add this API to BooleanBuffer as well (as a follow on PR)

adriangb · 2026-03-17T18:22:14Z

It is interesting that putting in a control flow / branch in the loop is actually faster than powering through

The observation is that this is the same as DataFusion w/ RecordBatch: you need enough data to fully saturate SIMD, branch prediction, etc., but it really doesn't hurt to pause after every large chunk to make decisions.

alamb · 2026-03-17T18:22:59Z

BTW codex says it found a bug in this PR -- I am getting it to cough up a reproducer now (or will determine it is hallucinating)

arrow-array/src/array/boolean_array.rs

alamb

There is a bug I think -- see comments and reproducer

When the buffer is 8-byte aligned and >16 bytes, UnalignedBitChunk produces a suffix but no prefix. The wildcard match arm set suffix_fill to 0, so trailing padding bits (zeroed by UnalignedBitChunk) appeared as false values. Add explicit (None, Some(_)) arm to fill trailing padding with 1s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
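
A small sketch of why the fill matters for has_false. It assumes the trailing padding occupies the high bits of the suffix word and has been zeroed; `trailing_fill` and `suffix_has_false` are hypothetical names for illustration, not the helpers in the PR.

```rust
/// Hypothetical helper: 1s in the `trailing_padding` high bits, 0s elsewhere.
/// Assumes `trailing_padding` is in 0..=64.
fn trailing_fill(trailing_padding: usize) -> u64 {
    if trailing_padding == 0 {
        0
    } else {
        u64::MAX << (64 - trailing_padding)
    }
}

/// Checks the suffix word for a real `false`, ignoring zeroed padding bits.
fn suffix_has_false(suffix: u64, trailing_padding: usize) -> bool {
    // Without the fill, any zeroed padding bit would look like a `false` value.
    (suffix | trailing_fill(trailing_padding)) != u64::MAX
}
```

With 10 valid bits that are all set and 54 padding bits, a naive `suffix != u64::MAX` test reports a false value, while the filled version does not.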

Extract CHUNK_FOLD_BLOCK_SIZE constant and unaligned_bit_chunks() helper to reduce duplication between has_true() and has_false(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

adriangb · 2026-03-17T19:38:07Z

run benchmark boolean_array

adriangb · 2026-03-17T19:39:29Z

I added one comment about BitChunkIterator - I think that might even make it a tiny bit faster (avoiding the shifting) - also for the 64 element case?

Will try tomorrow 😄

@Dandandan I tried this. The TL;DR is that the code is simpler and faster for small inputs, because it saves the overhead of constructing an UnalignedBitChunk, but it gives up vectorization, so it is slower for large inputs. I've gone with a heuristic adaptive approach that should get us roughly the best of both worlds.

adriangbot · 2026-03-17T19:41:11Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4077544781-387-hcdmn 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing add-has-true-has-false (aae3ac5) to a8fe8b3 (merge-base) diff
BENCH_NAME=boolean_array
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_array
BENCH_FILTER=
Results will be posted here when complete

adriangb · 2026-03-17T19:45:33Z

There is a bug I think -- see comments and reproducer

Thank you, I added the tests and fixed the bug.

adriangbot · 2026-03-17T19:46:42Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

New benchmark — branch-only results (no baseline comparison)

Details

group                                add-has-true-has-false
-----                                ----------------------
has_false(all_false, 1024)           1.00      8.5±0.03ns        ? ?/sec
has_false(all_false, 64)             1.00      8.2±0.07ns        ? ?/sec
has_false(all_false, 65536)          1.00     13.5±0.03ns        ? ?/sec
has_false(all_true, 1024)            1.00      8.5±0.03ns        ? ?/sec
has_false(all_true, 64)              1.00      8.4±0.10ns        ? ?/sec
has_false(all_true, 65536)           1.00    122.3±0.97ns        ? ?/sec
has_false(mixed_early, 1024)         1.00      8.5±0.04ns        ? ?/sec
has_false(mixed_early, 64)           1.00      8.2±0.08ns        ? ?/sec
has_false(mixed_early, 65536)        1.00     13.7±0.03ns        ? ?/sec
has_false(nulls_all_true, 1024)      1.00     21.4±3.06ns        ? ?/sec
has_false(nulls_all_true, 64)        1.00      7.4±1.19ns        ? ?/sec
has_false(nulls_all_true, 65536)     1.00    944.1±4.35ns        ? ?/sec
has_true(all_false, 1024)            1.00      8.4±0.03ns        ? ?/sec
has_true(all_false, 64)              1.00      8.2±0.07ns        ? ?/sec
has_true(all_false, 65536)           1.00    121.2±0.40ns        ? ?/sec
has_true(all_true, 1024)             1.00      8.4±0.02ns        ? ?/sec
has_true(all_true, 64)               1.00      7.8±0.08ns        ? ?/sec
has_true(all_true, 65536)            1.00     13.7±0.16ns        ? ?/sec
has_true(nulls_all_true, 1024)       1.00      7.0±1.38ns        ? ?/sec
has_true(nulls_all_true, 64)         1.00      7.0±1.33ns        ? ?/sec
has_true(nulls_all_true, 65536)      1.00      7.0±1.35ns        ? ?/sec
true_count(all_false, 1024)          1.00      9.9±0.03ns        ? ?/sec
true_count(all_false, 64)            1.00      8.8±0.29ns        ? ?/sec
true_count(all_false, 65536)         1.00    228.0±0.08ns        ? ?/sec
true_count(all_true, 1024)           1.00     10.1±0.18ns        ? ?/sec
true_count(all_true, 64)             1.00      8.8±0.28ns        ? ?/sec
true_count(all_true, 65536)          1.00    228.0±0.16ns        ? ?/sec
true_count(mixed_early, 1024)        1.00     10.0±0.03ns        ? ?/sec
true_count(mixed_early, 64)          1.00      8.8±0.29ns        ? ?/sec
true_count(mixed_early, 65536)       1.00    228.0±0.10ns        ? ?/sec
true_count(nulls_all_true, 1024)     1.00     21.4±3.73ns        ? ?/sec
true_count(nulls_all_true, 64)       1.00     11.1±5.55ns        ? ?/sec
true_count(nulls_all_true, 65536)    1.00    720.3±2.81ns        ? ?/sec

Resource Usage

branch

Metric	Value
Wall time	320.4s
Peak memory	1.7 GiB
Avg memory	1.6 GiB
CPU user	319.2s
CPU sys	1.0s
Disk read	0 B
Disk write	1.0 GiB

alamb

🙏 thank you

adriangb · 2026-03-17T20:01:51Z

I'll wait for another look from @Dandandan since I've changed the implementation before merging. Thank you both for review.

alamb · 2026-03-17T20:25:31Z

👨‍🍳 👌

Dandandan · 2026-03-18T05:11:27Z

arrow-array/src/array/boolean_array.rs

+                bit_chunks.prefix().unwrap_or(0) != 0
+                    || bit_chunks
+                        .chunks()
+                        .chunks(Self::CHUNK_FOLD_BLOCK_SIZE)


With chunks_exact you could probably use a smaller constant (as you can remove the inner branch / loop with an unrolled loop).

(So I think it could terminate even earlier with a smaller constant - as it only has a "termination" branch after the e.g. 8 elements instead of also having a loop end)

good call! #9570
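
A rough sketch of the chunks_exact idea on a plain `&[u64]` slice, assuming a hypothetical block size of 16 words and no padding bits to mask; this is not the code from #9570:

```rust
/// Hypothetical block size: every block handed to the fold has exactly
/// this many words, so the compiler is free to unroll the fold.
const BLOCK: usize = 16;

fn has_true_exact(words: &[u64]) -> bool {
    let mut blocks = words.chunks_exact(BLOCK);
    // One termination check per full block; no variable-length inner loop.
    let found = blocks
        .by_ref()
        .any(|block| block.iter().fold(0u64, |acc, &w| acc | w) != 0);
    // Words that did not fill a whole block are checked separately.
    found || blocks.remainder().iter().any(|&w| w != 0)
}
```

The remainder check is needed because `chunks_exact` never yields a short final block.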

Dandandan · 2026-03-18T05:29:27Z

There is probably room for a bit of improvement, but performance is already great!

Dandandan · 2026-03-18T05:31:31Z

Thank you @adriangb !

adriangbot · 2026-03-18T06:25:26Z

Hi @adriangb, your benchmark configuration could not be parsed (#9511 (comment)).

Error: invalid configuration: found character that cannot start any token at line 2 column 1, while scanning for the next token

Supported benchmarks:

Standard: (none)
Criterion: (any)

Usage:

run benchmark <name>           # run specific benchmark(s)
run benchmarks                 # run default suite
run benchmarks <name1> <name2> # run specific benchmarks

Per-side configuration (run benchmark tpch followed by):

env:
  SHARED_SETTING: enabled
baseline:
  ref: v45.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 1G
changed:
  ref: v46.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 2G

@Dandandan

…9570)

## Summary

- Replace `.chunks(64)` with `.chunks_exact(16)` in `has_true()` and `has_false()` as suggested in #9511 (comment)
- With `chunks_exact`, the compiler can fully unroll the inner fold (guaranteed size, no inner branch/loop), allowing a smaller block size for more frequent short-circuit exits without regressing the full-scan path

## Benchmark results (block size 16 vs baseline)

- Full-scan worst case (65536): No regression (~49ns both)
- Early-exit cases (65536): ~27% faster (6.0ns → 4.4ns)
- Small arrays (64, 1024): Unchanged

## Test plan

- [x] All 13 existing `test_has` tests pass

run benchmarks boolean_array

@Dandandan Would appreciate your review!

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot added the arrow Changes to the arrow crate label Mar 5, 2026

update docstring

4e8b072

adriangb force-pushed the add-has-true-has-false branch from 1551756 to 4e8b072 Compare March 17, 2026 07:06

Dandandan reviewed Mar 17, 2026

alamb reviewed Mar 17, 2026

alamb requested changes Mar 17, 2026

adriangb and others added 2 commits March 17, 2026 14:06

Extract shared constant and helper in has_true/has_false

aae3ac5

add tests

b6d5325

alamb approved these changes Mar 17, 2026

Dandandan reviewed Mar 18, 2026

Dandandan approved these changes Mar 18, 2026

Dandandan merged commit d426107 into apache:main Mar 18, 2026
27 checks passed

adriangb mentioned this pull request Mar 18, 2026

Use chunks_exact for has_true/has_false to enable compiler unrolling #9570

Merged

1 task
