
Add has_true() and has_false() to BooleanArray#9511

Merged
Dandandan merged 8 commits into apache:main from pydantic:add-has-true-has-false
Mar 18, 2026

@adriangb
Contributor

@adriangb adriangb commented Mar 5, 2026

Motivation

When working with BooleanArray, a common pattern is checking whether any true or false value exists, e.g.
arr.true_count() > 0 or arr.false_count() == 0. This currently requires true_count() / false_count(), which scan the entire bitmap and count every set bit (via popcount), even though we only need to know whether at least one exists.

This PR adds has_true() and has_false() methods that short-circuit as soon as they find a matching value, providing both:

  1. Better performance: up to ~16x faster on 65,536-element arrays when a match is found early (~129x in the nullable all-true case benchmarked below)
  2. More ergonomic API: arr.has_true() expresses intent more clearly than arr.true_count() > 0
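A minimal sketch of the difference, assuming the bitmap is a plain slice of u64 words (LSB-first, no nulls, padding bits zeroed) rather than a real BooleanArray; true_count and has_true below are hypothetical stand-ins for the methods, not their implementations:

```rust
// Hypothetical stand-ins operating on u64 words (LSB-first, as in Arrow
// bitmaps). Assumes no nulls and that padding bits past the length are zero.

/// Count-based check: popcounts every word, even after a set bit is seen.
fn true_count(words: &[u64]) -> usize {
    words.iter().map(|w| w.count_ones() as usize).sum()
}

/// Short-circuit check: `any` stops at the first non-zero word.
fn has_true(words: &[u64]) -> bool {
    words.iter().any(|&w| w != 0)
}

fn main() {
    // 65,536 bits (1,024 words) with only the very first bit set.
    let mut words = vec![0u64; 1024];
    words[0] = 1;

    // Same answer: true_count touches all 1,024 words, has_true stops at word 0.
    assert_eq!(true_count(&words) > 0, has_true(&words));
    println!("true_count = {}, has_true = {}", true_count(&words), has_true(&words));
}
```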

Callsites in DataFusion

There are several places in DataFusion that would benefit from these methods:

  • datafusion/functions-nested/src/array_has.rs: eq_array.true_count() > 0 → eq_array.has_true()
  • datafusion/physical-plan/src/topk/mod.rs: filter.true_count() == 0 check → !filter.has_true()
  • datafusion/datasource-parquet/src/metadata.rs: exactness.true_count() == 0 and combined_mask.true_count() > 0
  • datafusion/physical-plan/src/joins/nested_loop_join.rs: bitmap.true_count() == 0 checks
  • datafusion/physical-expr-common/src/physical_expr.rs: selection_count == 0 from selection.true_count()
  • datafusion/physical-expr/src/expressions/binary.rs: short-circuit checks for AND/OR

Benchmark Results

Scenario                          true_count     has_true       has_false      Speedup (best)
─────────────────────────────────────────────────────────────────────────────────────────────
all_true, 64                      4.32 ns        4.08 ns        4.76 ns        ~1.1x
all_false, 64                     4.30 ns        4.25 ns        4.52 ns        ~1.0x
all_true, 1024                    5.15 ns        4.52 ns        4.99 ns        ~1.1x
all_false, 1024                   5.17 ns        4.55 ns        5.00 ns        ~1.1x
mixed_early, 1024                 5.22 ns        —              5.04 ns        ~1.0x
nulls_all_true, 1024              12.84 ns       4.10 ns        12.92 ns       ~3.1x
all_true, 65536                   100.06 ns      5.96 ns        49.70 ns       ~16.8x (has_true)
all_false, 65536                  99.33 ns       49.30 ns       6.19 ns        ~16.0x (has_false)
mixed_early, 65536                100.10 ns      —              6.20 ns        ~16.1x (has_false)
nulls_all_true, 65536             522.94 ns      4.05 ns        521.82 ns      ~129x (has_true)

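The key wins are on larger arrays (65,536 elements), where has_true/has_false are roughly 16x to 129x faster than
true_count() when they can short-circuit early. In the worst case (the whole array must be scanned), performance is
comparable to or better than true_count(): has_false takes 49.70 ns vs 100.06 ns on all_true, 65536, and 521.82 ns vs 522.94 ns on nulls_all_true, 65536.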
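The Speedup (best) column is the true_count time divided by the faster of has_true and has_false in the same row. A small check using only the 65,536-element rows copied from the table:

```rust
/// Speedup = true_count time / fastest of the has_* times for that row.
fn speedup(true_count_ns: f64, has_ns: &[f64]) -> f64 {
    let best = has_ns.iter().copied().fold(f64::INFINITY, f64::min);
    true_count_ns / best
}

fn main() {
    // (scenario, true_count, has_true, has_false) in ns, copied from the table.
    // has_true was not measured for mixed_early (a dash in the table), so None.
    let rows: [(&str, f64, Option<f64>, f64); 4] = [
        ("all_true, 65536", 100.06, Some(5.96), 49.70),
        ("all_false, 65536", 99.33, Some(49.30), 6.19),
        ("mixed_early, 65536", 100.10, None, 6.20),
        ("nulls_all_true, 65536", 522.94, Some(4.05), 521.82),
    ];
    for (name, tc, ht, hf) in rows {
        let mut measured = vec![hf];
        measured.extend(ht); // Option<f64> is iterable: adds 0 or 1 values
        println!("{name}: ~{:.1}x", speedup(tc, &measured));
    }
}
```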

Implementation

The implementation processes bits in 64-bit chunks using UnalignedBitChunk, which handles arbitrary bit offsets and aligns
data for SIMD-friendly processing.

  • has_true (no nulls): OR-folds 64-bit chunks, short-circuits when any bit is set
  • has_false (no nulls): AND-folds 64-bit chunks, short-circuits when any bit is unset (with padding bits masked to 1)
  • With nulls: Iterates paired (null, value) chunks, checking null & value != 0 (has_true) or null & !value != 0
    (has_false)
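A simplified sketch of these three cases, assuming plain u64 words at bit offset 0 (LSB-first) with an explicit length in place of UnalignedBitChunk, and testing one word at a time instead of folding larger blocks. valid_mask is a hypothetical helper standing in for the padding handling:

```rust
// Hypothetical sketch, not the arrow-rs implementation: bitmaps are plain u64
// words (LSB-first, bit offset 0) and padding past `len` is masked here.

/// Mask of the bits of word `i` that fall inside a bitmap of `len` bits.
fn valid_mask(i: usize, len: usize) -> u64 {
    let remaining = len.saturating_sub(i * 64);
    if remaining >= 64 { u64::MAX } else { (1u64 << remaining) - 1 }
}

/// Is any non-null slot `true`?
fn has_true(values: &[u64], nulls: Option<&[u64]>, len: usize) -> bool {
    let words = (len + 63) / 64;
    match nulls {
        // No nulls: any set value bit (padding masked to 0) is a `true`.
        None => (0..words).any(|i| (values[i] & valid_mask(i, len)) != 0),
        // With nulls: the slot must be valid (validity bit set) AND true.
        Some(n) => (0..words).any(|i| (n[i] & values[i] & valid_mask(i, len)) != 0),
    }
}

/// Is any non-null slot `false`?
fn has_false(values: &[u64], nulls: Option<&[u64]>, len: usize) -> bool {
    let words = (len + 63) / 64;
    match nulls {
        // No nulls: force padding bits to 1 so they never look like `false`,
        // then any remaining 0 bit is a `false`.
        None => (0..words).any(|i| (values[i] | !valid_mask(i, len)) != u64::MAX),
        // With nulls: the slot must be valid AND not true.
        Some(n) => (0..words).any(|i| (n[i] & !values[i] & valid_mask(i, len)) != 0),
    }
}

fn main() {
    // 65 slots, all true: word 0 is all ones, word 1 holds one real bit.
    let values = [u64::MAX, 1];
    assert!(has_true(&values, None, 65));
    assert!(!has_false(&values, None, 65)); // padding is not mistaken for `false`

    // Mark slot 64 null and clear its value bit: slots 0..64 stay true.
    let nulls = [u64::MAX, 0];
    let values = [u64::MAX, 0];
    assert!(has_true(&values, Some(&nulls[..]), 65));
    assert!(!has_false(&values, Some(&nulls[..]), 65));
    println!("ok");
}
```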

Alternatives considered

  1. Fully vectorized (no early stopping): Would process the entire bitmap like true_count() but with simpler bitwise ops
    instead of popcount. Marginally faster than true_count() but misses the main optimization opportunity.
  2. Per-element iteration with early stopping: self.iter().any(|v| v == Some(true)). Simple but processes one bit at a
    time, missing SIMD vectorization of the inner loop. Our approach processes 64 bits at a time while still supporting early
    exit.

The chosen approach balances SIMD-friendly bulk processing (64 bits per iteration) with early termination, giving the best of
both worlds.
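For comparison, the two alternatives and a simplified version of the chosen shape can be sketched side by side. This assumes u64 words with no nulls and zeroed padding; all three function names are hypothetical:

```rust
// Hypothetical sketches of the two rejected alternatives next to the chosen
// shape, on u64 words (LSB-first, no nulls, padding bits already zero).

/// Alternative 1: whole-bitmap OR-fold. Cheaper than popcount, but never exits early.
fn has_true_full_scan(words: &[u64]) -> bool {
    words.iter().fold(0u64, |acc, &w| acc | w) != 0
}

/// Alternative 2: one bit per step with early exit, the same shape as
/// `iter().any(|v| v == Some(true))`.
fn has_true_per_bit(words: &[u64], len: usize) -> bool {
    (0..len).any(|i| (words[i / 64] >> (i % 64)) & 1 == 1)
}

/// Chosen shape (simplified): 64 bits per step, still with early exit.
fn has_true_per_word(words: &[u64]) -> bool {
    words.iter().any(|&w| w != 0)
}

fn main() {
    let len = 64 * 100;
    let mut words = vec![0u64; 100];
    words[3] = 1u64 << 17; // single `true` at bit 3 * 64 + 17
    let answers = [
        has_true_full_scan(&words),
        has_true_per_bit(&words, len),
        has_true_per_word(&words),
    ];
    // All three agree; they differ only in how much work they do.
    assert!(answers.iter().all(|&a| a));
    println!("{:?}", answers);
}
```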

Test Plan

  • Unit tests covering: all-true, all-false, mixed, empty, nullable (all-valid-true, all-valid-false, all-null), non-aligned
    lengths (65 elements, 64+1 with trailing false)
  • Criterion benchmarks comparing has_true/has_false vs true_count across sizes (64, 1024, 65536) and data distributions
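A test along these lines can compare the chunked checks against a naive per-slot oracle. This sketch assumes bitmaps packed into u64 words by a hypothetical pack helper; has_true and has_false here are compact stand-ins, not the arrow-rs methods or tests:

```rust
// Hypothetical test sketch for the cases listed above. `pack` builds LSB-first
// validity and value bitmaps from Option<bool>; the two checks are compact
// stand-ins for the real methods, compared against a naive per-slot oracle.

fn pack(slots: &[Option<bool>]) -> (Vec<u64>, Vec<u64>) {
    let words = (slots.len() + 63) / 64;
    let (mut valid, mut values) = (vec![0u64; words], vec![0u64; words]);
    for (i, s) in slots.iter().enumerate() {
        let bit = 1u64 << (i % 64);
        if s.is_some() { valid[i / 64] |= bit; }
        if *s == Some(true) { values[i / 64] |= bit; }
    }
    (valid, values)
}

// Padding bits are 0 in `valid`, so it also masks out bits past the length.
fn has_true(valid: &[u64], values: &[u64]) -> bool {
    valid.iter().zip(values).any(|(n, v)| (n & v) != 0)
}

fn has_false(valid: &[u64], values: &[u64]) -> bool {
    valid.iter().zip(values).any(|(n, v)| (n & !v) != 0)
}

fn check(slots: &[Option<bool>]) {
    let (valid, values) = pack(slots);
    assert_eq!(has_true(&valid, &values), slots.iter().any(|s| *s == Some(true)));
    assert_eq!(has_false(&valid, &values), slots.iter().any(|s| *s == Some(false)));
}

fn main() {
    let mut trailing_false = vec![Some(true); 64];
    trailing_false.push(Some(false)); // 64 + 1 with trailing false
    let cases: Vec<Vec<Option<bool>>> = vec![
        vec![Some(true); 100],         // all true
        vec![Some(false); 100],        // all false
        vec![Some(false), Some(true)], // mixed
        vec![],                        // empty
        vec![Some(true), None],        // nullable, all valid values true
        vec![Some(false), None],       // nullable, all valid values false
        vec![None; 10],                // all null
        vec![Some(true); 65],          // non-aligned length
        trailing_false,
    ];
    for c in &cases {
        check(c);
    }
    println!("{} cases passed", cases.len());
}
```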

🤖 Generated with [Claude Code](https://claude.com/claude-code)

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 5, 2026
@adriangb
Contributor Author

adriangb commented Mar 5, 2026

run benchmark boolean_array

@alamb-ghbot

🤖 Hi @adriangb, thanks for the request (#9511 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

  • Standard: (none)
  • Criterion: array_from, array_iter, arrow_reader, arrow_reader_clickbench, arrow_reader_row_filter, arrow_statistics, arrow_writer, bitwise_kernel, boolean_kernels, buffer_bit_ops, builder, cast_kernels, coalesce_kernels, comparison_kernels, concatenate_kernel, csv_writer, encoding, filter_kernels, interleave_kernels, ipc_reader, json-reader, metadata, row_format, sort_kernel, take_kernels, union_array, variant_builder, variant_kernels, variant_validation, view_types, zip_kernels

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

You can also set environment variables on subsequent lines:

run benchmark tpch_mem
DATAFUSION_RUNTIME_MEMORY_LIMIT=1G

Unsupported benchmarks: boolean_array.

@adriangb
Contributor Author

adriangb commented Mar 5, 2026

cc @Dandandan

@adriangb
Contributor Author

run benchmark boolean_array

@alamb-ghbot

🤖 Hi @adriangb, thanks for the request (#9511 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

  • Standard: (none)
  • Criterion: array_from, array_iter, arrow_reader, arrow_reader_clickbench, arrow_reader_row_filter, arrow_statistics, arrow_writer, bitwise_kernel, boolean_kernels, buffer_bit_ops, builder, cast_kernels, coalesce_kernels, comparison_kernels, concatenate_kernel, csv_writer, encoding, filter_kernels, interleave_kernels, ipc_reader, json-reader, json_reader, metadata, row_format, sort_kernel, take_kernels, union_array, variant_builder, variant_kernels, variant_validation, view_types, zip_kernels

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

You can also set environment variables on subsequent lines:

run benchmark tpch_mem
DATAFUSION_RUNTIME_MEMORY_LIMIT=1G

Unsupported benchmarks: boolean_array.

@adriangbot

Benchmark job started for this request (job bench-c4050518673-174). Results will be posted here when complete.

@adriangbot

Benchmark job started for this request (job bench-c4050498266-175). Results will be posted here when complete.

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4050498266-175-6hrsh 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing add-has-true-has-false (1551756) to 5ba4515 (merge-base) diff
BENCH_NAME=boolean_array
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_array
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

Benchmark for this request failed.

Last 20 lines of output:

    project_record
    record_batch
    regexp_kernels
    row_format
    row_group_index_reader
    row_selection_cursor
    row_selector
    serde
    sort_kernel
    string_dictionary_builder
    string_run_builder
    string_run_iterator
    substring_kernels
    take_kernels
    union_array
    variant_builder
    variant_kernels
    variant_validation
    view_types
    zip_kernels

@adriangb
Contributor Author

run benchmark record_batch

@adriangbot

Benchmark job started for this request (job bench-c4051194027-185). Results will be posted here when complete.

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4051194027-185-9kzpt 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing add-has-true-has-false (1551756) to 5ba4515 (merge-base) diff
BENCH_NAME=record_batch
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench record_batch
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger


group                            add-has-true-has-false                 main
-----                            ----------------------                 ----
project/1000x8192 -> 1x8192      1.00    155.6±1.37ns        ? ?/sec    1.00    155.0±1.39ns        ? ?/sec
project/1000x8192 -> 500x8192    1.00     13.7±0.04µs        ? ?/sec    1.00     13.6±0.04µs        ? ?/sec
project/1000x8192 -> 999x8192    1.00     27.0±0.09µs        ? ?/sec    1.00     27.0±0.09µs        ? ?/sec
project/100x8192 -> 1x8192       1.01    156.0±1.40ns        ? ?/sec    1.00    154.9±1.41ns        ? ?/sec
project/100x8192 -> 50x8192      1.00  1754.6±11.12ns        ? ?/sec    1.00  1757.2±13.68ns        ? ?/sec
project/100x8192 -> 99x8192      1.00      3.1±0.01µs        ? ?/sec    1.01      3.1±0.02µs        ? ?/sec
project/10x8192 -> 1x8192        1.03    166.4±5.97ns        ? ?/sec    1.00    160.8±1.20ns        ? ?/sec
project/10x8192 -> 5x8192        1.02    344.8±4.25ns        ? ?/sec    1.00    338.9±3.90ns        ? ?/sec
project/10x8192 -> 9x8192        1.01    501.4±5.25ns        ? ?/sec    1.00    497.0±2.77ns        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 88.5s
Peak memory 2.0 GiB
Avg memory 1.9 GiB
CPU user 87.5s
CPU sys 0.8s
Disk read 0 B
Disk write 1.2 GiB

branch

Metric Value
Wall time 87.4s
Peak memory 2.0 GiB
Avg memory 1.9 GiB
CPU user 87.2s
CPU sys 0.2s
Disk read 0 B
Disk write 29.5 MiB

@adriangb adriangb force-pushed the add-has-true-has-false branch from 1551756 to 4e8b072 Compare March 17, 2026 07:06
null_chunks.zip(value_chunks).any(|(n, v)| (n & v) != 0)
}
None => {
let bit_chunks = UnalignedBitChunk::new(
Contributor

Shouldn't you be able to use BitChunkIterator here?

@Dandandan
Contributor

Dandandan commented Mar 17, 2026

This is really nice @adriangb I think as a next step let's just apply them on all of the various .null_count() == 0 patterns in arrow-rs as well (and DataFusion).

I think in DataFusion this has some non-trivial impact on Case / filter eval :) .

I added one comment about BitChunkIterator - I think that might even make it a tiny bit faster (avoiding the shifting) - also for the 64 element case?

@adriangb
Contributor Author

I added one comment about BitChunkIterator - I think that might even make it a tiny bit faster (avoiding the shifting) - also for the 64 element case?

Will try tomorrow 😄

@alamb
Contributor

alamb commented Mar 17, 2026

I took a look at this PR and its performance, and it seems to me like it is a good new API. Thank you @adriangb and @Dandandan. Let's plan on adding @Dandandan's suggestions as follow-on PRs. Perhaps we can open issues to track them so they don't get lost / others can help

It is interesting that putting in a control flow / branch in the loop is actually faster than powering through

Contributor

@alamb alamb left a comment

🚀

/// as soon as a `true` value is found, without counting all set bits.
///
/// Null values are not counted as `true`. Returns `false` for empty arrays.
pub fn has_true(&self) -> bool {
Contributor

We may want to add this API to BooleanBuffer as well (as a follow on PR)

@adriangb
Contributor Author

It is interesting that putting in a control flow / branch in the loop is actually faster than powering through

The observation is that this is the same as DataFusion with RecordBatch: you need enough data to fully saturate SIMD, branch prediction, etc., but it really doesn't hurt to pause after every large chunk to make decisions.

@alamb
Contributor

alamb commented Mar 17, 2026

BTW codex says it found a bug in this PR -- I am getting it to cough up a reproducer now (or will determine it is hallucinating)

Contributor

@alamb alamb left a comment

There is a bug I think -- see comments and reproducer

adriangb and others added 2 commits March 17, 2026 14:06
When the buffer is 8-byte aligned and >16 bytes, UnalignedBitChunk
produces a suffix but no prefix. The wildcard match arm set suffix_fill
to 0, so trailing padding bits (zeroed by UnalignedBitChunk) appeared
as false values. Add explicit (None, Some(_)) arm to fill trailing
padding with 1s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
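The failure mode is easy to reproduce in isolation. This hypothetical sketch models only the suffix word, not UnalignedBitChunk's prefix/suffix logic: with padding left at 0, an all-true array appears to contain a false.

```rust
// Hypothetical illustration of the suffix-padding bug, not the arrow-rs code.
// The last word holds only `len % 64` real bits; the chunking step has zeroed
// the rest. has_false AND-folds the words and reports a `false` if any bit is 0,
// so those padding bits must be filled with 1s first.
fn has_false_with_fill(words: &[u64], len: usize, fill_padding_with_ones: bool) -> bool {
    let tail_bits = len % 64;
    let last = words.len().saturating_sub(1);
    let folded = words.iter().enumerate().fold(u64::MAX, |acc, (i, &w)| {
        let w = if i == last && tail_bits != 0 && fill_padding_with_ones {
            w | (u64::MAX << tail_bits) // padding bits -> 1
        } else {
            w // full word, or padding left as zeros (the bug)
        };
        acc & w
    });
    folded != u64::MAX
}

fn main() {
    // 130 slots, all true: two full words plus a suffix with 2 real bits.
    let words = [u64::MAX, u64::MAX, 0b11];
    let buggy = has_false_with_fill(&words, 130, false);
    let fixed = has_false_with_fill(&words, 130, true);
    assert!(buggy); // zeroed padding is misread as `false` values
    assert!(!fixed); // filling padding with 1s gives the right answer
    println!("fill 0: {buggy}, fill 1: {fixed}");
}
```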
Extract CHUNK_FOLD_BLOCK_SIZE constant and unaligned_bit_chunks() helper
to reduce duplication between has_true() and has_false().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
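A sketch of block-wise folding, assuming u64 words with padding already prepared. The constant name comes from this commit; the value 64 is an assumption based on the follow-up PR, which describes replacing .chunks(64):

```rust
// Hypothetical sketch of block-wise folding on u64 words. Padding bits are
// assumed already zero for has_true and already one for has_false.
const CHUNK_FOLD_BLOCK_SIZE: usize = 64; // value assumed from the follow-up PR

/// OR-fold each block without branches, then test once per block.
fn has_true_blocked(words: &[u64]) -> bool {
    words
        .chunks(CHUNK_FOLD_BLOCK_SIZE)
        .any(|block| block.iter().fold(0u64, |acc, &w| acc | w) != 0)
}

/// AND-fold each block; any 0 bit in the result means a `false` was seen.
fn has_false_blocked(words: &[u64]) -> bool {
    words
        .chunks(CHUNK_FOLD_BLOCK_SIZE)
        .any(|block| block.iter().fold(u64::MAX, |acc, &w| acc & w) != u64::MAX)
}

fn main() {
    // 1,024 words = 16 blocks; a single `false` bit in the last block.
    let mut words = vec![u64::MAX; 1024];
    words[1000] &= !(1u64 << 5);
    assert!(has_true_blocked(&words)); // exits after the first block
    assert!(has_false_blocked(&words)); // has to reach the last block
    println!("ok");
}
```

Keeping the fold branch-free inside a block gives the compiler a chance to vectorize it, while the early-exit test still runs once per block.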
@adriangb
Contributor Author

run benchmark boolean_array

@adriangb
Contributor Author

I added one comment about BitChunkIterator - I think that might even make it a tiny bit faster (avoiding the shifting) - also for the 64 element case?

Will try tomorrow 😄

@Dandandan I tried this. TL;DR: the code is simpler, and it is faster for small inputs because it avoids the overhead of constructing an UnalignedBitChunk, but it gives up vectorization, so it is slower for large inputs. I've gone with a heuristic, adaptive approach that should get us roughly the best of both worlds.
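The shape of such a size-based dispatch might look like the sketch below. The cutoff, block size, and function name are hypothetical, and the sketch does not model UnalignedBitChunk or BitChunkIterator; it only shows a cheap path for small inputs and a block-folding path for large ones:

```rust
// Hypothetical sketch of size-based dispatch. The cutoff and block size are
// illustrative assumptions, not the values used in the PR.
const SMALL_INPUT_WORDS: usize = 16; // hypothetical cutoff
const BLOCK_WORDS: usize = 16; // hypothetical block size

fn has_true_adaptive(words: &[u64]) -> bool {
    if words.len() <= SMALL_INPUT_WORDS {
        // Small input: skip any setup and test word by word.
        words.iter().any(|&w| w != 0)
    } else {
        // Large input: branch-free fold inside each block, one test per block.
        words
            .chunks(BLOCK_WORDS)
            .any(|block| block.iter().fold(0u64, |acc, &w| acc | w) != 0)
    }
}

fn main() {
    let small = [0u64, 0, 4];
    let mut large = vec![0u64; 1024];
    large[1023] = 1;
    assert!(has_true_adaptive(&small));
    assert!(has_true_adaptive(&large));
    assert!(!has_true_adaptive(&vec![0u64; 1024]));
    println!("ok");
}
```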

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4077544781-387-hcdmn 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing add-has-true-has-false (aae3ac5) to a8fe8b3 (merge-base) diff
BENCH_NAME=boolean_array
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench boolean_array
BENCH_FILTER=
Results will be posted here when complete

@adriangb
Contributor Author

There is a bug I think -- see comments and reproducer

Thank you, I added the tests and fixed the bug.

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

New benchmark — branch-only results (no baseline comparison)


group                                add-has-true-has-false
-----                                ----------------------
has_false(all_false, 1024)           1.00      8.5±0.03ns        ? ?/sec
has_false(all_false, 64)             1.00      8.2±0.07ns        ? ?/sec
has_false(all_false, 65536)          1.00     13.5±0.03ns        ? ?/sec
has_false(all_true, 1024)            1.00      8.5±0.03ns        ? ?/sec
has_false(all_true, 64)              1.00      8.4±0.10ns        ? ?/sec
has_false(all_true, 65536)           1.00    122.3±0.97ns        ? ?/sec
has_false(mixed_early, 1024)         1.00      8.5±0.04ns        ? ?/sec
has_false(mixed_early, 64)           1.00      8.2±0.08ns        ? ?/sec
has_false(mixed_early, 65536)        1.00     13.7±0.03ns        ? ?/sec
has_false(nulls_all_true, 1024)      1.00     21.4±3.06ns        ? ?/sec
has_false(nulls_all_true, 64)        1.00      7.4±1.19ns        ? ?/sec
has_false(nulls_all_true, 65536)     1.00    944.1±4.35ns        ? ?/sec
has_true(all_false, 1024)            1.00      8.4±0.03ns        ? ?/sec
has_true(all_false, 64)              1.00      8.2±0.07ns        ? ?/sec
has_true(all_false, 65536)           1.00    121.2±0.40ns        ? ?/sec
has_true(all_true, 1024)             1.00      8.4±0.02ns        ? ?/sec
has_true(all_true, 64)               1.00      7.8±0.08ns        ? ?/sec
has_true(all_true, 65536)            1.00     13.7±0.16ns        ? ?/sec
has_true(nulls_all_true, 1024)       1.00      7.0±1.38ns        ? ?/sec
has_true(nulls_all_true, 64)         1.00      7.0±1.33ns        ? ?/sec
has_true(nulls_all_true, 65536)      1.00      7.0±1.35ns        ? ?/sec
true_count(all_false, 1024)          1.00      9.9±0.03ns        ? ?/sec
true_count(all_false, 64)            1.00      8.8±0.29ns        ? ?/sec
true_count(all_false, 65536)         1.00    228.0±0.08ns        ? ?/sec
true_count(all_true, 1024)           1.00     10.1±0.18ns        ? ?/sec
true_count(all_true, 64)             1.00      8.8±0.28ns        ? ?/sec
true_count(all_true, 65536)          1.00    228.0±0.16ns        ? ?/sec
true_count(mixed_early, 1024)        1.00     10.0±0.03ns        ? ?/sec
true_count(mixed_early, 64)          1.00      8.8±0.29ns        ? ?/sec
true_count(mixed_early, 65536)       1.00    228.0±0.10ns        ? ?/sec
true_count(nulls_all_true, 1024)     1.00     21.4±3.73ns        ? ?/sec
true_count(nulls_all_true, 64)       1.00     11.1±5.55ns        ? ?/sec
true_count(nulls_all_true, 65536)    1.00    720.3±2.81ns        ? ?/sec

Resource Usage

branch

Metric Value
Wall time 320.4s
Peak memory 1.7 GiB
Avg memory 1.6 GiB
CPU user 319.2s
CPU sys 1.0s
Disk read 0 B
Disk write 1.0 GiB

Contributor

@alamb alamb left a comment

🙏 thank you

@adriangb
Contributor Author

I'll wait for another look from @Dandandan since I've changed the implementation before merging. Thank you both for review.

@alamb
Contributor

alamb commented Mar 17, 2026

👨‍🍳 👌

bit_chunks.prefix().unwrap_or(0) != 0
|| bit_chunks
.chunks()
.chunks(Self::CHUNK_FOLD_BLOCK_SIZE)
Contributor

With chunks_exact you could probably use a smaller constant (as you can replace the inner branch / loop with an unrolled loop).

Contributor

(So I think it could terminate even earlier with a smaller constant - as it only has a "termination" branch after the e.g. 8 elements instead of also having a loop end)

Contributor Author

good call! #9570

@Dandandan
Contributor

Probably there is room for a bit more improvement, but performance is already great!

@Dandandan Dandandan merged commit d426107 into apache:main Mar 18, 2026
27 checks passed
@Dandandan
Contributor

Thank you @adriangb !

@adriangbot

Hi @adriangb, your benchmark configuration could not be parsed (#9511 (comment)).

Error: invalid configuration: found character that cannot start any token at line 2 column 1, while scanning for the next token

Supported benchmarks:

  • Standard: (none)
  • Criterion: (any)

Usage:

run benchmark <name>           # run specific benchmark(s)
run benchmarks                 # run default suite
run benchmarks <name1> <name2> # run specific benchmarks

Per-side configuration (run benchmark tpch followed by):

env:
  SHARED_SETTING: enabled
baseline:
  ref: v45.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 1G
changed:
  ref: v46.0.0
  env:
    DATAFUSION_RUNTIME_MEMORY_LIMIT: 2G

Dandandan pushed a commit that referenced this pull request Mar 18, 2026
…9570)

## Summary
- Replace `.chunks(64)` with `.chunks_exact(16)` in `has_true()` and
`has_false()` as suggested in
#9511 (comment)
- With `chunks_exact`, the compiler can fully unroll the inner fold
(guaranteed size, no inner branch/loop), allowing a smaller block size
for more frequent short-circuit exits without regressing the full-scan
path

## Benchmark results (block size 16 vs baseline)
- Full-scan worst case (65536): No regression (~49ns both)
- Early-exit cases (65536): ~27% faster (6.0ns → 4.4ns)
- Small arrays (64, 1024): Unchanged

## Test plan
- [x] All 13 existing `test_has` tests pass

run benchmarks boolean_array

@Dandandan Would appreciate your review!

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
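A sketch of the chunks_exact variant on u64 words, assuming padding bits already prepared (zero for has_true, one for has_false). The block size 16 comes from the commit message; the constant and function names are hypothetical. chunks_exact does not yield the trailing partial block, so remainder() has to be checked separately:

```rust
// Hypothetical sketch of the follow-up's idea; padding bits are assumed
// already zero (has_true) or one (has_false).
const BLOCK: usize = 16; // value from the commit message, name hypothetical

fn has_true_exact(words: &[u64]) -> bool {
    let mut blocks = words.chunks_exact(BLOCK);
    // Every block yielded here has exactly BLOCK words, so the compiler is
    // free to unroll this fold completely; one exit test per block.
    if blocks.by_ref().any(|b| b.iter().fold(0u64, |acc, &w| acc | w) != 0) {
        return true;
    }
    // chunks_exact skips the final < BLOCK words; check them separately.
    blocks.remainder().iter().any(|&w| w != 0)
}

fn has_false_exact(words: &[u64]) -> bool {
    let mut blocks = words.chunks_exact(BLOCK);
    if blocks.by_ref().any(|b| b.iter().fold(u64::MAX, |acc, &w| acc & w) != u64::MAX) {
        return true;
    }
    blocks.remainder().iter().any(|&w| w != u64::MAX)
}

fn main() {
    // 1,000 words = 62 full blocks + an 8-word remainder.
    let mut words = vec![0u64; 1000];
    assert!(!has_true_exact(&words));
    words[999] = 1; // only reachable through remainder()
    assert!(has_true_exact(&words));
    assert!(!has_false_exact(&vec![u64::MAX; 1000]));
    println!("ok");
}
```

The separate remainder check is the price of chunks_exact; in exchange, every full block has a size the compiler can rely on.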

Labels

arrow Changes to the arrow crate

5 participants