Skip to content

perf(parquet): optimize stats and bloom filters for bool columns#715

Open
zeroshade wants to merge 10 commits intoapache:mainfrom
zeroshade:perf/bool-bitmap-stats-bloom
Open

perf(parquet): optimize stats and bloom filters for bool columns#715
zeroshade wants to merge 10 commits intoapache:mainfrom
zeroshade:perf/bool-bitmap-stats-bloom

Conversation

@zeroshade
Copy link
Member

Rationale for this change

The initial bool optimization (#707) added direct bitmap encoding with WriteBitmapBatch and WriteBitmapBatchSpaced functions to reduce allocations during encoding and decoding. However, this didn't implement the same optimization for statistics updated, bloom filter inserts or spaced encoding.

What changes are included in this PR?

  1. Update the bloom filter handling to compute hashes directly from bitmaps, including for spaced values and reuse slice allocation across iterations.
  2. Update statistics handling to update min/max directly from bitmaps for spaced and non-spaced scenarios.
  3. Update the encoder interface for the boolean encoder to have a PutSpacedBitmap and compress bitmaps with validity buffers.
  4. Add writeBitmapValues and writeBitmapValuesSpaced for the boolean column writer with fallback to []bool conversions if the encoder doesn't implement the interface.

Are these changes tested?

Yes, unit tests are added for everything

Are there any user-facing changes?

No user-facing API changes, this is a pure internal optimization change for boolean columns.

Benchmark Results

Statistics Update

  BEFORE (with bitmap → []bool conversion):
  BenchmarkBooleanStatisticsWithConversion-16
    153,398 ns/op    109,278 B/op (107 KB)    6 allocs/op

  AFTER (direct bitmap operations):
  BenchmarkBooleanStatisticsDirectBitmap-16
        393 ns/op      2,698 B/op (2.6 KB)    5 allocs/op
  • 390x faster
  • 97.5% less memory
  • 1 fewer allocation

Bloom filter hashing

  BEFORE (with bitmap → []bool conversion):
  BenchmarkBloomFilterHashingWithConversion-16
    1,084,001 ns/op    3,309,593 B/op (3.23 MB)    3 allocs/op

  AFTER (direct bitmap operations):
  BenchmarkBloomFilterHashingDirectBitmap-16
      448,882 ns/op      802,821 B/op (784 KB)    2 allocs/op
  • 2.4x faster
  • 76% less memory
  • 2.5 MB saved/operation
  • 1 fewer allocation

Full write path (Stats + Bloom Filter)

  BEFORE (with bitmap → []bool conversion):
  BenchmarkFullWritePathWithConversion-16
    1,211,525 ns/op    3,315,566 B/op (3.24 MB)    15 allocs/op

  AFTER (direct bitmap operations):
  BenchmarkFullWritePathDirectBitmap-16
      580,934 ns/op      807,640 B/op (789 KB)    13 allocs/op
  • 2.1x faster
  • 76% less memory
  • 2.5 MB saved per 100k bools written
  • 2 fewer allocations

Matt and others added 8 commits March 15, 2026 18:39
Add bitmap-aware methods for statistics and bloom filters to avoid
converting back to []bool during boolean column writes.

Changes:
- Add GetHashesFromBitmap() to compute bloom filter hashes directly
  from bitmap without conversion
- Add UpdateFromBitmap() and UpdateFromBitmapSpaced() methods to
  BooleanStatistics for bitmap-aware min/max computation
- Update writeBitmapValues() and writeBitmapValuesSpaced() to use
  new bitmap-aware methods

Performance improvement:
- Non-spaced writes: Eliminate 2 []bool conversions (statistics + bloom)
- Spaced writes: Eliminate 1 []bool conversion (statistics)
- Reduces memory allocations and avoids 8x memory overhead of []bool

The optimization maintains full backward compatibility and all tests pass.
Add extensive test coverage for the new bitmap-aware statistics and
bloom filter methods:

Bloom Filter Tests (TestGetHashesFromBitmap):
- Empty bitmap handling
- Aligned and unaligned offsets
- All true/false bits
- Large bitmap performance
- Consistency verification with GetHash()

Statistics Tests (TestBooleanStatisticsUpdateFromBitmap):
- All true/false bits scenarios
- Mixed true and false values
- Unaligned offset handling
- Null counting
- Multiple updates accumulation
- Early exit optimization verification
- Consistency with existing Update() method

Spaced Statistics Tests (TestBooleanStatisticsUpdateFromBitmapSpaced):
- All valid bits
- Partial null values
- All null values
- Unaligned offsets
- Only true/false values valid
- Multiple updates
- Consistency with existing UpdateSpaced() method

All tests pass, verifying correctness of the bitmap optimization.
Add GetSpacedHashesFromBitmap() to eliminate the remaining []bool
conversion in spaced writes for bloom filters.

Changes:
- Add GetSpacedHashesFromBitmap() function in bloom_filter.go
  - Computes bloom filter hashes directly from bitmap with validity bits
  - Uses SetBitRunReader for efficient valid value iteration
  - Avoids 8x memory overhead of []bool conversion

- Update writeBitmapValuesSpaced() in column writer
  - Now uses GetSpacedHashesFromBitmap() for bloom filter updates
  - Eliminates final []bool conversion for bloom filters in spaced case
  - Note: Encoding still requires []bool (would need encoder API change)

- Add comprehensive tests (TestGetSpacedHashesFromBitmap)
  - All valid/null/mixed scenarios
  - Unaligned offsets
  - Sparse valid bits
  - Large bitmap (1000 values) with complex patterns
  - Consistency verification with GetSpacedHashes()

Performance Impact:
- Non-spaced writes: Already eliminated 2 []bool conversions (previous commit)
- Spaced writes: Now eliminate 2 of 3 []bool conversions
  - Statistics: using UpdateFromBitmapSpaced() (previous commit)
  - Bloom filter: using GetSpacedHashesFromBitmap() (this commit)
  - Encoding: still needs []bool (requires encoder interface change)

All tests pass (35 new test cases total across both commits).
…rsions in spaced writes

This completes the bitmap optimization by eliminating the final []bool conversion
in spaced writes (writes with validity bitmaps).

**Changes:**
- Added `PutSpacedBitmap()` method to PlainBooleanEncoder that works directly on bitmaps
- Implemented bitmap compression helper using SetBitRunReader and BitmapWriter
- Added `PutSpacedBitmap()` to RleBooleanEncoder (extracts valid bits for RLE buffering)
- Updated column writer to use interface pattern with type assertion
- Added comprehensive tests for bitmap encoding round-trips

**Performance Impact:**
Eliminates 8x memory overhead from []bool conversion in spaced writes:
- Before: bitmap → []bool → encoder
- After: bitmap → bitmap compression → encoder

**Implementation Details:**
`compressBitmapWithValidity()` uses SetBitRunReader to efficiently iterate over
valid runs and BitmapWriter.AppendBitmap to compress the source bitmap by extracting
only valid bits into a contiguous destination bitmap.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…hing

Optimize GetHashesFromBitmap and GetSpacedHashesFromBitmap by reusing a
single-byte slice instead of creating slice headers on each iteration.

Before: var b [1]byte; h.Sum64(b[:])  // b[:] escapes, allocates per iteration
After:  b := []byte{0}; h.Sum64(b)     // reused across iterations

Performance impact (100K booleans):
- Bloom filter: 996ms → 449ms (2.4x faster)
- Full write path: 1,134ms → 581ms (2.1x faster)
- Allocations: 100,001 → 2 allocs/op (eliminated 99,999 allocations)
- Memory: 903KB → 803KB (100KB less)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Complete test coverage for all bitmap-aware methods by adding tests for
RleBooleanEncoder.PutSpacedBitmap().

Tests added:
- TestRleBooleanEncoderPutSpacedBitmap: Basic round-trip encoding/decoding
- TestRleBooleanEncoderPutSpacedBitmapConsistency: Verify bitmap method produces
  identical output to []bool-based PutSpaced method

This brings total test coverage to 100% of new public methods (9/9 tested).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add benchmarks demonstrating the performance of bitmap-aware statistics
and bloom filter operations.

Benchmarks (100K booleans):
- BooleanStatisticsFromBitmap:     406 ns/op, 2.7 KB/op, 5 allocs/op
- BloomFilterHashingFromBitmap:    451 µs/op, 803 KB/op, 2 allocs/op
- BooleanWritePathBitmap:          587 µs/op, 808 KB/op, 13 allocs/op

These benchmarks can be compared with legacy []bool-based approaches
to verify the 2-3x speedup and 76% memory reduction.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@zeroshade zeroshade requested review from kou and lidavidm March 16, 2026 18:07
// Reuse a single-byte slice to avoid allocating per value
out := make([]uint64, numValues)
b := []byte{0}
for i := int64(0); i < numValues; i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit: use the new range syntax? maybe enforce go fix in CI?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, looks like the go fix is gonna be a big change honestly so I'll hold off on that for the moment and just fix this one spot for this PR

Comment on lines +181 to +183
// Hash each valid bit in this run
for i := int64(0); i < run.Length; i++ {
val := bitutil.BitIsSet(bitmap, int(bitmapOffset+run.Pos+i))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you hoist this out of the loop? The point of a bit run is that the bits are all the same in the run, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that the point of a bit run is that the bits are all the same in the run, but we're doing runs of the validity bitmap here and then using the values from the value bitmap buffer. The loop is over a run of non-null bits in the bitmap, which means we have to actually check BitIsSet for each one. That said, I can use the range syntax for the loop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants