perf(parquet): optimize stats and bloom filters for bool columns #715
zeroshade wants to merge 10 commits into apache:main from
Conversation
Add bitmap-aware methods for statistics and bloom filters to avoid converting back to []bool during boolean column writes.

Changes:
- Add GetHashesFromBitmap() to compute bloom filter hashes directly from the bitmap without conversion
- Add UpdateFromBitmap() and UpdateFromBitmapSpaced() methods to BooleanStatistics for bitmap-aware min/max computation
- Update writeBitmapValues() and writeBitmapValuesSpaced() to use the new bitmap-aware methods

Performance improvement:
- Non-spaced writes: eliminate 2 []bool conversions (statistics + bloom)
- Spaced writes: eliminate 1 []bool conversion (statistics)
- Reduces memory allocations and avoids the 8x memory overhead of []bool

The optimization maintains full backward compatibility and all tests pass.
Add extensive test coverage for the new bitmap-aware statistics and bloom filter methods.

Bloom Filter Tests (TestGetHashesFromBitmap):
- Empty bitmap handling
- Aligned and unaligned offsets
- All true/false bits
- Large bitmap performance
- Consistency verification with GetHash()

Statistics Tests (TestBooleanStatisticsUpdateFromBitmap):
- All true/false bits scenarios
- Mixed true and false values
- Unaligned offset handling
- Null counting
- Multiple updates accumulation
- Early exit optimization verification
- Consistency with existing Update() method

Spaced Statistics Tests (TestBooleanStatisticsUpdateFromBitmapSpaced):
- All valid bits
- Partial null values
- All null values
- Unaligned offsets
- Only true/false values valid
- Multiple updates
- Consistency with existing UpdateSpaced() method

All tests pass, verifying correctness of the bitmap optimization.
Add GetSpacedHashesFromBitmap() to eliminate the remaining []bool conversion in spaced writes for bloom filters.

Changes:
- Add GetSpacedHashesFromBitmap() function in bloom_filter.go
  - Computes bloom filter hashes directly from the bitmap with validity bits
  - Uses SetBitRunReader for efficient valid value iteration
  - Avoids the 8x memory overhead of a []bool conversion
- Update writeBitmapValuesSpaced() in the column writer
  - Now uses GetSpacedHashesFromBitmap() for bloom filter updates
  - Eliminates the final []bool conversion for bloom filters in the spaced case
  - Note: encoding still requires []bool (would need an encoder API change)
- Add comprehensive tests (TestGetSpacedHashesFromBitmap)
  - All valid/null/mixed scenarios
  - Unaligned offsets
  - Sparse valid bits
  - Large bitmap (1000 values) with complex patterns
  - Consistency verification with GetSpacedHashes()

Performance Impact:
- Non-spaced writes: already eliminated 2 []bool conversions (previous commit)
- Spaced writes: now eliminate 2 of 3 []bool conversions
  - Statistics: using UpdateFromBitmapSpaced() (previous commit)
  - Bloom filter: using GetSpacedHashesFromBitmap() (this commit)
  - Encoding: still needs []bool (requires an encoder interface change)

All tests pass (35 new test cases total across both commits).
…rsions in spaced writes

This completes the bitmap optimization by eliminating the final []bool conversion in spaced writes (writes with validity bitmaps).

**Changes:**
- Added `PutSpacedBitmap()` method to PlainBooleanEncoder that works directly on bitmaps
- Implemented a bitmap compression helper using SetBitRunReader and BitmapWriter
- Added `PutSpacedBitmap()` to RleBooleanEncoder (extracts valid bits for RLE buffering)
- Updated the column writer to use an interface pattern with a type assertion
- Added comprehensive tests for bitmap encoding round-trips

**Performance Impact:**
Eliminates the 8x memory overhead of the []bool conversion in spaced writes:
- Before: bitmap → []bool → encoder
- After: bitmap → bitmap compression → encoder

**Implementation Details:**
`compressBitmapWithValidity()` uses SetBitRunReader to efficiently iterate over valid runs and BitmapWriter.AppendBitmap to compress the source bitmap by extracting only the valid bits into a contiguous destination bitmap.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…hing
Optimize GetHashesFromBitmap and GetSpacedHashesFromBitmap by reusing a
single-byte slice instead of creating slice headers on each iteration.
Before: var b [1]byte; h.Sum64(b[:]) // b[:] escapes, allocates per iteration
After: b := []byte{0}; h.Sum64(b) // reused across iterations
Performance impact (100K booleans):
- Bloom filter: 996ms → 449ms (2.4x faster)
- Full write path: 1,134ms → 581ms (2.1x faster)
- Allocations: 100,001 → 2 allocs/op (eliminated 99,999 allocations)
- Memory: 903KB → 803KB (100KB less)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Complete test coverage for all bitmap-aware methods by adding tests for RleBooleanEncoder.PutSpacedBitmap().

Tests added:
- TestRleBooleanEncoderPutSpacedBitmap: basic round-trip encoding/decoding
- TestRleBooleanEncoderPutSpacedBitmapConsistency: verify the bitmap method produces output identical to the []bool-based PutSpaced method

This brings total test coverage to 100% of new public methods (9/9 tested).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add benchmarks demonstrating the performance of bitmap-aware statistics and bloom filter operations.

Benchmarks (100K booleans):
- BooleanStatisticsFromBitmap: 406 ns/op, 2.7 KB/op, 5 allocs/op
- BloomFilterHashingFromBitmap: 451 µs/op, 803 KB/op, 2 allocs/op
- BooleanWritePathBitmap: 587 µs/op, 808 KB/op, 13 allocs/op

These benchmarks can be compared with the legacy []bool-based approaches to verify the 2-3x speedup and 76% memory reduction.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
parquet/metadata/bloom_filter.go
Outdated
    // Reuse a single-byte slice to avoid allocating per value
    out := make([]uint64, numValues)
    b := []byte{0}
    for i := int64(0); i < numValues; i++ {
(nit: use the new range syntax? maybe enforce go fix in CI?)
good point, looks like the go fix is gonna be a big change honestly so I'll hold off on that for the moment and just fix this one spot for this PR
    // Hash each valid bit in this run
    for i := int64(0); i < run.Length; i++ {
        val := bitutil.BitIsSet(bitmap, int(bitmapOffset+run.Pos+i))
Can't you hoist this out of the loop? The point of a bit run is that the bits are all the same in the run, right?
You're right that the point of a bit run is that the bits are all the same in the run, but we're doing runs of the validity bitmap here and then using the values from the value bitmap buffer. The loop is over a run of non-null bits in the bitmap, which means we have to actually check BitIsSet for each one. That said, I can use the range syntax for the loop
Rationale for this change
The initial bool optimization (#707) added direct bitmap encoding with WriteBitmapBatch and WriteBitmapBatchSpaced functions to reduce allocations during encoding and decoding. However, it didn't implement the same optimization for statistics updates, bloom filter inserts, or spaced encoding.

What changes are included in this PR?
- Added PutSpacedBitmap to the boolean encoders to compress bitmaps with validity buffers.
- Updated writeBitmapValues and writeBitmapValuesSpaced for the boolean column writer, with a fallback to []bool conversions if the encoder doesn't implement the interface.

Are these changes tested?
Yes, unit tests are added for everything.
Are there any user-facing changes?
No user-facing API changes, this is a pure internal optimization change for boolean columns.
Benchmark Results
Statistics Update
Bloom filter hashing
Full write path (Stats + Bloom Filter)