HBASE-29890 WAL tailing reader should resume partial cell reads instead of resetting compression #7741
sidkhillon wants to merge 5 commits into apache:master
Conversation
cc @Apache9, I would really appreciate any feedback you have on this PR. Thank you!
sidkhillon force-pushed the branch from 9d57c06 to 874e4ff
private byte[] getDeferredOrDictEntry(short dictIdx) {
  if (deferAdditions) {
    int deferredIdx = dictIdx - deferredBaseIndex;
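For context, a fuller sketch of the deferred lookup this diff hints at. Only the three lines above are from the actual change; deferredEntries, dict, and the bounds check are assumed names/details, not the patch itself:

// Hedged sketch, not the actual patch: while deferAdditions is set, entries
// discovered mid-cell live in a side list (deferredEntries, assumed to be a
// List<byte[]>) instead of the dictionary itself. Indices at or beyond
// deferredBaseIndex therefore refer to that side list.
private byte[] getDeferredOrDictEntry(short dictIdx) {
  if (deferAdditions && dictIdx >= deferredBaseIndex) {
    int deferredIdx = dictIdx - deferredBaseIndex;
    return deferredEntries.get(deferredIdx);
  }
  return dict.getEntry(dictIdx);
}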
Is this algorithm always right?
The Dictionary interface does not guarantee that a newly added entry will get the current size as its index, and the LRUDictionary implementation may move entries when they are accessed...
That's a good point. I have created an UndoableLRUDictionary that will let us roll back changes to the dictionary, and I've added many scenarios to test it out as unit tests. Please let me know how that looks.
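For readers following along, a minimal, self-contained toy version of the rollback idea. This is simplified (no LRU reordering or eviction, which the real UndoableLRUDictionary must also undo), and the method names are illustrative, not the actual patch:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of track/commit/rollback over dictionary additions.
class UndoableDictionarySketch {
  private final Map<Short, byte[]> entries = new HashMap<>();
  private short nextIdx = 0;
  private List<Short> added; // indices added while tracking; null when not tracking

  void startTracking() { added = new ArrayList<>(); }

  short addEntry(byte[] data) {
    short idx = nextIdx++;
    entries.put(idx, data);
    if (added != null) added.add(idx);
    return idx;
  }

  byte[] getEntry(short idx) { return entries.get(idx); }

  void commit() { added = null; } // keep the additions

  void rollback() { // call only while tracking: drop everything added since startTracking()
    for (short idx : added) entries.remove(idx);
    nextIdx -= added.size();
    added = null;
  }
}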
@Apache9 I've addressed the PR feedback you left, and I would really appreciate another review when you have the chance. Thank you!
if (snap.savedContents != null) {
  backingStore.nodeToIndex.remove(node);
  node.setContents(snap.savedContents, snap.savedOffset, snap.savedLength);
  backingStore.nodeToIndex.put(node, findIndexForNode(node));
I wonder if doing this in 2 phases avoids needing an O(n²) approach to iteration here.
Additionally, rollback() does remove/setContents/put on nodeToIndex (a content-based HashMap) one node at a time, in non-deterministic IdentityHashMap iteration order. If during the tracked period a node gets overwritten with a value that equals another node's original value (e.g., evict "a" from slot 0, then "a" gets added to slot 1), there's an intermediate state during rollback where two nodes hold the same content. Depending on iteration order, the HashMap operations can clobber each other, leaving a missing entry in nodeToIndex.
Would it be more efficient and safer to do:
// Phase 1: restore all node contents
for (...) {
  node.prev = snap.savedPrev;
  node.next = snap.savedNext;
  if (snap.savedContents != null) {
    node.setContents(snap.savedContents, snap.savedOffset, snap.savedLength);
  }
}
// Phase 2: rebuild nodeToIndex from scratch
backingStore.nodeToIndex.clear();
for (short i = 0; i < savedCurrSize; i++) {
  backingStore.nodeToIndex.put(backingStore.indexToNode[i], i);
}
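Restoring all contents before any map writes means the intermediate duplicate-content state never exists, and the rebuild is a single O(n) pass over indexToNode.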
Thanks, I've implemented this suggestion.
InvocationTargetException {
  Constructor<? extends Dictionary> dictConstructor = dictType.getConstructor();
  tagDict = dictConstructor.newInstance();
public TagCompressionContext(Class<? extends Dictionary> dictType, int dictCapacity) {
It seems we're no longer using dictType here. Callers might expect the underlying dictionary type to be dictType, so any custom behavior it provides would be overridden/removed. Is this okay? What's the intended use case for this pluggable dictionary type?
dictType is not really used beyond setting it to the default LRUDictionary in the codebase, so to avoid confusion, I removed the argument from the constructor and set the type explicitly.
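The simplified constructor presumably ends up shaped roughly like this (a sketch inferred from the comment above, not the exact patch; tagDict is the field from the diff, and Dictionary.init(int) is the existing HBase Dictionary interface method):

// Sketch: dictType is dropped and the dictionary type is fixed to the
// UndoableLRUDictionary introduced in this PR.
public TagCompressionContext(int dictCapacity) {
  tagDict = new UndoableLRUDictionary();
  tagDict.init(dictCapacity);
}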
Rollback in UndoableLRUDictionary previously restored nodes one at a time, doing remove/setContents/put on the content-based nodeToIndex HashMap. This could clobber entries when two nodes shared the same content during the restore (e.g., an evicted value re-added to a different slot). The fix restores all node state first, then rebuilds nodeToIndex from scratch.

Also removes the unused dictType parameter from TagCompressionContext, since every caller hardcodes LRUDictionary.class and we always need UndoableLRUDictionary for correctness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the WAL tailing reader hits EOF mid-cell during WAL compression, it currently returns EOF_AND_RESET_COMPRESSION, which forces the reader to re-read the entire WAL file from the beginning to rebuild dictionary state. This is an O(n) operation that becomes increasingly expensive as the WAL grows.
The root cause is that the CompressedKvDecoder eagerly adds entries to the compression dictionaries (ROW, FAMILY, QUALIFIER, and tag dictionaries) as it reads each field of a cell. If an IOException occurs partway through reading a cell, the dictionaries are left in a partially-updated state that no longer matches the actual stream position. The reader has no choice but to throw away the entire compression context and start over.
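To make the failure mode concrete, here is a simplified illustration (assuming java.io.DataInputStream and HBase's Dictionary interface; the names, flow, and length encoding are assumptions, not the actual CompressedKvDecoder code):

// Reading one cell touches several dictionaries in order; an EOF while
// reading a later field leaves the earlier dictionaries already mutated,
// out of sync with the stream position the reader will retry from.
private void readField(DataInputStream in, Dictionary dict) throws IOException {
  byte status = in.readByte();
  if (status == Dictionary.NOT_IN_DICTIONARY) {
    int len = in.readInt();        // simplified length encoding
    byte[] data = new byte[len];
    in.readFully(data);            // EOF can strike here, mid-cell...
    dict.addEntry(data, 0, len);   // ...after earlier fields already added entries
  } else {
    dict.getEntry(in.readShort()); // simplified: real code packs the index differently
  }
}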
The proposed fix is to defer dictionary additions until a cell is fully parsed:
With deferred additions, hitting EOF mid-cell leaves the dictionaries in the state they were after the last fully-read cell. This means the reader can return EOF_AND_RESET (a cheap seek to the saved position) instead of EOF_AND_RESET_COMPRESSION, and resume reading from where it left off once the file grows.
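Under that scheme, the read loop can bracket each cell with commit/rollback, roughly like this (a hedged sketch of the decoder-level flow; parseCell, compression, and its tracking methods are hypothetical names, not the PR's API):

// Hypothetical flow: on a clean parse the deferred additions are committed;
// on EOF they are rolled back, so the dictionaries match the last complete
// cell and a cheap seek back to the saved position suffices.
Cell readCellResumable(DataInputStream in) throws IOException {
  compression.beginTracking();       // hypothetical: defer/snapshot dictionary changes
  try {
    Cell cell = parseCell(in);       // hypothetical: may touch several dictionaries
    compression.commit();            // cell fully read: keep the additions
    return cell;
  } catch (EOFException e) {
    compression.rollback();          // undo partial additions
    throw e;                         // caller returns EOF_AND_RESET, not EOF_AND_RESET_COMPRESSION
  }
}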