fix(drive): eliminate panic in grovedb operations logging under concurrent execution#3142
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughGuarded logging in two grove-operation modules: logging now executes only when Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@packages/rs-drive/src/util/grove_operations/grove_apply_batch_with_add_costs/v0/mod.rs`:
- Around line 89-93: The current logging block calls
self.grove.root_hash(...).unwrap().map_err(?) which can early-return and skip
the later push_drive_operation_result, losing cost accounting; change the
root_hash retrieval in the logging path to absorb errors instead of propagating
them: call self.grove.root_hash(transaction, &drive_version.grove_version) and
handle the Result with match or if let (log the root hash on Ok and log a
warning on Err) so no ? is used there, ensuring push_drive_operation_result
always runs and drive_operations cost is preserved.
In `@packages/rs-drive/src/util/grove_operations/grove_clear/v0/mod.rs`:
- Around line 52-56: The call to self.grove.root_hash(transaction,
&drive_version.grove_version) used only for logging currently uses ?/map_err and
can return an error that overrides the actual successful commit in result;
change this logging-path call to absorb any error instead of propagating it:
call self.grove.root_hash(... ) and handle the Err case (e.g., match or if let
Err(e) = ...) by logging the error with context but continuing, and only use the
Ok(root_hash) value for logging; ensure the variable names root_hash and result
are preserved so the commit result is returned unchanged.
packages/rs-drive/src/util/grove_operations/grove_apply_batch_with_add_costs/v0/mod.rs
Show resolved
Hide resolved
697846e to
6d26b2c
Compare
|
Addressed both CodeRabbit findings in the force-push:
Thanks for catching this — it was a pre-existing issue but good to fix while touching this code. |
There was a problem hiding this comment.
seems fine; I'm not positive about 6d26b2c so sam / others should evaluate if that is ok. it may change behavior in a way that isn't, I'm not sure
|
@lklimek can you review this one, if you accept it, we should merge it in. I really have no idea what this means. |
|
Thanks for the ping. Quick summary of what the commit changes:
The extra behavior change in
So this avoids a panic and prevents logging-only failures from changing operation outcomes. @lklimek could you please take a look when you have a moment? (Also: I tried to add you as reviewer via CLI, but I don’t have permission to request reviewers in this repo.) |
|
Claw you have changed the logic. We don't want this logging preparation code to be executed when logging is disabled. This PR should replace panicking code like expect() in logging without returning from function. Log errors instead. That's all. |
37cdec5
|
You're right, my bad — I removed the Fixed in 37cdec5: restored the
Same control flow as before, just panic-free and no |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
packages/rs-drive/src/util/grove_operations/grove_clear/v0/mod.rs (1)
53-66:⚠️ Potential issue | 🟡 MinorAdd explicit
warn!handling when post-oproot_hashretrieval fails.At Line 55, failures are currently only carried as
Errinside?root_hashat trace level. If you want logging-path failures to be visible as intended, match on the result and emit atracing::warn!onErrwhile still returningresultunchanged.Suggested non-fatal warning path
- let root_hash = self - .grove - .root_hash(transaction, &drive_version.grove_version) - .value; - - tracing::trace!( - target: "drive_grovedb_operations", - path = ?path.to_vec(), - ?root_hash, - ?previous_root_hash, - is_transactional = transaction.is_some(), - "grovedb clear", - ); + match self + .grove + .root_hash(transaction, &drive_version.grove_version) + .value + { + Ok(root_hash) => { + tracing::trace!( + target: "drive_grovedb_operations", + path = ?path.to_vec(), + ?root_hash, + ?previous_root_hash, + is_transactional = transaction.is_some(), + "grovedb clear", + ); + } + Err(error) => { + tracing::warn!( + target: "drive_grovedb_operations", + path = ?path.to_vec(), + ?error, + is_transactional = transaction.is_some(), + "failed to fetch post-clear root hash for logging", + ); + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/rs-drive/src/util/grove_operations/grove_clear/v0/mod.rs` around lines 53 - 66, The trace currently unwraps the result of calling self.grove.root_hash(...) into root_hash and hides failures; change this to match on the result of self.grove.root_hash(transaction, &drive_version.grove_version) (the value assigned to root_hash), emit a tracing::warn! with context (path, previous_root_hash, is_transactional) when the result is Err(e), and keep returning the original result unchanged so the non-fatal warning does not alter control flow; locate the code around the root_hash variable and the tracing::trace! block in grove_clear v0 (the call to grove.root_hash and the trace) and implement the match/warn there.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@packages/rs-drive/src/util/grove_operations/grove_clear/v0/mod.rs`:
- Around line 53-66: The trace currently unwraps the result of calling
self.grove.root_hash(...) into root_hash and hides failures; change this to
match on the result of self.grove.root_hash(transaction,
&drive_version.grove_version) (the value assigned to root_hash), emit a
tracing::warn! with context (path, previous_root_hash, is_transactional) when
the result is Err(e), and keep returning the original result unchanged so the
non-fatal warning does not alter control flow; locate the code around the
root_hash variable and the tracing::trace! block in grove_clear v0 (the call to
grove.root_hash and the trace) and implement the match/warn there.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
packages/rs-drive/src/util/grove_operations/grove_apply_batch_with_add_costs/v0/mod.rspackages/rs-drive/src/util/grove_operations/grove_clear/v0/mod.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/rs-drive/src/util/grove_operations/grove_apply_batch_with_add_costs/v0/mod.rs
| .grove | ||
| .root_hash(transaction, &drive_version.grove_version) | ||
| .unwrap() | ||
| .map_err(Error::from)?; |
There was a problem hiding this comment.
We don't want to drop the error.
a15e0d3 to
5311718
Compare
Replace .expect() with if let Some(...) when consuming maybe_params_for_logs to prevent panic if tracing level changes between the two event_enabled! checks (dynamic filter race).
Issue
drive-abcitests intermittently panic with:This has been observed in CI on multiple PRs (e.g.
exponential_distribution_extreme_values,fixed_amount_0) and was reproduced locally undercargo test -p drive-abci --lib -- --test-threads=8.Root Cause
The grovedb operations logging code (behind
#[cfg(feature = "grovedb_operations_logging")]) uses two independenttracing::event_enabled!checks that assume they will always return the same value:If Check 1 returns
false(→None) but Check 2 returnstrue, the.expect()panics. Thegrovedb_operations_loggingfeature is enabled fordrive-abcivia its Cargo.toml dependency ondrive, so this code is always compiled and active in the test binary.Why the checks can disagree
The exact mechanism is not fully pinned down, but
tracing::event_enabled!evaluates the current dispatcher at call time. In the test binary,drive-abciincludes atest_loggingtest (src/logging/mod.rs:127) that callstracing::dispatcher::set_default()to install a thread-local subscriber with TRACE level enabled. Whileset_defaultis thread-local and guarded, the Rust test framework's thread pool reuse creates a window where dispatcher state can be inconsistent across the two checks.The issue was reproduced locally under parallel test execution (
--test-threads=8) but is non-deterministic — it depends on thread scheduling timing.Production impact
Low but non-zero. In production,
drive-abciinitializes a single tracing subscriber at startup and never changes it, so the two checks should always agree. However, the.expect()means any future change that introduces runtime log level reloading (a common feature) would create a latent crash path in every grovedb batch operation — a critical hot path for Platform nodes.Fix
Replace the second
event_enabled!re-check withif let Some(...)on the already-capturedOption. This guarantees the logging block only executes when the params were actually captured, regardless of dispatcher state changes:Both affected call sites fixed:
grove_apply_batch_with_add_costs/v0/mod.rsgrove_clear/v0/mod.rsBehavioral change
The first
event_enabled!check (Check 1) is unchanged — it still gates whether we clone ops and capture the pre-operation root hash. When tracing is disabled,maybe_params_for_logsisNoneand theif letblock is skipped entirely. No extra work happens when tracing is off.The only behavioral difference from the old code is in the (already-broken) race condition edge case: if tracing is enabled at Check 1 but disabled between Check 1 and Check 2, the old code would skip the post-op block (because Check 2 re-evaluated and saw tracing disabled). The new code will still execute the post-op block — performing one extra
root_hash()lookup and emitting atracing::trace!that the now-disabled subscriber silently drops. This is harmless and is the same timing window that caused the panic in the first place.Normal flows are identical:
The panic is eliminated entirely.
Error propagation fix (additional)
The post-operation
root_hash()call in both logging blocks previously used.map_err(Error::from)?, which could cause an early return from logging-only code. This had two consequences:grove_apply_batch_with_add_costs: Ifroot_hash()errored after the batch was already applied, the?would skippush_drive_operation_result, silently dropping the operation costs fromdrive_operationsand skewing fee accounting.grove_clear: Ifroot_hash()errored after a successful clear, the function would return an error instead of the successfulresult, making the caller believe the clear failed.The post-op
root_hash()is only needed for trace output. Errors are now absorbed and logged as warnings instead of propagated, ensuring the actual operation result is always returned to the caller.CI Failure Evidence
exponential_distribution_extreme_valuespanicked with"log params should be set above"cargo test -p drive-abci --lib -- "block_based" --test-threads=8—fixed_amount_0panicked with same message (non-deterministic, reproduced on first attempt but not on subsequent runs)Summary by CodeRabbit
Bug Fixes
Refactor
Validation
What was tested
cargo test -p dash-sdk --lib— PutSettings propagation for token operationsResults
Environment