treedb: improve live vlog maintenance observability #856
Conversation
Pull request overview
This PR improves TreeDB live value-log (vlog) maintenance observability by exposing vlog-generation metrics via expvar and adding additional byte-level counters to make rewrite planning and rewrite+GC progress visible.
Changes:
- Export `treedb.cache.vlog_generation.*` metrics through the expvar stat-selection filter.
- Add rewrite-plan selected-byte counters (total/live/stale) and a cumulative rewrite "reclaimed bytes" counter.
- Add unit tests covering new expvar exposure and rewrite-plan byte accounting paths.
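For context, exposing a filtered metric family through expvar can be sketched as below. The `treedb.cache.vlog_generation.` prefix comes from this PR; `selectStats` and the map-based shape are hypothetical stand-ins for the actual stat-selection filter, not the TreeDB API:

```go
package main

import (
	"expvar"
	"fmt"
	"strings"
)

// selectStats keeps only metrics whose keys match an allowed prefix,
// mirroring the stat-selection filtering described in the PR. The
// function name and map shape are illustrative only.
func selectStats(all map[string]int64, prefixes []string) map[string]int64 {
	out := make(map[string]int64)
	for k, v := range all {
		for _, p := range prefixes {
			if strings.HasPrefix(k, p) {
				out[k] = v
				break
			}
		}
	}
	return out
}

func main() {
	all := map[string]int64{
		"treedb.cache.vlog_generation.rewrite.reclaimed_bytes": 4096,
		"treedb.cache.hits":                                    10,
	}
	selected := selectStats(all, []string{"treedb.cache.vlog_generation."})

	// Publish the filtered view under a single expvar variable so it
	// shows up on /debug/vars alongside the standard expvars.
	expvar.Publish("treedb_stats", expvar.Func(func() any {
		return selected
	}))
	fmt.Println(len(selected)) // prints 1: only the vlog_generation metric survives
}
```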
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| TreeDB/caching/db.go | Adds new vlog-generation stats counters (plan-selected bytes + reclaimed bytes) and emits them via Stats(). |
| TreeDB/caching/expvar_stats.go | Extends expvar filtering to include treedb.cache.vlog_generation.*. |
| TreeDB/caching/expvar_stats_test.go | Verifies expvar selection/coercion includes a representative vlog_generation metric. |
| TreeDB/caching/vlog_generation_scheduler_test.go | Adds unit tests for rewrite-plan outcome observation and byte counter tracking (including fallback path). |
Follow-up slice pushed.

What changed

run_celestia validation (fast profile)
Command:
Run home:
Latest diagnostics snapshot sampled:
Key counters:

Takeaway: in this early state-sync window, rewrite/vacuum did not trigger, and the dominant behavior was high active-pass contention. We now have explicit counters to drive the next tuning step.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9ce5339a4e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```go
db.scheduleDueVlogGenerationDeferredMaintenance()
db.schedulePendingVlogGenerationCheckpointKick()
rewriteRan := db.vlogGenerationRewriteRuns.Load() > rewriteRunsBefore
gcRan := db.vlogGenerationGCRuns.Load() > gcRunsBefore
```
Count pass outcomes before queuing follow-up maintenance
The new pass-outcome counters are computed after scheduleDueVlogGenerationDeferredMaintenance()/schedulePendingVlogGenerationCheckpointKick() run in the defer path, so a just-scheduled retry goroutine can increment vlogGenerationRewriteRuns/vlogGenerationGCRuns first and make this pass look like it performed rewrite/GC when it did not. This appears under contention (fast retry wakeups on another core) and skews the new observability metrics (maintenance.passes.*) by overcounting rewrite/gc and undercounting noop for the current pass.
Pushed another follow-up.

Code change

Validation

A/B run_celestia comparison
Both runs used:
Baseline run home:
New run home:
Key counter deltas:

This looks like a material reduction in scheduler contention, and maintenance made actual rewrite/vacuum progress in the same early-state-sync lab window.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
```go
}
if !rewriteRan && !gcRan {
	db.vlogGenerationMaintenancePassNoop.Add(1)
}
```
The maintenance pass outcome counters (passes.noop/with_rewrite/with_gc) can be mis-attributed due to the defer order: maintenanceActive is cleared and new maintenance may be scheduled before computing rewriteRan/gcRan from the global counters. If a deferred/checkpoint-kick pass starts quickly after maintenanceActive.Store(false), rewriteRan/gcRan may reflect the next pass instead of the one being released. Consider computing/incrementing the pass outcome counters before clearing vlogGenerationMaintenanceActive and before scheduling any follow-up maintenance (or track rewrite/gc execution via local booleans set during this pass).
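Both review threads point at the same fix: attribute the pass outcome from pass-local state before clearing the active flag or scheduling follow-ups. A minimal sketch of the local-boolean approach, with hypothetical names (`passCounters`, `runMaintenancePass`) standing in for the TreeDB fields:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// passCounters is an illustrative stand-in for the db-level outcome
// counters (maintenance.passes.noop / with_rewrite / with_gc).
type passCounters struct {
	noop, withRewrite, withGC atomic.Int64
}

// runMaintenancePass records the outcome from local booleans set while
// THIS pass executes rewrite/GC, so a follow-up pass scheduled in the
// defer path cannot race ahead and skew the attribution.
func runMaintenancePass(c *passCounters, doRewrite, doGC func() bool) {
	var rewriteRan, gcRan bool
	defer func() {
		// Count the outcome first, from pass-local state...
		switch {
		case rewriteRan:
			c.withRewrite.Add(1)
		case gcRan:
			c.withGC.Add(1)
		default:
			c.noop.Add(1)
		}
		// ...and only then clear the active flag and queue follow-ups.
	}()
	rewriteRan = doRewrite()
	gcRan = doGC()
}

func main() {
	var c passCounters
	runMaintenancePass(&c, func() bool { return false }, func() bool { return false })
	fmt.Println(c.noop.Load()) // prints 1: an idle pass counts as noop
}
```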
Pushed.

Code change

Validation

Third comparison run (same lab command)
Run home:
Snapshot:
Counters:

Trend across runs:

So the latest change primarily removes maintenance churn/noops while preserving rewrite/vacuum progress.
💡 Codex Review
Reviewed commit: c83e2d6164
```go
!db.vlogGenerationCheckpointKickPending.Load() &&
	!db.vlogGenerationDeferredMaintenancePending.Load() &&
	!db.vlogGenerationDeferredMaintenanceDue(now) {
	return false
```
Surface deferred-due check errors before preflight skip
This new hot-foreground preflight can return early without entering maybeRunVlogGenerationMaintenance, but the !db.vlogGenerationDeferredMaintenanceDue(now) term internally treats rewrite-stage read errors as a plain false result. In sustained non-quiet workloads with no pending flags, backend errors from the deferred-due probe are now silently ignored, so scheduler error state and notifyError are never triggered until the node eventually becomes quiet. That regresses operational visibility compared with the previous path, which always entered maintenance and reported queue/stage load failures immediately.
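One way to address this is to let the deferred-due probe return an error so the preflight can surface it instead of folding it into `false`. A minimal sketch under that assumption; `deferredDue` and the `(bool, error)` probe shape are hypothetical, not the current TreeDB signatures:

```go
package main

import (
	"errors"
	"fmt"
)

// deferredDue wraps a rewrite-stage probe that may fail. Rather than
// silently treating a read error as "not due" (the regression flagged
// above), it forwards the error to the scheduler's notify hook.
func deferredDue(probe func() (bool, error), notifyError func(error)) bool {
	due, err := probe()
	if err != nil {
		notifyError(err) // surface immediately, even on the hot preflight path
		return false
	}
	return due
}

func main() {
	var reported error
	notify := func(err error) { reported = err }
	probe := func() (bool, error) { return false, errors.New("stage read failed") }
	_ = deferredDue(probe, notify)
	fmt.Println(reported != nil) // prints true: the error reached notifyError
}
```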
Instrumentation-first update pushed.

What this adds

Test coverage

Next step: run
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
💡 Codex Review
Reviewed commit: 31fbb0a53e
Follow-up pushed.

Why

In a longer

This indicated retry loops were still attempting while

Change

Validation

A/B
💡 Codex Review
Reviewed commit: 7cc50d6de3
Follow-up update pushed.

What was added:
Validation:
Live run (baseline fast profile, no trigger override):
Interpretation:
Offline headroom on same home:
So live maintenance remains significantly below the post-rewrite floor, and these counters now make that bottleneck explicit.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Follow-up updates pushed:
Safety fix (AI thread)
Alloc/path optimization
Validation
Recent microbench samples:
Also resolved the two previously open, non-outdated review threads (one code-safety, one PR-scope note) and updated the PR description with the compression-profile scope note.
Pull request overview
Copilot reviewed 44 out of 45 changed files in this pull request and generated 1 comment.
💡 Codex Review
Reviewed commit: 96b8d28368
Pushed follow-up.

Change

Validation

Microbench signal
Command:

Observed:

So this is a modest but consistent alloc reduction without changing rewrite semantics.
Pull request overview
Copilot reviewed 44 out of 45 changed files in this pull request and generated no new comments.
Pushed follow-up.

CI failure root cause

Fix

Validation

This should directly address the race stack seen in CI while preserving functional behavior.
Pull request overview
Copilot reviewed 44 out of 45 changed files in this pull request and generated 2 comments.
```go
rawCopy := make([]byte, len(e.raw))
copy(rawCopy, e.raw)
```
groupedFrameCacheLookup now allocates and copies the full decoded grouped frame (rawCopy := make(...); copy(...)) on every cache hit. This can be a large per-read allocation and may significantly reduce the benefit of the grouped-frame cache for compressed reads. Since you already added groupedFrameCacheReadTo to copy just the requested value (and handle templates safely), consider refactoring the callers that only need one sub-value to use groupedFrameCacheReadTo (or add a lookup helper that copies only the [valStart:valEnd] range under the lock) to avoid copying the entire frame.
```diff
-rawCopy := make([]byte, len(e.raw))
-copy(rawCopy, e.raw)
+// Copy only the requested sub-value range so the returned slice is not
+// backed by cache-owned storage, but avoid allocating the full frame.
+valLen := valEnd - valStart
+rawCopy := make([]byte, valLen)
+copy(rawCopy, e.raw[int(valStart):int(valEnd)])
```
```go
raw := make([]byte, 0, int(rawLen))
pooledRaw := false
raw, err := decodeFramePayloadTo(frame, payload[prefixLen:], f.dictLookup, rawLen, raw)
if err != nil {
```
In readGroupedCompressedFromFileTo, pooledRaw is always set to false and raw is allocated with make(...). This makes the pooledRaw/releaseDecodeScratch branches dead code and defeats the existing decode-scratch reuse mechanism, adding avoidable allocations on the grouped+compressed fallback path. Consider using f.takeDecodeScratch(int(rawLen)) (setting pooledRaw=true) and only allocating when the scratch pool can’t satisfy the request, then ensuring the buffer is released when it isn’t cached.
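The scratch-reuse pattern suggested here can be sketched with a pool-backed buffer. `sync.Pool` stands in for TreeDB's `takeDecodeScratch`/`releaseDecodeScratch` pair, whose actual signatures are not shown in this thread, so the names and shapes below are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// scratchPool holds reusable decode buffers; New provides a baseline
// 4 KiB buffer when the pool is empty.
var scratchPool = sync.Pool{
	New: func() any { b := make([]byte, 0, 4096); return &b },
}

// takeScratch returns a buffer with at least n bytes of capacity.
// pooled reports whether the buffer must be released after use; an
// oversized request falls back to a fresh, non-pooled allocation.
func takeScratch(n int) (buf []byte, pooled bool) {
	p := scratchPool.Get().(*[]byte)
	if cap(*p) >= n {
		return (*p)[:0], true
	}
	scratchPool.Put(p) // too small for this request: return it unused
	return make([]byte, 0, n), false
}

// releaseScratch returns a pooled buffer for reuse by later decodes.
func releaseScratch(buf []byte) {
	b := buf[:0]
	scratchPool.Put(&b)
}

func main() {
	raw, pooled := takeScratch(1024)
	raw = append(raw, "decoded frame payload"...)
	if pooled {
		defer releaseScratch(raw)
	}
	fmt.Println(len(raw), pooled)
}
```

The point mirrored from the review: set `pooled` truthfully when the scratch pool satisfies the request, and only allocate when it cannot, so the release branch is live rather than dead code.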
Superseded by a corrected follow-up comment: #856 (comment)
Follow-up after Windows flake on TreeDB vet+test / windows-latest (job 69118858707). Pushed commit 5610d26:
Local validation:
Also on this branch, the previous commit 3d83c8b addressed the two new AI comments in TreeDB/internal/valuelog/manager.go (sub-value-only grouped cache copy + safe decode-scratch reuse).
Pull request overview
Copilot reviewed 44 out of 45 changed files in this pull request and generated 1 comment.
```diff
@@ -290,16 +347,20 @@ func (f *File) groupedFrameCacheLookup(start int64, verifyCRC bool, subIndex int
 	if e.k <= 0 || e.start != start || e.verifyCRC != verifyCRC || subIndex < 0 || subIndex >= e.k {
 		continue
 	}
-	valStart = e.offsets[subIndex]
-	valEnd = e.offsets[subIndex+1]
-	rawLen = e.offsets[e.k]
-	if valEnd < valStart || valEnd > rawLen || uint32(len(e.raw)) != rawLen {
+	valueStart := e.offsets[subIndex]
+	valueEnd := e.offsets[subIndex+1]
+	frameRawLen := e.offsets[e.k]
+	if valueEnd < valueStart || valueEnd > frameRawLen || uint32(len(e.raw)) != frameRawLen {
 		continue
 	}
 	f.groupedFrameCacheClock++
 	e.used = f.groupedFrameCacheClock
 	f.groupedFrameCacheHits++
-	return e.raw, valStart, valEnd, rawLen, true
+	valueLen := int(valueEnd - valueStart)
+	rawCopy := make([]byte, valueLen)
+	copy(rawCopy, e.raw[int(valueStart):int(valueEnd)])
+	return rawCopy, 0, uint32(valueLen), uint32(valueLen), true
 }
 f.groupedFrameCacheMisses++
 return nil, 0, 0, 0, false
```
groupedFrameCacheLookup now allocates a new rawCopy on every cache hit (even though the callers in reader_mmap.go still copy/allocate again before returning). This adds avoidable allocations and an extra copy on the mmap fast path. Consider returning a view into the cached decoded frame (as before) and having callers copy exactly once (or refactor mmap callers to use groupedFrameCacheReadTo-style copy-under-lock) so a cache hit doesn't necessarily allocate twice.
Follow-up for the subsequent TreeDB windows-latest flake (job 69119674672). Pushed commit f2ff99d:
Local validation:
Pull request overview
Copilot reviewed 44 out of 45 changed files in this pull request and generated 1 comment.
Addressed the latest AI review follow-ups in commit 5e846f4.

What changed:
Validation run locally:
Current PR state:
Summary
- `treedb.cache.vlog_generation.*` metrics through expvar selection

Testing
- `go test ./TreeDB/caching -count=1`
- `go test ./TreeDB/caching -run 'TestSelectTreeDBExpvarStatsFiltersAndCoerces|TestObserveVlogGenerationRewritePlanOutcome_SelectedTracksBytes|TestObserveVlogGenerationRewritePlanOutcome_SelectedTracksSegmentFallbackBytes' -count=1`
This branch also carries earlier compression-profiling changes made during the same investigation cycle (bounded sampling, encoder/scratch reuse, and concurrency-setting adjustments in `TreeDB/internal/compression/profile.go`). Those changes are intentionally kept in this stacked branch for continuity with the live-maintenance experiments and benchmark tooling updates.