epoch/finder: use GetEpochInfo for authoritative epoch at boundaries#3227
Draft
Add tests that reproduce the 2026-03-10 incident, where the epoch finder returned epoch 192 for ~51 minutes after epoch 193 started because GetSlot(finalized) was returning a stale slot. Three subtests cover:
- Stale GetSlot causing the wrong epoch for post-boundary records
- Fresh GetSlot returning the correct epoch (control case)
- Cache amplification: a stale result persists for 30 min per minute-bucket
…poch

Replace the slot-based epoch approximation with GetEpochInfo, which returns the authoritative epoch directly from the RPC. This fixes an issue where a stale GetSlot(finalized) response caused the epoch finder to return the wrong epoch for ~51 minutes after an epoch boundary, leading to Account Not Found alerts on all circuits.

For recent targets (within the current epoch), the authoritative epoch from GetEpochInfo is returned directly. For targets in prior epochs, slot math is used as before, but with the authoritative slot from GetEpochInfo.
Summary
- Replace the GetSlot-based epoch approximation with GetEpochInfo, which returns the authoritative epoch directly, fixing epoch misassignment at epoch boundaries
- For targets in prior epochs, slot math now uses AbsoluteSlot from GetEpochInfo

Context
On 2026-03-10, "Account Not Found" alerts fired on both devnet and testnet for all circuits after an epoch rollover. The collector's epoch finder used GetSlot(finalized) to approximate the epoch, but the RPC returned a stale slot in epoch 192 for ~51 minutes after epoch 193 started. As a result, all records were assigned epoch 192, and no epoch 193 accounts were ever initialized. The monitor (which uses GetEpochInfo) saw epoch 193 and flagged all circuits as missing.

See plans/internet-latency-collector-2026-03-10.md for the full incident analysis.

Testing Verification
- TestEpochFinder_GetEpochInfoFixesStaleSlotBug verifies that the authoritative epoch is used for recent targets, that slot math is the fallback for prior epochs, and that behavior is correct at epoch boundaries
- Callers of epoch.SolanaRPCClient build cleanly (production callers use *solanarpc.Client, which already implements GetEpochInfo)