
set GethRetryMaxElapsedTime to 30 min#905

Open
MarvelFisher wants to merge 1 commit into main from feat/gethRetry_MaxElapsedTime

Conversation

@MarvelFisher
Collaborator

@MarvelFisher MarvelFisher commented Mar 9, 2026

Summary by CodeRabbit

  • Chores
    • Improved retry mechanism for Geth connection stability by implementing a uniform maximum retry time limit across all retry paths, ensuring consistent connection behavior during network disruptions.

@MarvelFisher MarvelFisher requested a review from a team as a code owner March 9, 2026 06:46
@MarvelFisher MarvelFisher requested review from twcctop and removed request for a team March 9, 2026 06:46
@coderabbitai
Contributor

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

Walkthrough

Introduces a new exported constant GethRetryMaxElapsedTime set to 30 minutes for Geth connection retries. Refactors NewRetryableClient to create and reuse a single ExponentialBackOff instance across all retry paths instead of creating separate instances, ensuring uniform maximum retry time enforcement.

Changes

Cohort: Retry Configuration Refactoring
File(s): node/types/retryable_client.go
Summary: Added exported constant GethRetryMaxElapsedTime (30 minutes) and refactored NewRetryableClient to reuse a single ExponentialBackOff instance with a uniform MaxElapsedTime across all retry paths, instead of per-branch backoff creation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~7 minutes

Poem

🐰 A constant for retry, thirty minutes of grace,
One backoff to rule them all, keeping pace,
No more scattered retries, now unified and strong,
The rabbit hops steadily, rarely goes wrong! 🎪

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled.
Title Check | ✅ Passed | The title directly and clearly describes the main change: setting GethRetryMaxElapsedTime to 30 minutes, which is the primary objective of the changeset.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.






Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
node/types/retryable_client.go (1)

160-172: ⚠️ Potential issue | 🔴 Critical

Don't share one ExponentialBackOff across the whole client.

backoff.Retry resets and mutates the backoff state (via Reset() and repeated NextBackOff() calls), making concurrent RPC calls on RetryableClient race when they all use rc.b. This causes nondeterministic retry timing and potential data races. Construct a fresh backoff per call instead, bound to the call context using backoff.WithContext.

All these methods pass rc.b to backoff.Retry:

  • AssembleL2Block (line 313)
  • ValidateL2Block (line 332)
  • NewL2Block (line 353)
  • NewSafeL2Block (line 372)
  • CommitBatch (line 389)
  • AppendBlsSignature (line 396)
  • And others
Suggested fix

Remove the b backoff.BackOff field from RetryableClient (line 141) and the backoff initialization from NewRetryableClient (lines 161–162, 172, 194). Add a helper function:

func newGethBackoff(ctx context.Context) backoff.BackOff {
	bo := backoff.NewExponentialBackOff()
	bo.MaxElapsedTime = GethRetryMaxElapsedTime
	return backoff.WithContext(bo, ctx)
}

Then replace each backoff.Retry(..., rc.b) with backoff.Retry(..., newGethBackoff(ctx)).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/types/retryable_client.go` around lines 160 - 172, The RetryableClient
currently stores a single backoff.BackOff (rc.b) created in NewRetryableClient
which is mutated by backoff.Retry and causes races when multiple RPC methods
(e.g., AssembleL2Block, ValidateL2Block, NewL2Block, NewSafeL2Block,
CommitBatch, AppendBlsSignature) call backoff.Retry concurrently; remove the b
backoff.BackOff field from RetryableClient and its initialization in
NewRetryableClient, add a helper newGethBackoff(ctx context.Context) that
creates an ExponentialBackOff with MaxElapsedTime = GethRetryMaxElapsedTime and
wraps it via backoff.WithContext, and update every backoff.Retry call that
currently passes rc.b to instead call backoff.Retry(..., newGethBackoff(ctx)) so
each RPC gets a fresh, context-bound backoff instance.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4bb8c8f3-c86b-46ec-b86f-e2861406ddbc

📥 Commits

Reviewing files that changed from the base of the PR and between 895120c and b323a49.

📒 Files selected for processing (1)
  • node/types/retryable_client.go

Comment on lines +34 to +36
GethRetryAttempts       = 60               // max retry attempts
GethRetryInterval       = 5 * time.Second  // interval between retries
GethRetryMaxElapsedTime = 30 * time.Minute
Contributor


⚠️ Potential issue | 🟠 Major

This still leaves the startup retry path capped at 5 minutes.

FetchGethConfigWithRetry is still bounded by GethRetryAttempts * GethRetryInterval (60 * 5s), so boot-time geth readiness will continue to fail after ~5 minutes even though this PR introduces a 30-minute retry window. If the goal is to tolerate slower geth bring-up, please derive that loop from GethRetryMaxElapsedTime or move it onto the same backoff policy.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/types/retryable_client.go` around lines 34 - 36,
FetchGethConfigWithRetry still uses a fixed loop based on GethRetryAttempts and
GethRetryInterval (60 * 5s ≈ 5 minutes) so startup is capped at ~5 minutes;
change FetchGethConfigWithRetry to honor GethRetryMaxElapsedTime instead (or
reuse the same backoff policy used elsewhere) by looping until elapsed time >=
GethRetryMaxElapsedTime or using the existing backoff/RetryWithBackoff helper,
replacing the fixed attempt/count check with a time-based check that references
GethRetryMaxElapsedTime (and keep GethRetryInterval as the sleep between
attempts).

GethRetryAttempts       = 60               // max retry attempts
GethRetryInterval       = 5 * time.Second  // interval between retries
GethRetryMaxElapsedTime = 30 * time.Minute
Contributor


The DefaultMaxElapsedTime is 15 * time.Minute. Why do we need to change RetryableClient.b's max elapsed time to 30 minutes via GethRetryMaxElapsedTime?

