Skip to content

refactor(node): introduce BatchVerifier, remove derivation service #904

Closed
curryxbo wants to merge 6 commits into main from refactor/batch-verifier

Conversation

@curryxbo
Contributor

@curryxbo curryxbo commented Mar 5, 2026

Extract batch fetching/parsing/verification logic into a stateless BatchVerifier. Remove the standalone derivation block-production service and its isValidator startup branch. BatchVerifier will be integrated into BlockTagService as the next step.

Summary by CodeRabbit

  • New Features

    • Optional on-demand full batch verification for stronger end-to-end L1→L2 checks.
    • New batch processor that sequentially processes committed batches and updates safe/finalized L2 tags.
    • Config option to set L1 start block for CommitBatch scanning.
  • Refactor

    • Replaced continuous derivation service with a stateless verifier invoked on demand; startup/shutdown simplified.
    • Block scanning now tracks safe and finalized progress independently, with lightweight fallback when verification is disabled.
  • Chores

    • Context-aware retries/cancellation for external node calls and added context-aware block fetch retries.

@curryxbo curryxbo requested a review from a team as a code owner March 5, 2026 06:10
@curryxbo curryxbo requested review from Web3Jumb0 and removed request for a team March 5, 2026 06:10
@coderabbitai
Contributor

coderabbitai bot commented Mar 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Replaces the monolithic Derivation service with a stateless BatchVerifier and new BatchProcessor, rewires node startup and executor initialization, integrates optional full L1→L2 batch verification into BlockTagService, makes geth config retries context-aware, and removes the prior Derivation lifecycle and related artifacts.

Changes

  • Batch verification core
    Files: node/derivation/batch_verifier.go, node/derivation/batch_info.go, node/derivation/beacon.go, node/derivation/*.go, node/derivation/README.md
    Summary: Adds stateless BatchVerifier API (types: BatchRoots, BatchBlockContext, BatchInfo), calldata/ABI unpacking, blob sidecar handling, FetchBatchRoots/FetchBatchData/VerifyBatch, and docs; removes three fields from BatchInfo.
  • Derivation removal & tests
    Files: node/derivation/derivation.go, node/derivation/derivation_test.go
    Summary: Removes the previous Derivation service and lifecycle; tests updated to call unpackCalldataWithABIs(...) instead of instance UnPackData.
  • Batch processor
    Files: node/batchprocessor/processor.go, node/batchprocessor/config.go
    Summary: Introduces BatchProcessor with polling loop, safe/finalized cursors, L1 event scanning, optional verification via BatchVerifier, and config (PollInterval, SafeConfirmations, L1/Rollup addresses).
  • BlockTagService integration
    Files: node/blocktag/service.go, node/blocktag/config.go
    Summary: Wires optional *derivation.BatchVerifier into BlockTagService, adds per-tag L1 search trackers, refactors validation to prefer full verification (fallback to state-root), and adds Config.L1StartBlock.
  • Node startup & executor
    Files: node/cmd/node/main.go, node/core/executor.go
    Summary: Refactors startup: root context and signal handling, Tendermint privval/pubkey derivation, Syncer/Executor construction, mock sequencer wiring, consolidated Start/Stop lifecycle, and makes geth config fetch context-aware.
  • Retryable RPC client
    Files: node/types/retryable_client.go
    Summary: Makes FetchGethConfigWithRetry context-aware with cancellation; adds RetryableClient.BlockByNumber(ctx, number) with retry semantics and backoff-aware waiting.
  • Misc / docs / configs
    Files: node/derivation/README.md, node/derivation/*, node/blocktag/*, tests
    Summary: Docs updated to describe the new BatchVerifier/BatchProcessor architecture; CLI/config surfaces added for new components; tests adapted for stateless ABI unpacking.

Sequence Diagram(s)

sequenceDiagram
    participant App as BatchProcessor / BlockTagService
    participant BV as BatchVerifier
    participant L1 as L1 Client
    participant Beacon as L1 Beacon
    participant L2 as L2 Client
    participant Val as Validator

    App->>L1: Find CommitBatch tx (FilterLogs / txHash)
    App->>BV: FetchBatchRoots(ctx, txHash / batchIndex)
    BV->>L1: Get transaction & calldata
    alt blobs exist
        BV->>Beacon: Fetch blob sidecars
        Beacon-->>BV: Blob sidecars
    end
    BV->>BV: Parse calldata -> BatchRoots / BlockContexts
    App->>BV: FetchBatchData(ctx, txHash)
    BV->>BV: Decode blobs -> BatchInfo (optional)
    App->>BV: VerifyBatch(ctx, l2Client, roots, batchData)
    
    rect rgba(100,150,255,0.5)
    Note over BV,L2: Step 1: PostStateRoot vs L2 state
    BV->>L2: HeaderByNumber(LastBlockNum)
    L2-->>BV: Header (StateRoot)
    BV->>BV: Compare roots
    end

    rect rgba(100,150,255,0.5)
    Note over BV,L2: Step 2-3: Withdrawal & PrevStateRoot checks
    BV->>L2: BlockByNumber(LastBlockNum)
    L2-->>BV: Block (WithdrawalRoot)
    BV->>BV: Compare withdrawal & prev roots
    end

    rect rgba(100,150,255,0.5)
    Note over BV,L2: Step 4-5: Block contexts & tx content verification
    BV->>L2: HeaderByNumber(per-block) / BlockByNumber(per-block)
    L2-->>BV: Headers / Blocks / Tx bytes
    BV->>BV: Verify timestamps/gas/basefee and tx bytes vs blobs
    end

    alt Mismatch Detected
        BV->>Val: Trigger challenge
        Val-->>BV: Challenge response
    end

    BV-->>App: Verification result
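
The diagram maps to roughly the caller-side pattern below. This is a minimal sketch: the FetchBatchRoots signature matches the call site quoted later from node/blocktag/service.go, while the FetchBatchData and VerifyBatch signatures and the error handling are assumptions taken from the walkthrough, not the actual API.

// Sketch of the on-demand verification flow (signatures partly assumed).
func verifyCommittedBatch(ctx context.Context, bv *derivation.BatchVerifier,
	l2Client *ethclient.Client, txHash common.Hash, batchIndex uint64) error {
	// Parse CommitBatch calldata (and blob sidecars, if any) into roots.
	roots, err := bv.FetchBatchRoots(ctx, txHash, batchIndex)
	if err != nil {
		return fmt.Errorf("fetch batch roots: %w", err)
	}
	// Decode blobs into the full batch payload (optional step).
	batchData, err := bv.FetchBatchData(ctx, txHash)
	if err != nil {
		return fmt.Errorf("fetch batch data: %w", err)
	}
	// Run the 5-step check against the L2 node: post/prev state roots,
	// withdrawal root, block contexts, and tx bytes vs blobs.
	return bv.VerifyBatch(ctx, l2Client, roots, batchData)
}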

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

validator

Suggested reviewers

  • Web3Jumb0
  • twcctop
  • FletcherMan

Poem

🐰 I hopped through calldata, blobs in tow,

Parsed roots and headers in a gentle row,
BatchVerifier checks each rollup track,
Processor marches batches, never looking back,
A little rabbit, proud of this clean new flow.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 41.67%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The PR title accurately captures the main refactoring: introduction of BatchVerifier and removal of the Derivation service, which aligns with the substantial changes throughout the codebase.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch refactor/batch-verifier

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@curryxbo curryxbo self-assigned this Mar 5, 2026
@curryxbo curryxbo linked an issue Mar 5, 2026 that may be closed by this pull request
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
node/cmd/node/main.go (1)

89-93: Wrap startup errors with %w instead of %v.

At Line 89 and Line 92, %v loses error-chain semantics and blocks errors.Is/As handling upstream.

♻️ Proposed fix
-		if tmNode, err = sequencer.SetupNode(tmCfg, tmVal, executor, nodeConfig.Logger); err != nil {
-			return fmt.Errorf("failed to setup consensus node, error: %v", err)
+		if tmNode, err = sequencer.SetupNode(tmCfg, tmVal, executor, nodeConfig.Logger); err != nil {
+			return fmt.Errorf("failed to setup consensus node: %w", err)
 		}
-		if err = tmNode.Start(); err != nil {
-			return fmt.Errorf("failed to start consensus node, error: %v", err)
+		if err = tmNode.Start(); err != nil {
+			return fmt.Errorf("failed to start consensus node: %w", err)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/cmd/node/main.go` around lines 89 - 93, The error returns around the
consensus startup lose error-chain semantics because they use fmt.Errorf with
%v; update both fmt.Errorf calls that return errors from the consensus setup and
tmNode.Start() to use %w so callers can use errors.Is/errors.As to inspect
wrapped errors (look for the fmt.Errorf usages that construct "failed to setup
consensus node, error: %v" and "failed to start consensus node, error: %v"
adjacent to tmNode.Start()).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@node/cmd/node/main.go`:
- Around line 75-77: The call to tmVal.GetPubKey() currently ignores its error
and can pass a nil/zero pubKey into node.NewExecutor (via tmVal/GetPubKey and
newSyncerFunc), which later causes silent behavioral changes in
updateSequencerSet; change the code that calls GetPubKey (the tmVal.GetPubKey()
invocation in main.go) to check and handle the returned error explicitly: if
GetPubKey returns an error, log the error (with context) and return/exit early
(or propagate the error) instead of continuing with a nil pubKey so NewExecutor
receives a valid key or the process fails fast; apply the same fix pattern where
GetPubKey is used (e.g., ops/tools/keygen/main.go) to avoid silent zero-valued
tmPubKey use that breaks updateSequencerSet.

In `@node/derivation/batch_verifier.go`:
- Around line 199-223: The fallback read is using the incoming parameter
batchIndex instead of the decoded index derived from calldata (parentBatchIndex
-> roots.BatchIndex), which can read the wrong BatchDataStore entry; update the
BatchDataStore call to use the decoded index (roots.BatchIndex or
parentBatchIndex+1) converted to *big.Int (same conversion already used) so the
fallback uses the same batch index as set on BatchRoots; ensure the call remains
bv.rollup.BatchDataStore(&bind.CallOpts{Context: ctx},
new(big.Int).SetUint64(roots.BatchIndex)).
- Around line 342-366: The loop currently treats any single match as sufficient
(only checking matchedCount == 0) so partial coverage of blobHashes can slip
through; change the matching logic to ensure every expected hash in blobHashes
is matched exactly once: build a map keyed by expectedHash (from blobHashes) and
mark entries when you find a matching versionedHash (inside the loop over
blobSidecars using KZGToVersionedHash), append to blobTxSidecar as before, and
after scanning all sidecars verify that every expectedHash map entry was matched
(or that matchedCount == len(blobHashes)); if any expectedHash is not matched,
return the error currently returned for no matches (use l1TxHash.Hex() for
context). Ensure you still copy/convert the blob, commitment and proof into
blobTxSidecar (Blob, KZGBlob, commitment, kzg4844.Proof) only when a match is
recorded.

---

Nitpick comments:
In `@node/cmd/node/main.go`:
- Around line 89-93: The error returns around the consensus startup lose
error-chain semantics because they use fmt.Errorf with %v; update both
fmt.Errorf calls that return errors from the consensus setup and tmNode.Start()
to use %w so callers can use errors.Is/errors.As to inspect wrapped errors (look
for the fmt.Errorf usages that construct "failed to setup consensus node, error:
%v" and "failed to start consensus node, error: %v" adjacent to tmNode.Start()).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d7ac09e-edf3-48b2-94ab-d61270652ac7

📥 Commits

Reviewing files that changed from the base of the PR and between 895120c and 4a133be.

📒 Files selected for processing (5)
  • node/cmd/node/main.go
  • node/derivation/batch_verifier.go
  • node/derivation/derivation.go
  • node/derivation/derivation_test.go
  • node/types/retryable_client.go
💤 Files with no reviewable changes (1)
  • node/derivation/derivation.go

Comment on lines +199 to +223
	// Derive batchIndex from parentBatchHeader embedded in calldata
	parentBatchHeader := types.BatchHeaderBytes(batch.ParentBatchHeader)
	parentBatchIndex, err := parentBatchHeader.BatchIndex()
	if err != nil {
		return nil, fmt.Errorf("decode parent batch index: %w", err)
	}

	roots := &BatchRoots{
		BatchIndex:     parentBatchIndex + 1,
		LastBlockNum:   batch.LastBlockNumber,
		PrevStateRoot:  batch.PrevStateRoot,
		PostStateRoot:  batch.PostStateRoot,
		WithdrawalRoot: batch.WithdrawRoot,
		NumL1Messages:  batch.NumL1Messages,
	}

	// v0/legacy batches do not encode LastBlockNumber in calldata (it's in the blob).
	// Fall back to the on-chain BatchDataStore.
	if roots.LastBlockNum == 0 {
		batchData, err := bv.rollup.BatchDataStore(&bind.CallOpts{Context: ctx}, new(big.Int).SetUint64(batchIndex))
		if err != nil {
			return nil, fmt.Errorf("query BatchDataStore for batchIndex %d: %w", batchIndex, err)
		}
		roots.LastBlockNum = batchData.BlockNumber.Uint64()
	}
@coderabbitai coderabbitai bot Mar 5, 2026

⚠️ Potential issue | 🟠 Major

Validate and consistently use the decoded batch index in fallback reads.

At Line 207 you derive roots.BatchIndex from calldata, but Line 218 queries BatchDataStore using the method argument batchIndex. If these diverge, fallback can read the wrong batch metadata and lead to false verification results.

🛠️ Proposed fix
-	roots := &BatchRoots{
-		BatchIndex:     parentBatchIndex + 1,
+	decodedBatchIndex := parentBatchIndex + 1
+	if batchIndex != decodedBatchIndex {
+		return nil, fmt.Errorf("batch index mismatch: arg=%d decoded=%d", batchIndex, decodedBatchIndex)
+	}
+
+	roots := &BatchRoots{
+		BatchIndex:     decodedBatchIndex,
 		LastBlockNum:   batch.LastBlockNumber,
 		PrevStateRoot:  batch.PrevStateRoot,
 		PostStateRoot:  batch.PostStateRoot,
 		WithdrawalRoot: batch.WithdrawRoot,
 		NumL1Messages:  batch.NumL1Messages,
 	}
@@
-		batchData, err := bv.rollup.BatchDataStore(&bind.CallOpts{Context: ctx}, new(big.Int).SetUint64(batchIndex))
+		batchData, err := bv.rollup.BatchDataStore(&bind.CallOpts{Context: ctx}, new(big.Int).SetUint64(decodedBatchIndex))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Original:

	// Derive batchIndex from parentBatchHeader embedded in calldata
	parentBatchHeader := types.BatchHeaderBytes(batch.ParentBatchHeader)
	parentBatchIndex, err := parentBatchHeader.BatchIndex()
	if err != nil {
		return nil, fmt.Errorf("decode parent batch index: %w", err)
	}
	roots := &BatchRoots{
		BatchIndex:     parentBatchIndex + 1,
		LastBlockNum:   batch.LastBlockNumber,
		PrevStateRoot:  batch.PrevStateRoot,
		PostStateRoot:  batch.PostStateRoot,
		WithdrawalRoot: batch.WithdrawRoot,
		NumL1Messages:  batch.NumL1Messages,
	}
	// v0/legacy batches do not encode LastBlockNumber in calldata (it's in the blob).
	// Fall back to the on-chain BatchDataStore.
	if roots.LastBlockNum == 0 {
		batchData, err := bv.rollup.BatchDataStore(&bind.CallOpts{Context: ctx}, new(big.Int).SetUint64(batchIndex))
		if err != nil {
			return nil, fmt.Errorf("query BatchDataStore for batchIndex %d: %w", batchIndex, err)
		}
		roots.LastBlockNum = batchData.BlockNumber.Uint64()
	}

Suggested:

	// Derive batchIndex from parentBatchHeader embedded in calldata
	parentBatchHeader := types.BatchHeaderBytes(batch.ParentBatchHeader)
	parentBatchIndex, err := parentBatchHeader.BatchIndex()
	if err != nil {
		return nil, fmt.Errorf("decode parent batch index: %w", err)
	}
	decodedBatchIndex := parentBatchIndex + 1
	if batchIndex != decodedBatchIndex {
		return nil, fmt.Errorf("batch index mismatch: arg=%d decoded=%d", batchIndex, decodedBatchIndex)
	}
	roots := &BatchRoots{
		BatchIndex:     decodedBatchIndex,
		LastBlockNum:   batch.LastBlockNumber,
		PrevStateRoot:  batch.PrevStateRoot,
		PostStateRoot:  batch.PostStateRoot,
		WithdrawalRoot: batch.WithdrawRoot,
		NumL1Messages:  batch.NumL1Messages,
	}
	// v0/legacy batches do not encode LastBlockNumber in calldata (it's in the blob).
	// Fall back to the on-chain BatchDataStore.
	if roots.LastBlockNum == 0 {
		batchData, err := bv.rollup.BatchDataStore(&bind.CallOpts{Context: ctx}, new(big.Int).SetUint64(decodedBatchIndex))
		if err != nil {
			return nil, fmt.Errorf("query BatchDataStore for batchIndex %d: %w", batchIndex, err)
		}
		roots.LastBlockNum = batchData.BlockNumber.Uint64()
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/derivation/batch_verifier.go` around lines 199 - 223, The fallback read
is using the incoming parameter batchIndex instead of the decoded index derived
from calldata (parentBatchIndex -> roots.BatchIndex), which can read the wrong
BatchDataStore entry; update the BatchDataStore call to use the decoded index
(roots.BatchIndex or parentBatchIndex+1) converted to *big.Int (same conversion
already used) so the fallback uses the same batch index as set on BatchRoots;
ensure the call remains bv.rollup.BatchDataStore(&bind.CallOpts{Context: ctx},
new(big.Int).SetUint64(roots.BatchIndex)).

Contributor Author

review agent

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all callers of FetchBatchRoots
rg -n "FetchBatchRoots" --type go -C 5

# Also look at the full FetchBatchRoots function in batch_verifier.go
rg -n "FetchBatchRoots|batchIndex" node/derivation/batch_verifier.go -A 3 -B 3

# Check how BlockTagService calls FetchBatchRoots
rg -n "FetchBatchRoots\|BatchIndex\|batchIndex" node/blocktag/service.go -C 5

Length of output: 9460


@curryxbo, here's the deeper analysis:

Confirmed — the original concern is valid.

From the call site in node/blocktag/service.go line 369:

roots, err := s.batchVerifier.FetchBatchRoots(s.ctx, txHash, batchIndex)

The batchIndex parameter comes from L1 event log scanning (the CommitBatch(uint256, bytes32) event). Inside FetchBatchRoots, roots.BatchIndex is independently derived from the calldata's embedded ParentBatchHeader.

In a well-formed chain these two sources agree. However, the inconsistency is architectural: after line 230 sets roots.BatchIndex from calldata, the rest of the function uses roots.BatchIndex as the authoritative decoded value (logging, ChallengeState, etc.) — except the BatchDataStore fallback at line 241, which still reaches back to the raw method parameter.

The fix is simply to use roots.BatchIndex in the fallback so the function is internally consistent with its own decoded state:

if roots.LastBlockNum == 0 {
    batchData, err := bv.rollup.BatchDataStore(
        &bind.CallOpts{Context: ctx},
        new(big.Int).SetUint64(roots.BatchIndex), // use decoded index, not method arg
    )
    if err != nil {
        return nil, fmt.Errorf("query BatchDataStore for batchIndex %d: %w", roots.BatchIndex, err)
    }
    roots.LastBlockNum = batchData.BlockNumber.Uint64()
}

This is a minimal, low-risk change that makes the fallback consistent with the rest of the function. No need for the divergence check I initially proposed — but the use of roots.BatchIndex in the fallback is the right fix regardless.

Comment on lines +342 to +366
	for _, sidecar := range blobSidecars {
		var commitment kzg4844.Commitment
		copy(commitment[:], sidecar.KZGCommitment[:])
		versionedHash := KZGToVersionedHash(commitment)

		for _, expectedHash := range blobHashes {
			if !bytes.Equal(versionedHash[:], expectedHash[:]) {
				continue
			}
			matchedCount++
			b, err := hexutil.Decode(sidecar.Blob)
			if err != nil {
				return nil, fmt.Errorf("decode blob hex for tx %s: %w", l1TxHash.Hex(), err)
			}
			var blob Blob
			copy(blob[:], b)
			blobTxSidecar.Blobs = append(blobTxSidecar.Blobs, *blob.KZGBlob())
			blobTxSidecar.Commitments = append(blobTxSidecar.Commitments, commitment)
			blobTxSidecar.Proofs = append(blobTxSidecar.Proofs, kzg4844.Proof(sidecar.KZGProof))
			break
		}
	}
	if matchedCount == 0 {
		return nil, fmt.Errorf("FetchBatchData: no matching blob found for tx %s", l1TxHash.Hex())
	}
@coderabbitai coderabbitai bot Mar 5, 2026

⚠️ Potential issue | 🟠 Major

Require full blob-hash coverage, not just “at least one match.”

At Line 364, only matchedCount == 0 is treated as failure. Partial matches currently pass, which can leave batch.Sidecar incomplete and cause incorrect batch parsing/verification.

🛠️ Proposed fix
-		if matchedCount == 0 {
-			return nil, fmt.Errorf("FetchBatchData: no matching blob found for tx %s", l1TxHash.Hex())
+		if matchedCount != len(blobHashes) {
+			return nil, fmt.Errorf(
+				"FetchBatchData: incomplete blob match for tx %s: matched=%d expected=%d",
+				l1TxHash.Hex(), matchedCount, len(blobHashes),
+			)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/derivation/batch_verifier.go` around lines 342 - 366, The loop currently
treats any single match as sufficient (only checking matchedCount == 0) so
partial coverage of blobHashes can slip through; change the matching logic to
ensure every expected hash in blobHashes is matched exactly once: build a map
keyed by expectedHash (from blobHashes) and mark entries when you find a
matching versionedHash (inside the loop over blobSidecars using
KZGToVersionedHash), append to blobTxSidecar as before, and after scanning all
sidecars verify that every expectedHash map entry was matched (or that
matchedCount == len(blobHashes)); if any expectedHash is not matched, return the
error currently returned for no matches (use l1TxHash.Hex() for context). Ensure
you still copy/convert the blob, commitment and proof into blobTxSidecar (Blob,
KZGBlob, commitment, kzg4844.Proof) only when a match is recorded.

Contributor Author

review agent

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the blob matching section of batch_verifier.go
rg -n "matchedCount|blobHashes|blobSidecars|blobTxSidecar" node/derivation/batch_verifier.go -A2 -B2

Length of output: 2194


@curryxbo, the issue is confirmed — Line 387 only guards against zero matches, so partial sidecar coverage (e.g., 2 out of 3 blobs matched) silently passes and leaves batch.Sidecar incomplete.

The minimal fix is the one originally proposed (matchedCount != len(blobHashes)). However, there is a subtle secondary risk: if two different sidecars happen to compute the same versionedHash and both match the same entry in blobHashes, matchedCount would reach len(blobHashes) without all distinct hashes being covered. A map-based approach eliminates this entirely:

// Match each blob hash to its sidecar and assemble the BlobTxSidecar.
var blobTxSidecar eth.BlobTxSidecar
// Track which expected hashes have been matched to prevent double-counting.
matched := make(map[common.Hash]bool, len(blobHashes))
for _, sidecar := range blobSidecars {
    var commitment kzg4844.Commitment
    copy(commitment[:], sidecar.KZGCommitment[:])
    versionedHash := KZGToVersionedHash(commitment)

    for _, expectedHash := range blobHashes {
        if !bytes.Equal(versionedHash[:], expectedHash[:]) {
            continue
        }
        if matched[expectedHash] {
            // Already matched this hash via another sidecar; skip duplicate.
            break
        }
        matched[expectedHash] = true
        b, err := hexutil.Decode(sidecar.Blob)
        if err != nil {
            return nil, fmt.Errorf("decode blob hex for tx %s: %w", l1TxHash.Hex(), err)
        }
        var blob Blob
        copy(blob[:], b)
        blobTxSidecar.Blobs = append(blobTxSidecar.Blobs, *blob.KZGBlob())
        blobTxSidecar.Commitments = append(blobTxSidecar.Commitments, commitment)
        blobTxSidecar.Proofs = append(blobTxSidecar.Proofs, kzg4844.Proof(sidecar.KZGProof))
        break
    }
}
if len(matched) != len(blobHashes) {
    return nil, fmt.Errorf(
        "FetchBatchData: incomplete blob match for tx %s: matched=%d expected=%d",
        l1TxHash.Hex(), len(matched), len(blobHashes),
    )
}
bv.logger.Info("FetchBatchData: blobs matched", "matched", len(matched), "expected", len(blobHashes))
batch.Sidecar = blobTxSidecar

Key changes:

  1. matched map tracks which expectedHash values have been covered, preventing double-counting.
  2. Guard on duplicate sidecar matches (if matched[expectedHash]) skips a sidecar that would re-match an already-covered hash.
  3. len(matched) != len(blobHashes) enforces full coverage.

corey added 2 commits March 8, 2026 22:30
- Wire BatchVerifier into BlockTagService; replace lightweight
  CommittedStateRoots-only check with 5-step VerifyBatch
  (PostStateRoot, WithdrawalRoot, PrevStateRoot, BlockContexts, txs)
- Add fetchCommitBatchTxHash to locate CommitBatch L1 tx via FilterLogs
- Use separate safeSearchTracker / finalizedSearchTracker to avoid
  safe-tag progress overwriting finalized-tag search start
- Add l1SearchTracker: fixed mode (L1StartBlock > 0) or auto mode
  (refined from last finalized batch at startup, then progressive)
- Read DerivationBaseHeight flag in main.go bvCfg to handle
  snapshot-synced nodes correctly
- Pass context.Context to FetchGethConfigWithRetry for cancellation
- Add derivation/README.md documenting BatchVerifier and integration
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
node/cmd/node/main.go (1)

136-149: Consider stopping the Executor's Syncer during shutdown.

The shutdown logic stops mock.Sequencer, tmNode, and blockTagSvc, but the Executor may hold a Syncer instance (created via newSyncerFunc in devSequencer mode per node/core/executor.go:163-170). While the Syncer's Start() is currently commented out, if it's ever enabled, the Syncer.Stop() method (see node/sync/syncer.go:106-118) should be called to properly release its resources.

♻️ Suggested improvement

Consider adding a Stop() or Close() method to Executor that manages the Syncer lifecycle, and call it in the shutdown path:

// In shutdown section:
if executor != nil {
    executor.Stop() // Would internally call syncer.Stop() if syncer was started
}
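
A minimal sketch of what such a method could look like, assuming the Executor keeps its Syncer in a field named syncer (the field name and nil check are assumptions, not the actual code):

// Stop releases resources owned by the Executor; sketch only.
func (e *Executor) Stop() {
	if e.syncer != nil {
		e.syncer.Stop() // releases the Syncer's resources (node/sync/syncer.go)
	}
}
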
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/cmd/node/main.go` around lines 136 - 149, Shutdown currently omits
stopping the Executor's Syncer, so add a lifecycle method (e.g., Executor.Stop
or Executor.Close) on the Executor that checks for a non-nil Syncer (created via
newSyncerFunc in devSequencer) and calls Syncer.Stop() when appropriate; then
update the shutdown sequence to call executor.Stop() alongside ms.Stop(),
tmNode.Stop(), and blockTagSvc.Stop() so the Syncer resources are properly
released if Start() is ever enabled.
node/derivation/README.md (1)

22-32: Add language specifier to ASCII diagram code block.

The fenced code block for the architecture diagram should have a language specifier to satisfy markdown linting. Since this is an ASCII diagram, use text or plaintext.

📝 Proposed fix
-```
+```text
 BlockTagService (scheduler)
 │  polls L1 every N seconds
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/derivation/README.md` around lines 22 - 32, The fenced ASCII diagram
block containing "BlockTagService (scheduler)" needs a language specifier to
satisfy markdown linting; update the opening fence from ``` to ```text (or
```plaintext) so the diagram is treated as plain text—no code changes beyond
replacing the opening triple backticks for the diagram block in the README.
node/blocktag/service.go (1)

9-19: Consider grouping imports consistently.

The github.com/morph-l2/go-ethereum import (line 12) should be grouped with other go-ethereum imports (lines 9-11) for consistency.

📝 Proposed fix
 import (
 	"context"
 	"fmt"
 	"math/big"
 	"time"

 	"github.com/morph-l2/go-ethereum/accounts/abi/bind"
 	"github.com/morph-l2/go-ethereum/common"
 	"github.com/morph-l2/go-ethereum/ethclient"
-	"github.com/morph-l2/go-ethereum"
+	ethereum "github.com/morph-l2/go-ethereum"
 	"github.com/morph-l2/go-ethereum/rpc"
 	tmlog "github.com/tendermint/tendermint/libs/log"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/blocktag/service.go` around lines 9 - 19, The imports in
node/blocktag/service.go are inconsistently grouped: the standalone import
"github.com/morph-l2/go-ethereum" should be moved into the go-ethereum group
with the other related imports (e.g.,
"github.com/morph-l2/go-ethereum/accounts/abi/bind",
"github.com/morph-l2/go-ethereum/common",
"github.com/morph-l2/go-ethereum/ethclient",
"github.com/morph-l2/go-ethereum/rpc") inside the same import block; update the
import section so all go-ethereum packages are grouped together for consistency
and readability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@node/derivation/batch_verifier.go`:
- Around line 193-197: The Close method on BatchVerifier currently only closes
l1Client and l2EthClient but must also release the HTTP connection pool held by
l1BeaconClient; add a Close() method to the HTTP-based beacon client (e.g.,
BasicHTTPClient or the L1BeaconClient type) that calls the underlying
*http.Client.CloseIdleConnections (and any other cleanup), update the beacon
client interface if necessary to include Close(), and modify
BatchVerifier.Close() to check if bv.l1BeaconClient != nil and call its Close()
before returning.

---

Nitpick comments:
In `@node/blocktag/service.go`:
- Around line 9-19: The imports in node/blocktag/service.go are inconsistently
grouped: the standalone import "github.com/morph-l2/go-ethereum" should be moved
into the go-ethereum group with the other related imports (e.g.,
"github.com/morph-l2/go-ethereum/accounts/abi/bind",
"github.com/morph-l2/go-ethereum/common",
"github.com/morph-l2/go-ethereum/ethclient",
"github.com/morph-l2/go-ethereum/rpc") inside the same import block; update the
import section so all go-ethereum packages are grouped together for consistency
and readability.

In `@node/cmd/node/main.go`:
- Around line 136-149: Shutdown currently omits stopping the Executor's Syncer,
so add a lifecycle method (e.g., Executor.Stop or Executor.Close) on the
Executor that checks for a non-nil Syncer (created via newSyncerFunc in
devSequencer) and calls Syncer.Stop() when appropriate; then update the shutdown
sequence to call executor.Stop() alongside ms.Stop(), tmNode.Stop(), and
blockTagSvc.Stop() so the Syncer resources are properly released if Start() is
ever enabled.

In `@node/derivation/README.md`:
- Around line 22-32: The fenced ASCII diagram block containing "BlockTagService
(scheduler)" needs a language specifier to satisfy markdown linting; update the
opening fence from ``` to ```text (or ```plaintext) so the diagram is treated as
plain text—no code changes beyond replacing the opening triple backticks for the
diagram block in the README.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 818eeef9-e689-4d4e-b392-2b8908db36f3

📥 Commits

Reviewing files that changed from the base of the PR and between 4a133be and f179cee.

📒 Files selected for processing (7)
  • node/blocktag/config.go
  • node/blocktag/service.go
  • node/cmd/node/main.go
  • node/core/executor.go
  • node/derivation/README.md
  • node/derivation/batch_verifier.go
  • node/types/retryable_client.go

Comment on lines +193 to +197
// Close releases the L1 and L2 RPC connections held by the BatchVerifier.
func (bv *BatchVerifier) Close() {
	bv.l1Client.Close()
	bv.l2EthClient.Close()
}
@coderabbitai coderabbitai bot Mar 8, 2026

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if L1BeaconClient or BasicHTTPClient has a Close method
rg -nP --type=go 'func.*\(.*L1BeaconClient\).*Close|func.*\(.*BasicHTTPClient\).*Close' node/derivation/

Repository: morph-l2/morph

Length of output: 40


🏁 Script executed:

# Check NewBatchVerifier to see if l1BeaconClient is initialized and if it's optional
sed -n '100,200p' node/derivation/batch_verifier.go | head -100

Repository: morph-l2/morph

Length of output: 3440


🏁 Script executed:

# Check the L1BeaconClient and BasicHTTPClient definitions
cat -n node/derivation/base_client.go

Repository: morph-l2/morph

Length of output: 1557


🏁 Script executed:

# Search for L1BeaconClient type definition in batch_verifier.go
rg -n "l1BeaconClient" node/derivation/batch_verifier.go | head -20

Repository: morph-l2/morph

Length of output: 574


Close() should also clean up the L1 beacon HTTP client when configured.

The Close() method releases l1Client and l2EthClient but omits l1BeaconClient. When a beacon RPC endpoint is configured, l1BeaconClient is initialized with a BasicHTTPClient that holds an *http.Client (line 35 in base_client.go) with open connection pools. These connections should be explicitly released to follow proper resource cleanup practices.

Add a Close() method to BasicHTTPClient (or L1BeaconClient) and invoke it in BatchVerifier.Close() when l1BeaconClient is not nil.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/derivation/batch_verifier.go` around lines 193 - 197, The Close method
on BatchVerifier currently only closes l1Client and l2EthClient but must also
release the HTTP connection pool held by l1BeaconClient; add a Close() method to
the HTTP-based beacon client (e.g., BasicHTTPClient or the L1BeaconClient type)
that calls the underlying *http.Client.CloseIdleConnections (and any other
cleanup), update the beacon client interface if necessary to include Close(),
and modify BatchVerifier.Close() to check if bv.l1BeaconClient != nil and call
its Close() before returning.

Contributor Author

review agent

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check base_client.go fully
cat -n node/derivation/base_client.go

echo "---"

# Check beacon.go for L1BeaconClient struct and any existing Close methods
cat -n node/derivation/beacon.go 2>/dev/null || echo "beacon.go not found"

echo "---"

# Search for all Close methods in the derivation package
rg -n "func.*Close" node/derivation/

echo "---"

# Find where L1BeaconClient is defined
rg -n "type L1BeaconClient" node/derivation/

echo "---"

# Find all files in derivation package
fd -e go . node/derivation/

Length of output: 10802


@curryxbo — The review comment is valid. Here's the full picture and a concrete fix:

Root cause:

  • BasicHTTPClient (base_client.go) holds an *http.Client but has no Close() method.
  • L1BeaconClient (beacon.go) holds the HTTP interface (which only defines Get()), so there's no cleanup path.
  • BatchVerifier.Close() never calls any cleanup on l1BeaconClient, leaving the HTTP connection pool open.

Proposed fix — three small changes:

1. node/derivation/base_client.go — add Close() to BasicHTTPClient:

// Close releases idle connections held by the underlying HTTP client.
func (cl *BasicHTTPClient) Close() {
	cl.client.CloseIdleConnections()
}

2. node/derivation/beacon.go — add Close() to L1BeaconClient (using a type assertion to avoid changing the HTTP interface):

// Close releases any resources held by the underlying HTTP client, if it supports cleanup.
func (cl *L1BeaconClient) Close() {
	if c, ok := cl.cl.(interface{ Close() }); ok {
		c.Close()
	}
}

3. node/derivation/batch_verifier.go — update BatchVerifier.Close():

// Close releases the L1 and L2 RPC connections held by the BatchVerifier.
func (bv *BatchVerifier) Close() {
	bv.l1Client.Close()
	bv.l2EthClient.Close()
	if bv.l1BeaconClient != nil {
		bv.l1BeaconClient.Close()
	}
}

The type-assertion approach in step 2 keeps the HTTP interface minimal (no forced Close() on all implementors, which would break beacon_test.go mocks) while still enabling proper cleanup for the concrete *BasicHTTPClient used in production.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@node/cmd/node/main.go`:
- Around line 115-122: derivation.NewBatchVerifier is being called with
context.Background(), which lets its internal retry loop block startup forever;
replace that with a shared, cancellable/root signal-aware context (the same
context you use for shutdown) and pass that context into
derivation.NewBatchVerifier so its retries can be cancelled; thread the same
context into blocktag.NewBlockTagService (and any related constructors) so
BatchVerifier creation can be interrupted and the fallback path (bv == nil)
executes and graceful shutdown works; refer to derivation.NewBatchVerifier,
blocktag.NewBlockTagService, BatchVerifier/bv, and nodeConfig.Logger when
locating the calls to update.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9a74a86a-cf4d-4b7c-b1e8-e5af93e54b57

📥 Commits

Reviewing files that changed from the base of the PR and between f179cee and 00dedae.

📒 Files selected for processing (1)
  • node/cmd/node/main.go

corey added 2 commits March 9, 2026 10:20
Replace context.Background() passed to NewBatchVerifier with a
signal.NotifyContext so that startup retries can be interrupted by OS
signals, preventing an indefinite hang when the L2 endpoint is down.
Also removes the now-redundant manual interrupt channel.
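
For reference, the wiring described here follows the standard library pattern below; the variable names (bvCfg, vt) and the nil fallback are assumptions about main.go rather than the exact code.

// Root context cancelled on SIGINT/SIGTERM, so NewBatchVerifier's
// internal retries stop instead of hanging when L2 is unreachable.
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
defer stop()

bv, err := derivation.NewBatchVerifier(ctx, bvCfg, vt, nodeConfig.Logger)
if err != nil {
	// Fall back to running without full batch verification (bv == nil).
	nodeConfig.Logger.Error("create BatchVerifier failed", "err", err)
	bv = nil
}
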
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@node/blocktag/service.go`:
- Around line 391-405: The tracker only advances via tracker.Advance, causing
fetchCommitBatchTxHash to miss events if batchIndex is rewound by
findCompletedBatchForL2BlockWithDepth; modify the logic that updates the tracker
so when the cached batch index moves backwards you call the tracker's reset
method (e.g., tracker.Reset or set its FromBlock to zero) or detect the rewind
and bypass the optimized FromBlock by passing a nil/zero FromBlock to
s.l1Client.FilterLogs; update both the CommitBatch lookup here (around
fetchCommitBatchTxHash usage where tracker.Advance is called) and the analogous
spot mentioned (lines ~618-622) to ensure the tracker is reset/bypassed on
batch-index rewind.

In `@node/derivation/batch_verifier.go`:
- Around line 108-118: NewBatchVerifier currently uses ethclient.Dial which
ignores the caller context; change both L1 and L2 dials to context-aware calls
(use ethclient.DialContext with the provided ctx or equivalent context-aware RPC
dial) so the function respects cancellation/timeouts, and keep the existing
cleanup logic (ensure l1Client.Close() is still called if the L2 dial fails).
Update the L1 dial for l1Client and the L2 dial for l2EthClient to use ctx and
propagate the wrapped error messages as before.

In `@node/types/retryable_client.go`:
- Around line 67-89: FetchGethConfigWithRetry currently calls
FetchGethConfig(rpcURL, logger) without a context, so long-running
rpc.Dial()/client.Call() calls can't be canceled; change FetchGethConfig
signature to accept ctx context.Context, update its implementation to use
rpc.DialContext(ctx, ...) and client.CallContext(ctx, ...) (or the context-aware
equivalents), and then pass the ctx from FetchGethConfigWithRetry into
FetchGethConfig(ctx, rpcURL, logger) so cancellation/timeout in the retry loop
actually aborts the underlying RPC operations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24adc770-ccd8-4fce-9073-53f97b320e53

📥 Commits

Reviewing files that changed from the base of the PR and between 00dedae and faaa516.

📒 Files selected for processing (4)
  • node/blocktag/service.go
  • node/derivation/batch_info.go
  • node/derivation/batch_verifier.go
  • node/types/retryable_client.go
💤 Files with no reviewable changes (1)
  • node/derivation/batch_info.go

Comment on lines +391 to +405
	logs, err := s.l1Client.FilterLogs(s.ctx, ethereum.FilterQuery{
		FromBlock: new(big.Int).SetUint64(tracker.FromBlock()),
		Addresses: []common.Address{s.rollupAddress},
		Topics: [][]common.Hash{
			{derivation.RollupEventTopicHash},
			{batchIndexHash},
		},
	})
	if err != nil {
		return common.Hash{}, fmt.Errorf("filter CommitBatch logs for batch %d: %w", batchIndex, err)
	}
	if len(logs) == 0 {
		return common.Hash{}, fmt.Errorf("no CommitBatch event found for batch %d", batchIndex)
	}
	tracker.Advance(logs[0].BlockNumber)
Contributor

⚠️ Potential issue | 🟠 Major

Reset the log tracker when batch selection moves backward.

These trackers only advance forward. That breaks the rollback path in this same file: when findCompletedBatchForL2BlockWithDepth resets the cached batch index after an L2 reorg, an earlier batchIndex can become valid again, but FromBlock() is now past its CommitBatch event, so fetchCommitBatchTxHash will never find it. Please reset the corresponding tracker on batch-index rewind, or bypass the optimized FromBlock for that lookup.

Also applies to: 618-622
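
A sketch of the kind of reset hook the tracker would need; the type and method names below are illustrative assumptions, not the existing l1SearchTracker implementation:

// Search-window cursor for FilterLogs FromBlock (names assumed).
type searchTracker struct {
	fromBlock uint64
	fixed     bool // true when Config.L1StartBlock pins the start
}

func (t *searchTracker) FromBlock() uint64 { return t.fromBlock }

// Advance only ever moves the search window forward.
func (t *searchTracker) Advance(block uint64) {
	if !t.fixed && block > t.fromBlock {
		t.fromBlock = block
	}
}

// Reset rewinds the window, e.g. after an L2 reorg moves the cached
// batch index backwards, so older CommitBatch events can be found again.
func (t *searchTracker) Reset(block uint64) {
	if !t.fixed {
		t.fromBlock = block
	}
}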

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/blocktag/service.go` around lines 391 - 405, The tracker only advances
via tracker.Advance, causing fetchCommitBatchTxHash to miss events if batchIndex
is rewound by findCompletedBatchForL2BlockWithDepth; modify the logic that
updates the tracker so when the cached batch index moves backwards you call the
tracker's reset method (e.g., tracker.Reset or set its FromBlock to zero) or
detect the rewind and bypass the optimized FromBlock by passing a nil/zero
FromBlock to s.l1Client.FilterLogs; update both the CommitBatch lookup here
(around fetchCommitBatchTxHash usage where tracker.Advance is called) and the
analogous spot mentioned (lines ~618-622) to ensure the tracker is
reset/bypassed on batch-index rewind.

Comment on lines +108 to +118
func NewBatchVerifier(ctx context.Context, cfg *Config, vt *validator.Validator, logger tmlog.Logger) (*BatchVerifier, error) {
	l1Client, err := ethclient.Dial(cfg.L1.Addr)
	if err != nil {
		return nil, fmt.Errorf("dial l1 node error: %w", err)
	}

	l2EthClient, err := ethclient.Dial(cfg.L2.EthAddr)
	if err != nil {
		l1Client.Close()
		return nil, fmt.Errorf("dial l2 eth node error: %w", err)
	}
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file and the specific lines mentioned
head -n 150 node/derivation/batch_verifier.go | tail -n 50

Repository: morph-l2/morph

Length of output: 1698


🏁 Script executed:

# Check if ethclient.DialContext exists in the codebase or go-ethereum
rg "DialContext" --type go

Repository: morph-l2/morph

Length of output: 478


🏁 Script executed:

# Check the import statement and go.mod to understand which go-ethereum version is used
cat go.mod | grep ethereum

Repository: morph-l2/morph

Length of output: 96


🏁 Script executed:

# Check for any existing usage of context-aware dials in the codebase
rg "ethclient\.Dial" --type go -A 2 -B 2

Repository: morph-l2/morph

Length of output: 5606


🏁 Script executed:

# Check if there are any context-aware patterns used elsewhere in batch_verifier.go
grep -n "context\|ctx\|Context" node/derivation/batch_verifier.go | head -20

Repository: morph-l2/morph

Length of output: 1796


🏁 Script executed:

# Verify ethclient is imported from go-ethereum
grep -n "import" node/derivation/batch_verifier.go | head -5

Repository: morph-l2/morph

Length of output: 68


🏁 Script executed:

# Get the full imports section of batch_verifier.go
sed -n '3,30p' node/derivation/batch_verifier.go

Repository: morph-l2/morph

Length of output: 941


Use context-aware dials for L1 and L2 connections in NewBatchVerifier.

Lines 109 and 114 accept a caller context but ignore it when dialing L1 and L2 endpoints. A slow or blackholed RPC endpoint can block NewBatchVerifier indefinitely, preventing context cancellation from taking effect even though the later geth-config fetch at line 154 is context-aware.

Suggested fix
-	l1Client, err := ethclient.Dial(cfg.L1.Addr)
+	l1Client, err := ethclient.DialContext(ctx, cfg.L1.Addr)
 	if err != nil {
 		return nil, fmt.Errorf("dial l1 node error: %w", err)
 	}

-	l2EthClient, err := ethclient.Dial(cfg.L2.EthAddr)
+	l2EthClient, err := ethclient.DialContext(ctx, cfg.L2.EthAddr)
 	if err != nil {
 		l1Client.Close()
 		return nil, fmt.Errorf("dial l2 eth node error: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/derivation/batch_verifier.go` around lines 108 - 118, NewBatchVerifier
currently uses ethclient.Dial which ignores the caller context; change both L1
and L2 dials to context-aware calls (use ethclient.DialContext with the provided
ctx or equivalent context-aware RPC dial) so the function respects
cancellation/timeouts, and keep the existing cleanup logic (ensure
l1Client.Close() is still called if the L2 dial fails). Update the L1 dial for
l1Client and the L2 dial for l2EthClient to use ctx and propagate the wrapped
error messages as before.

Comment on lines 67 to 89
 // FetchGethConfigWithRetry fetches geth config with retry, waiting for geth to be ready.
-func FetchGethConfigWithRetry(rpcURL string, logger tmlog.Logger) (*GethConfig, error) {
+// The call blocks until geth responds, ctx is canceled, or the retry limit is reached.
+func FetchGethConfigWithRetry(ctx context.Context, rpcURL string, logger tmlog.Logger) (*GethConfig, error) {
 	var lastErr error
 	for i := 0; i < GethRetryAttempts; i++ {
+		select {
+		case <-ctx.Done():
+			return nil, ctx.Err()
+		default:
+		}
 		config, err := FetchGethConfig(rpcURL, logger)
 		if err == nil {
 			return config, nil
 		}
 		lastErr = err
 		logger.Info("Waiting for geth to be ready...", "attempt", i+1, "error", err)
-		time.Sleep(GethRetryInterval)
+		select {
+		case <-ctx.Done():
+			return nil, ctx.Err()
+		case <-time.After(GethRetryInterval):
+		}
 	}
 	return nil, fmt.Errorf("geth not ready after %d attempts: %w", GethRetryAttempts, lastErr)
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "retryable_client.go" -type f

Repository: morph-l2/morph

Length of output: 90


🏁 Script executed:

wc -l ./node/types/retryable_client.go

Repository: morph-l2/morph

Length of output: 94


🏁 Script executed:

sed -n '67,89p' ./node/types/retryable_client.go

Repository: morph-l2/morph

Length of output: 862


🏁 Script executed:

rg "func FetchGethConfig\(" ./node/types/retryable_client.go -A 20

Repository: morph-l2/morph

Length of output: 704


🏁 Script executed:

head -30 ./node/types/retryable_client.go

Repository: morph-l2/morph

Length of output: 870


🏁 Script executed:

rg "rpc\.(Dial|Call)" ./node/types/retryable_client.go -B 2 -A 2

Repository: morph-l2/morph

Length of output: 319


🏁 Script executed:

rg "DialContext|CallContext" ./node/types/retryable_client.go

Repository: morph-l2/morph

Length of output: 40


🏁 Script executed:

find . -path "*/go-ethereum/rpc*" -type f | head -20

Repository: morph-l2/morph

Length of output: 40


🏁 Script executed:

grep "go-ethereum" ./go.mod

Repository: morph-l2/morph

Length of output: 99


🏁 Script executed:

find . -name "go.mod" -type f | head -5

Repository: morph-l2/morph

Length of output: 151


🏁 Script executed:

grep -i "go-ethereum\|rpc\|backoff" ./node/go.mod | head -20

Repository: morph-l2/morph

Length of output: 356


🏁 Script executed:

web_search
Does github.com/morph-l2/go-ethereum rpc.Client support DialContext and CallContext methods?

Repository: morph-l2/morph

Length of output: 151


🌐 Web query:

go-ethereum rpc.Client DialContext CallContext methods

💡 Result:

In go-ethereum’s rpc package:

  • Connect (with context for initial dial only):

    • func DialContext(ctx context.Context, rawurl string) (*rpc.Client, error) — like Dial, but ctx only cancels/times out connection establishment; it does not affect later RPC calls. The client will auto-reconnect if the connection drops. rawurl can be http(s)://, ws(s)://, or an IPC path. [1]
  • Make a JSON-RPC request (with per-call cancel/deadline):

    • func (c *Client) CallContext(ctx context.Context, result interface{}, method string, args ...interface{}) error — performs the JSON-RPC call; if ctx is canceled before it returns, it returns immediately. result must be a pointer (or nil to ignore the result). [1]

Minimal example:

c, err := rpc.DialContext(ctx, "https://rpc.example.com")
if err != nil { /* handle */ }
defer c.Close()

var block hexutil.Big
err = c.CallContext(ctx, &block, "eth_blockNumber")

If you want a higher-level Ethereum API wrapper, ethclient.Client is built on top of the same RPC transport and provides typed methods (e.g., BlockNumber wrapping eth_blockNumber). [2]

Sources: go-ethereum rpc package docs [1], Geth “Go API” docs [2].


🏁 Script executed:

rg "FetchGethConfig" ./node/types/retryable_client.go

Repository: morph-l2/morph

Length of output: 541


🏁 Script executed:

rg "FetchGethConfig\(" ./node/types/retryable_client.go -B 2 -A 2

Repository: morph-l2/morph

Length of output: 364


Propagate ctx into the eth_config RPC call to respect cancellation.

Line 77 calls FetchGethConfig(rpcURL, logger) without passing the context, and FetchGethConfig uses rpc.Dial() and client.Call() instead of their context-aware variants. If the connection or RPC call blocks (e.g., network timeout, unresponsive geth), the context cancellation checks in lines 73-75 and 86-89 cannot interrupt the underlying blocking operations, defeating the startup/shutdown timeout behavior.

Update FetchGethConfig to accept context.Context and use rpc.DialContext() and client.CallContext(), then pass ctx from FetchGethConfigWithRetry (line 77).
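
A sketch of the context-aware variant being suggested; rpc.DialContext and CallContext are the documented go-ethereum calls, while the RPC method name and the GethConfig decoding below are assumptions to be checked against the real FetchGethConfig:

// Context-aware geth config fetch (sketch; method name assumed).
func FetchGethConfig(ctx context.Context, rpcURL string, logger tmlog.Logger) (*GethConfig, error) {
	client, err := rpc.DialContext(ctx, rpcURL) // dial aborts if ctx is canceled
	if err != nil {
		return nil, fmt.Errorf("dial geth rpc: %w", err)
	}
	defer client.Close()

	var cfg GethConfig
	// "eth_config" is taken from the comment above; verify the real method name.
	if err := client.CallContext(ctx, &cfg, "eth_config"); err != nil {
		return nil, fmt.Errorf("fetch geth config: %w", err)
	}
	return &cfg, nil
}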

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/types/retryable_client.go` around lines 67 - 89,
FetchGethConfigWithRetry currently calls FetchGethConfig(rpcURL, logger) without
a context, so long-running rpc.Dial()/client.Call() calls can't be canceled;
change FetchGethConfig signature to accept ctx context.Context, update its
implementation to use rpc.DialContext(ctx, ...) and client.CallContext(ctx, ...)
(or the context-aware equivalents), and then pass the ctx from
FetchGethConfigWithRetry into FetchGethConfig(ctx, rpcURL, logger) so
cancellation/timeout in the retry loop actually aborts the underlying RPC
operations.

Introduce a new BatchProcessor that sequentially walks every committed
batch, verifies via BatchVerifier, and updates safe/finalized block tags.
This replaces BlockTagService, which skipped intermediate batches.
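
The sequential walk described in this commit suggests a polling loop of roughly the following shape; the field and helper names (cfg.PollInterval, processNextBatch, logger) are assumptions, not the actual processor.go code:

// Sketch of the BatchProcessor polling loop (names assumed).
func (bp *BatchProcessor) loop(ctx context.Context) {
	ticker := time.NewTicker(bp.cfg.PollInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Walk committed batches strictly in order; the safe/finalized
			// cursors advance only after a batch passes verification (or the
			// lightweight fallback when verification is disabled).
			if err := bp.processNextBatch(ctx); err != nil {
				bp.logger.Error("process batch", "err", err)
			}
		}
	}
}
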
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
node/batchprocessor/config.go (1)

33-48: Consider validating L1Addr is not empty.

SetCliContext retrieves L1Addr from the CLI flag but doesn't validate it. If the flag is unset or empty, the config will pass through here successfully but fail later in NewBatchProcessor (line 67-68 of processor.go). For consistency with the rollup address validation, consider adding an empty check here.

♻️ Optional: Add L1Addr validation
 func (c *Config) SetCliContext(ctx *cli.Context) error {
 	c.L1Addr = ctx.GlobalString(flags.L1NodeAddr.Name)
+	if c.L1Addr == "" {
+		return fmt.Errorf("L1 node address is required: specify --%s", flags.L1NodeAddr.Name)
+	}
 
 	if ctx.GlobalBool(flags.MainnetFlag.Name) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/batchprocessor/config.go` around lines 33 - 48, SetCliContext currently
assigns c.L1Addr from the CLI but doesn't validate it; add a check in
Config.SetCliContext after c.L1Addr = ctx.GlobalString(flags.L1NodeAddr.Name) to
ensure c.L1Addr is not empty (zero-length) and return a descriptive error if it
is (similar style to the RollupAddress error). Reference the
Config.SetCliContext function and flags.L1NodeAddr.Name; this prevents later
failures in NewBatchProcessor by failing fast with a clear message.
node/derivation/README.md (1)

22-32: Add language specifiers to fenced code blocks.

Static analysis flagged missing language specifiers on the architecture diagram code blocks. Adding text or plaintext improves accessibility and tooling support.

📝 Proposed fix
-```
+```text
 BatchProcessor (scheduler, node/batchprocessor)

Apply the same change to the code block starting at line 154.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/derivation/README.md` around lines 22 - 32, The fenced diagram code
blocks in the README (the block that begins with "BatchProcessor (scheduler,
node/batchprocessor)" and the later block that includes "BatchVerifier
(stateless, no goroutines, node/derivation)") are missing language specifiers;
update each opening fence from ``` to ```text (or ```plaintext) so the blocks
read e.g. ```text BatchProcessor (scheduler, node/batchprocessor) and similarly
for the BatchVerifier block to improve tooling and accessibility.
node/batchprocessor/processor.go (1)

285-290: Address TODO: define verification failure behavior.

The TODO comment indicates undefined behavior on verification failure. Currently, the error propagates and stops cursor advancement. Consider whether:

  1. The current behavior (stop advancing) is correct for safety
  2. A challenge should be triggered (as mentioned in README for steps 1-2)
  3. Metrics/alerts should be emitted

Would you like help defining the failure handling strategy or opening an issue to track this?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/batchprocessor/processor.go` around lines 285 - 290, The verification
failure TODO indicates undefined failure handling for bp.verifyBatch when
bp.batchVerifier != nil; implement a clear strategy rather than just returning
an error: add a handler (e.g., method handleVerificationFailure on the
processor) and call it from the branch that catches err from
verifyBatch(batchIndex, currentL1); the handler should encapsulate chosen
behavior — either stop advancing the cursor and return the error for safety, or
trigger a challenge via bp.challenger.TriggerChallenge(batchIndex, err) and emit
metrics/alerts via bp.metrics (e.g., IncrementVerificationFailures and emit an
alert event) — ensure only successful verification advances the cursor, and
update tests to cover both success and failure flows.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@node/batchprocessor/processor.go`:
- Around line 325-351: The fetchCommitBatchTxHash function currently may Scan
from genesis when bp.lastCommitL1Block == 0; update initCursors to seed
bp.lastCommitL1Block instead of leaving it zero (or set a bounded fallback
lookback) by locating the CommitBatch event for the finalized batch: after
resolving the finalized L2 block in initCursors, query FilterLogs (or use
indexed event lookup) for the CommitBatch with batch index == finalized index
and set bp.lastCommitL1Block to that event's BlockNumber; alternatively, if that
lookup fails, set bp.lastCommitL1Block = max(0, currentL1 - LOOKBACK_BLOCKS) so
fetchCommitBatchTxHash never searches from block 0.

---

Nitpick comments:
In `@node/batchprocessor/config.go`:
- Around line 33-48: SetCliContext currently assigns c.L1Addr from the CLI but
doesn't validate it; add a check in Config.SetCliContext after c.L1Addr =
ctx.GlobalString(flags.L1NodeAddr.Name) to ensure c.L1Addr is not empty
(zero-length) and return a descriptive error if it is (similar style to the
RollupAddress error). Reference the Config.SetCliContext function and
flags.L1NodeAddr.Name; this prevents later failures in NewBatchProcessor by
failing fast with a clear message.

In `@node/batchprocessor/processor.go`:
- Around line 285-290: The verification failure TODO indicates undefined failure
handling for bp.verifyBatch when bp.batchVerifier != nil; implement a clear
strategy rather than just returning an error: add a handler (e.g., method
handleVerificationFailure on the processor) and call it from the branch that
catches err from verifyBatch(batchIndex, currentL1); the handler should
encapsulate chosen behavior — either stop advancing the cursor and return the
error for safety, or trigger a challenge via
bp.challenger.TriggerChallenge(batchIndex, err) and emit metrics/alerts via
bp.metrics (e.g., IncrementVerificationFailures and emit an alert event) —
ensure only successful verification advances the cursor, and update tests to
cover both success and failure flows.

In `@node/derivation/README.md`:
- Around line 22-32: The fenced diagram code blocks in the README (the block
that begins with "BatchProcessor (scheduler, node/batchprocessor)" and the later
block that includes "BatchVerifier (stateless, no goroutines, node/derivation)")
are missing language specifiers; update each opening fence from ``` to ```text
(or ```plaintext) so the blocks read e.g. ```text BatchProcessor (scheduler,
node/batchprocessor) and similarly for the BatchVerifier block to improve
tooling and accessibility.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5ce2c425-7499-4309-ae4d-0222afdd5b06

📥 Commits

Reviewing files that changed from the base of the PR and between faaa516 and 090a62b.

📒 Files selected for processing (4)
  • node/batchprocessor/config.go
  • node/batchprocessor/processor.go
  • node/cmd/node/main.go
  • node/derivation/README.md

Comment on lines +325 to +351
// fetchCommitBatchTxHash finds the CommitBatch event on L1 for a given batch
// index. It searches forward from lastCommitL1Block (the L1 block of the
// previous CommitBatch event) so the window naturally covers catch-up scenarios
// instead of being limited to a fixed range.
func (bp *BatchProcessor) fetchCommitBatchTxHash(batchIndex, currentL1 uint64) (common.Hash, error) {
	fromBlock := bp.lastCommitL1Block

	batchIndexBig := new(big.Int).SetUint64(batchIndex)
	logs, err := bp.l1Client.FilterLogs(bp.ctx, ethereum.FilterQuery{
		FromBlock: new(big.Int).SetUint64(fromBlock),
		ToBlock:   new(big.Int).SetUint64(currentL1),
		Addresses: []common.Address{bp.rollupAddress},
		Topics: [][]common.Hash{
			{derivation.RollupEventTopicHash},
			{common.BigToHash(batchIndexBig)},
		},
	})
	if err != nil {
		return common.Hash{}, fmt.Errorf("FilterLogs: %w", err)
	}
	if len(logs) == 0 {
		return common.Hash{}, fmt.Errorf("no CommitBatch event found for batch %d in L1 blocks [%d, %d]", batchIndex, fromBlock, currentL1)
	}

	bp.lastCommitL1Block = logs[0].BlockNumber
	return logs[0].TxHash, nil
}

⚠️ Potential issue | 🟡 Minor

Cold start: searching from block 0 may be slow.

When lastCommitL1Block is 0 (initial state), fetchCommitBatchTxHash searches from L1 genesis to current head. On mainnet with millions of blocks, this FilterLogs call can time out or be rate-limited.

Consider initializing lastCommitL1Block in initCursors by querying the L1 block of the finalized batch's CommitBatch event, or using a bounded lookback window as a fallback.

🔧 Sketch: Initialize lastCommitL1Block in initCursors
// After resolving the L2 block hash in initCursors, also resolve the L1 block
// of the CommitBatch event for the finalized batch to seed lastCommitL1Block.
if idx > 0 {
    // Find the CommitBatch event for idx to seed lastCommitL1Block
    // This avoids searching from genesis on first tick
}

Alternatively, use a fixed lookback (e.g., 10000 blocks) when lastCommitL1Block == 0.
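
A small sketch of that fallback; commitScanStart is a hypothetical helper and 10000 is only an assumed lookback size.

```go
package batchprocessor

// commitScanStart picks the L1 block to start the CommitBatch scan from.
// When no previous CommitBatch block has been recorded yet (cold start),
// it bounds the window to lookback blocks instead of scanning from genesis.
func commitScanStart(lastCommitL1Block, currentL1, lookback uint64) uint64 {
	if lastCommitL1Block != 0 {
		return lastCommitL1Block
	}
	if currentL1 > lookback {
		return currentL1 - lookback
	}
	return 0
}
```

fetchCommitBatchTxHash would then compute fromBlock := commitScanStart(bp.lastCommitL1Block, currentL1, 10000) instead of reading bp.lastCommitL1Block directly.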

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/batchprocessor/processor.go` around lines 325 - 351, The
fetchCommitBatchTxHash function currently may Scan from genesis when
bp.lastCommitL1Block == 0; update initCursors to seed bp.lastCommitL1Block
instead of leaving it zero (or set a bounded fallback lookback) by locating the
CommitBatch event for the finalized batch: after resolving the finalized L2
block in initCursors, query FilterLogs (or use indexed event lookup) for the
CommitBatch with batch index == finalized index and set bp.lastCommitL1Block to
that event's BlockNumber; alternatively, if that lookup fails, set
bp.lastCommitL1Block = max(0, currentL1 - LOOKBACK_BLOCKS) so
fetchCommitBatchTxHash never searches from block 0.

@curryxbo curryxbo force-pushed the refactor/batch-verifier branch from 090a62b to 2ce7470 on March 9, 2026 at 14:24
@curryxbo curryxbo closed this Mar 10, 2026


Development

Successfully merging this pull request may close these issues.

Integrate L2 block tags with derive block operations
