
feat: emit balance_verified via multi-provider stateRoot quorum + MPT proof#76

Open
smypmsa wants to merge 4 commits into master from feat/balance-verified-rpc-mpt

Conversation


@smypmsa smypmsa commented May 3, 2026

Summary

Adds a new verify_state Vercel cron (fra1 only, every 15 min, offset 5 min from update_state) that produces a second, independent answer to eth_getBalance for each EVM chain.

Per round, per chain (Ethereum, Base, Arbitrum, BNB):

  1. Read OLD_BLOCK from shared blob storage.
  2. Multi-provider stateRoot quorum at OLD_BLOCK — every fra1 RPC provider must respond AND return the same stateRoot, otherwise verifier_status=1 (disagreement) or =3 (partial response).
  3. Fetch eth_getProof from Chainstack for the chain's probe address.
  4. Walk the MPT proof locally against the agreed stateRoot. Path-mismatch / empty-branch returns None (canonical exclusion); hash-mismatch raises ProofError.
  5. On full success: emit balance_verified (hashed via v1's common/balance_hash.py) + verifier_status=0.

Trust narrative

  • Provider being measured: not trusted (audit target).
  • Provider serving the proof (Chainstack): not trusted; MPT math catches forged proofs regardless of who served them.
  • Anchor stateRoot: trusted only when all fra1 providers agreed.
  • Validator-level corruption: out of scope for v2 (v3's beacon-API anchor closes that gap).

Empirical validation done while writing the spec

  • ✅ Walker correctness: 4 chains × 200+ proofs each (800 total). Walker output equals eth_getBalance byte-for-byte on every probe. 0 unexpected failures.
  • ✅ Tamper detection: bit flip / wrong root / truncation → all rejected (verifier_status=2).
  • ✅ Inlined nodes: 0/400 in account proofs (consistent with theory: account leaves exceed 32 bytes, so inlined branch/extension children are empirically absent from account-proof paths).
  • ✅ Multi-provider stateRoot agreement: 0/24 disagreements across 4 chains × 6 block depths × 3 operators.
  • ✅ Latency: 0.5–6.3s per chain on public RPCs (paid endpoints expected faster). Comfortable margin against 55s Vercel limit.
  • ✅ Bundle size: +27 MB delta. Vercel Python limit 500 MB unzipped → 415 MB headroom.
  • ✅ End-to-end smoke test on this branch: walker against Ethereum public RPC matches eth_getBalance (1722166635215000 wei).
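The tamper-detection result follows directly from how MPT proofs chain nodes by hash. A minimal stdlib-only illustration of that property (sha256 stands in for keccak256, which the real walker gets from pycryptodome; the same collision-resistance argument applies):

```python
import hashlib

def node_hash(node_bytes: bytes) -> bytes:
    # Stand-in for keccak256 used by the actual walker.
    return hashlib.sha256(node_bytes).digest()

# A parent node references each child by the hash of the child's raw bytes.
child = b"\xf8\x44\x01\x80"  # arbitrary stand-in for a proof node
reference = node_hash(child)

# Flip a single bit anywhere in the child: the reference no longer resolves.
# This is exactly the hash-mismatch case that raises ProofError in the walker.
tampered = bytes([child[0] ^ 0x01]) + child[1:]
assert node_hash(child) == reference
assert node_hash(tampered) != reference
```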

Excluded from v2

| chain | reason |
| --- | --- |
| Hyperliquid | No historical block parameter on eth_getBalance (only latest). |
| Monad | RPC implementation does not expose eth_getProof (-32601 Method not found, confirmed against Monad/0.14.2 testnet). |
| Solana, TON | Different state model and proof systems. |

Their dashboards keep using v1's modal-based comparison.

Files

| file | role | LoC |
| --- | --- | --- |
| api/support/verify_state.py | HTTP handler + orchestration | ~270 |
| common/verify/proof.py | MPT walker (split into helpers for mccabe ≤ 10) | ~210 |
| common/verify/anchor.py | multi-provider quorum + Chainstack proof fetch | ~150 |
| common/verify/providers.py | endpoint enumeration (no Chainstack-only filter) | ~70 |
| common/verify/__init__.py | package exports | 25 |
| vercel.json + vercel.fra1.json | function config + cron entry | +6 lines each |
| pyproject.toml + requirements.txt | rlp==4.1.0 + pycryptodome==3.21.0 | +2 each |

Total: ~720 LoC of new Python + small config edits.

NOT in this PR (intentional split)

Dashboard panel updates. The Data agreement panels we shipped in PR #75 keep using their existing modal-based query; the new balance_verified series flows in alongside, queryable but not yet displayed by panels. Panel swap to use balance_verified lives in a separate follow-up PR after this verifier soaks for ~1 week and we confirm verifier_status is healthy.

Rationale: keep concerns separate. Verifier PR is Vercel-function code + cron + new metric series. Dashboard PR is JSON in dashboards/ synced via grafana_sync.py. They're independent — verifier emits regardless of whether panels query, and existing modal panels keep working until the swap lands.

Test plan

  • uvx black . clean
  • uvx ruff check . clean
  • uvx mypy . 238 errors (= master baseline, no new errors from this PR)
  • End-to-end smoke test of the package on Ethereum public RPC — walker matches eth_getBalance
  • Spec walker validation: 800 real proofs across 4 chains, 0 unexpected failures
  • Pre-merge check #1: eth_getProof on each Chainstack production endpoint (Ethereum / Base / Arbitrum / BNB). Single curl per chain. Most likely issue to surface: a base-rpc.publicnode.com-style "tight proof window" — production Chainstack archive should not have this.
  • Pre-merge check #2: Vercel preview bundle size via vercel inspect <preview-deployment>. Local measurement says ~27 MB delta; expect well under the 500 MB Python limit.
  • Post-merge observation: deploy fra1, confirm dev_balance_verified series appears for all 4 chains within 30 min (~2 cron rounds). Confirm dev_verifier_status=0 in steady state.
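Pre-merge check #1's per-chain probe is a single eth_getProof call. A sketch of the request body it would send (the address and block tag here are placeholders; params are [address, storageKeys, blockTag]):

```python
import json

def eth_get_proof_request(address: str, block: str) -> str:
    """Build the JSON-RPC body for the probe.

    POST this to the chain's endpoint, e.g.:
      curl -s -X POST "$CHAINSTACK_URL" \
           -H 'Content-Type: application/json' -d "$BODY"
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getProof",
        "params": [address, [], block],  # empty storage-key list: account proof only
    })

body = eth_get_proof_request("0x" + "00" * 20, "latest")
request = json.loads(body)
assert request["method"] == "eth_getProof"
assert request["params"] == ["0x" + "00" * 20, [], "latest"]
```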

Cron + region gating

  • Cron: 5-59/15 * * * * → minutes 5, 20, 35, 50.
  • update_state stays on */15 → minutes 0, 15, 30, 45.
  • 5-minute lag guarantees the verifier reads a fully-written OLD_BLOCK from blob.
  • Cron entry registered ONLY in vercel.fra1.json (sin1 / sfo1 / hnd1 don't get it).
  • Defense-in-depth env gate: VERCEL_REGION == "fra1" short-circuits if a non-fra1 deploy somehow registered the cron.
  • Auth: Bearer ${CRON_SECRET} matching update_state.py.
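The region gate and auth check above can be sketched as follows. Function names are illustrative, not the handler's actual ones; the fail-closed handling of an unset CRON_SECRET mirrors the hardening applied during this PR's review:

```python
import os

def region_allowed() -> bool:
    # Defense-in-depth gate: even if a non-fra1 deploy somehow registered
    # the cron, the handler short-circuits outside fra1.
    return os.getenv("VERCEL_REGION", "") == "fra1"

def check_auth(authorization_header: str) -> bool:
    # Fail closed: an unset/empty CRON_SECRET rejects every request, so a
    # bare "Authorization: Bearer " header can never authenticate.
    secret = os.getenv("CRON_SECRET", "")
    if not secret:
        return False
    return authorization_header == f"Bearer {secret}"

os.environ["VERCEL_REGION"] = "fra1"
os.environ["CRON_SECRET"] = "s3cret"
assert region_allowed()
assert check_auth("Bearer s3cret")
assert not check_auth("Bearer wrong")
del os.environ["CRON_SECRET"]
assert not check_auth("Bearer ")  # unset secret: fail closed
```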

What v3 will be (separate spec, separate PR)

v3 swaps the anchor source from "multi-RPC-provider stateRoot quorum" to "multi-beacon-API stateRoot quorum" — closes the residual blind spot of "all RPC providers share an upstream wrong stateRoot". Walker, schema, panels, cron, region gate all carry verbatim. ~150 LoC of anchor.py rewrite + beacon_providers.py. Spec file: spec-verified-correctness-v3.md.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a blockchain state verification service that checks per-chain account state across multiple RPC providers, performs local proof verification, and emits monitoring metrics (balance_verified and verifier_status).
    • Published verification utilities for provider discovery, anchor agreement, and account-proof verification for reuse.
  • Chores

    • Pinned cryptography and RLP runtime dependencies.
    • Updated deployment and cron configuration for the new verification endpoint.

… proof

New Vercel cron (fra1 only, every 15 min, offset 5 min from update_state)
that produces a second, independent answer to eth_getBalance for each
EVM chain we monitor.

Per round per chain (Ethereum, Base, Arbitrum, BNB):
  1. Read OLD_BLOCK from shared blob storage (set by update_state).
  2. Multi-provider stateRoot quorum at OLD_BLOCK — every fra1 RPC
     provider must respond AND return the same stateRoot, otherwise
     emit verifier_status=1 (disagreement) or =3 (partial response).
  3. Fetch eth_getProof from Chainstack for the chain's probe address.
  4. Walk the MPT proof locally against the agreed stateRoot
     (rlp + pycryptodome only). Path-mismatch / empty-branch returns
     None for canonical exclusion; hash-mismatch raises ProofError.
  5. On full success: emit balance_verified (hashed to 52-bit float
     using v1's common/balance_hash.py) + verifier_status=0.
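common/balance_hash.py itself is not shown on this page; the sketch below only illustrates the "hash to a 52-bit float" idea (a float64 mantissa is 52 bits, so integers below 2^52 round-trip losslessly through the metrics pipeline). The actual hashing scheme in v1 may differ:

```python
import hashlib

def balance_hash_52bit(balance_wei: int) -> float:
    # Illustrative only: deterministically map a balance to a value that a
    # float64 represents exactly (< 2**52), so Influx/Grafana never round it.
    digest = hashlib.sha256(str(balance_wei).encode()).digest()
    return float(int.from_bytes(digest[:8], "big") >> 12)  # keep top 52 bits

value = balance_hash_52bit(1722166635215000)
assert value.is_integer()          # exactly representable in a float64
assert 0 <= value < 2**52
assert value == balance_hash_52bit(1722166635215000)  # deterministic
```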

The MPT walker was implemented per the spec algorithm and validated
end-to-end against four chains (800 proofs walked, 0 unexpected
failures, all balances match eth_getBalance) plus tamper resistance
(bit flip, wrong root, truncation all rejected). See
spec-verified-correctness-v2.md (local design doc) for the full
algorithm and empirical-validation table.

Trust narrative:
  - Provider being measured: not trusted (audit target)
  - Provider serving the proof (Chainstack): not trusted; MPT math
    catches forged proofs regardless of who served them
  - Anchor stateRoot: trusted only when ALL fra1 providers agreed
  - Validator-level corruption: out of scope for v2 (v3's beacon-anchor closes that gap)

Excluded from v2:
  - Hyperliquid: no historical block parameter on eth_getBalance
  - Monad: RPC implementation does not expose eth_getProof
    (-32601 Method not found, empirically confirmed)
  - Solana / TON: different state model, separate spec

Files:
  - api/support/verify_state.py: HTTP handler + orchestration
  - common/verify/proof.py: MPT walker (~150 LoC, hand-checked)
  - common/verify/anchor.py: multi-provider quorum + proof fetch
  - common/verify/providers.py: endpoint enumeration
  - common/verify/__init__.py: package exports
  - vercel.json + vercel.fra1.json: function entry + cron schedule
    (only fra1 deploys this)
  - pyproject.toml + requirements.txt: rlp + pycryptodome deps
    (+27 MB bundle delta; well under Vercel Python's 500 MB limit)

NOT in this PR (intentional split):
  - Dashboard panel updates. The Data agreement panels keep using
    their existing modal-based query; the new balance_verified
    series flows alongside, queryable but not yet displayed.
    Panel swap is a separate follow-up PR after observation soak.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 3, 2026

Warning

Rate limit exceeded

@smypmsa has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 42 minutes and 39 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8720921a-29b0-4c74-87e1-f20df4f2ba10

📥 Commits

Reviewing files that changed from the base of the PR and between c4d8a5f and ceb4846.

📒 Files selected for processing (1)
  • api/support/verify_state.py
📝 Walkthrough

Walkthrough

Adds a Vercel cron HTTP handler and verification library that fetches per-chain block roots from multiple RPC providers, obtains Chainstack account proofs, locally verifies MPT account proofs to extract balances, emits Influx line-protocol metrics, and optionally pushes them to Grafana Cloud.

Changes

Blockchain State Verification System

| Layer | File(s) | Summary |
| --- | --- | --- |
| Dependencies | pyproject.toml, requirements.txt | Pinned runtime deps: rlp==4.1.0, pycryptodome==3.21.0. |
| Provider Configuration | common/verify/providers.py | Loads ENDPOINTS JSON; exposes all_providers_for(chain) and chainstack_endpoint_for(chain) to enumerate RPC/Chainstack endpoints. |
| Multi-Provider StateRoot Quorum | common/verify/anchor.py | Implements _fetch_state_root, fetch_agreed_anchor (concurrent provider queries, partial/disagreement errors), and fetch_account_proof (Chainstack eth_getProof + node bytes conversion). |
| MPT Proof Verification | common/verify/proof.py | Implements verify_account_proof with keccak256, HP path decoding, branch/extension walkers, RLP account decoding; returns canonical balance, None for canonical exclusions, or raises ProofError. |
| Verification Library Exports | common/verify/__init__.py | Re-exports anchor/proof/provider helpers and error types via __all__. |
| Orchestration & HTTP Cron Handler | api/support/verify_state.py | Adds constants (regions, probes, metric name/status codes), _gather_state_data, _verify_chain, _verify_all, Influx line-format helpers, optional _push_to_grafana, and Vercel handler with auth/region gating and event-loop orchestration. |
| Deployment / Scheduling | vercel.json, vercel.fra1.json | Registers api/support/verify_state.py function (memory:200, maxDuration:59) and adds cron schedule "5-59/15 * * * *". |
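The "Influx line-format helpers" mentioned above boil down to formatting `measurement,tags fields timestamp` strings. An illustrative sketch — the helper names and exact tag/field layout are assumptions, not the handler's actual code:

```python
def format_balance_line(chain: str, hashed_balance: float, ts_ns: int) -> str:
    # Influx line protocol: <measurement>,<tag set> <field set> <timestamp-ns>
    return f"balance_verified,chain={chain} value={hashed_balance} {ts_ns}"

def format_verifier_status_line(chain: str, status: int, ts_ns: int) -> str:
    # The trailing "i" marks an integer field in line protocol.
    return f"verifier_status,chain={chain} value={status}i {ts_ns}"

line = format_verifier_status_line("ethereum", 0, 1714000000000000000)
assert line == "verifier_status,chain=ethereum value=0i 1714000000000000000"
```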
```mermaid
sequenceDiagram
    participant Cron as Vercel Cron
    participant Handler as verify_state.handler
    participant BlockState as BlockchainState
    participant RPC as RPC Providers
    participant Chainstack as Chainstack Endpoint
    participant Verifier as Local MPT Verifier
    participant Grafana as Grafana Cloud

    Cron->>Handler: GET /api/support/verify_state (Bearer token)
    Handler->>Handler: Authenticate & region check
    Handler->>BlockState: get_data(chain) for each probe
    BlockState-->>Handler: old_block per chain

    Handler->>RPC: fetch_agreed_anchor(block) (concurrent)
    RPC-->>Handler: agreed stateRoot or error

    Handler->>Chainstack: fetch_account_proof(address, block)
    Chainstack-->>Handler: accountProof (MPT nodes)

    Handler->>Verifier: verify_account_proof(address, accountProof, stateRoot)
    Verifier-->>Handler: balance or ProofError

    Handler->>Handler: format Influx lines (balance_verified, verifier_status)
    Handler->>Grafana: POST metrics text (optional)
    Grafana-->>Handler: 200/204

    Handler-->>Cron: 200 + emitted metrics text
```

🎯 4 (Complex) | ⏱️ ~60 minutes

🐇 I nibble stateRoots under moonlit skies,
I fetch the proofs where hidden data lies,
I hop through nibbles, decode each leaf,
I count carrot balances, report their brief,
I send my metrics, then bound off in white.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped — CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding balance verification using multi-provider stateRoot quorum and MPT proof verification, which aligns with the core purpose of the 294-line verify_state.py handler and supporting modules. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 95.45%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/support/verify_state.py`:
- Around line 100-102: The current verify_state branch conflates missing anchor
providers and missing proof endpoint by returning STATUS_ANCHOR_UNAVAILABLE for
both; change the logic so if providers is empty (or falsy) you return
_format_verifier_status_line(chain, STATUS_ANCHOR_UNAVAILABLE, ts_ns), but if
providers exist and chainstack_url (from chainstack_endpoint_for()) is falsy you
instead return _format_verifier_status_line(chain, STATUS_PROOF_UNAVAILABLE,
ts_ns); update the conditional in verify_state to check providers first and then
chainstack_url so the correct STATUS_* constant is used.
- Around line 171-176: In the loop over results in verify_state.py (the block
that iterates "for r in results:" and currently only logs exceptions), append a
failure-status line into the "lines" list whenever a chain task raised instead
of dropping the sample; include identifying info (chain id/name or index), a
clear failure marker like "FAIL" or "ERROR", and the exception
message/timestamp, then return the joined lines as before; alternatively
implement the same mapping inside _verify_chain() so each task returns either
its normal list of lines or a single-line failure entry to preserve
verifier_status continuity.
- Around line 87-90: The code currently returns [] when chain_state is missing
or has no "old_block", which drops the chain from the round; instead, call the
existing verifier status emission routine to record a failure for this chain
before returning. In the verify_state block where you check chain_state =
state_data.get(chain_lower) and if not chain_state or not
chain_state.get("old_block"), invoke the project's verifier_status emitter (the
same function used elsewhere to report per-chain results) with the chain
identifier, round/epoch context (e.g., round_id or task id available in scope),
and a clear failure reason like "missing_old_block" or the underlying blob-read
error from _gather_state_data(), then return [] as before. Ensure the emitted
status includes enough context (chain, round, and reason) so dashboards can
distinguish "verifier failed" from "no sample."

In `@common/verify/anchor.py`:
- Around line 106-112: The code decodes block.get("stateRoot") into bytes but
doesn't reject values of the wrong byte length, allowing 31- or 33-byte values
to pass quorum and later fail as STATUS_PROOF_MATH_INVALID; update the logic
around state_root_hex and bytes.fromhex to validate size and reject non-32-byte
results by returning None. Specifically, in the block handling code that reads
state_root_hex (the block.get("stateRoot") path and the bytes.fromhex(...)
call), normalize the hex with removeprefix("0x"), ensure it's an even-length hex
string, decode it, and then check that len(state_root_bytes) == 32; if any check
fails (invalid hex or wrong length) return None so incorrect lengths are treated
as anchor-unavailable.
- Around line 157-167: The code currently filters non-string items out of
account_proof and decodes remaining items, which silently drops malformed
elements; update the logic in the block that handles account_proof (the variable
and the list comprehension using bytes.fromhex and .removeprefix) to first
validate that account_proof is a list of strings and that every element starts
with "0x" and contains valid hex characters, and if any element fails these
checks raise a RuntimeError describing the malformed RPC response instead of
silently skipping entries; perform the validation loop before attempting
bytes.fromhex so errors surface as RPC malformation (include the offending
element in the error message).

In `@common/verify/proof.py`:
- Around line 73-95: The _follow_reference helper currently rejects any
non-empty, non-32-byte bytes refs, but valid MPT proofs may inline child nodes
as raw RLP (<32 bytes); update _follow_reference to, when ref is bytes and
length != 32, treat it as an embedded RLP node and return rlp.decode(ref)
(keeping the existing behavior of raising ProofError when a 32-byte hash is
missing from proof_by_hash and still raising on completely unexpected types);
reference symbols: _follow_reference, proof_by_hash, rlp.decode, ProofError.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6941e242-c09c-4d44-802a-f40b2e3a154b

📥 Commits

Reviewing files that changed from the base of the PR and between 0447423 and 7dd4420.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • api/support/verify_state.py
  • common/verify/__init__.py
  • common/verify/anchor.py
  • common/verify/proof.py
  • common/verify/providers.py
  • pyproject.toml
  • requirements.txt
  • vercel.fra1.json
  • vercel.json

Comment thread api/support/verify_state.py Outdated
Comment thread api/support/verify_state.py Outdated
Comment thread api/support/verify_state.py Outdated
Comment thread common/verify/anchor.py
Comment thread common/verify/anchor.py
Comment thread common/verify/proof.py
Comment on lines +73 to +95
```python
def _follow_reference(ref: Any, proof_by_hash: dict[bytes, bytes]) -> Any:
    """Resolve a child reference to the next decoded node.

    A child reference is either:
    - A list (already-decoded RLP): an inlined node — used directly. The parent's
      hash already authenticated it, so no separate hash check is required.
    - 32 raw bytes: a hash reference — looked up in ``proof_by_hash``.

    Empty refs (``b""``) signal exclusion at the caller and must not be passed
    here.

    Raises:
        ProofError: If the reference is malformed or its hash isn't in the proof.
    """
    if isinstance(ref, list):
        return ref
    if isinstance(ref, bytes):
        if len(ref) == 32:
            if ref not in proof_by_hash:
                raise ProofError(f"hash {ref.hex()} not present in accountProof")
            return rlp.decode(proof_by_hash[ref])
        raise ProofError(f"reference length {len(ref)} is invalid (must be 0 or 32)")
    raise ProofError(f"unexpected reference type: {type(ref).__name__}")
```

@coderabbitai coderabbitai Bot May 3, 2026


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Decode embedded child nodes instead of rejecting them.

MPT child references are not limited to b"" and 32-byte hashes. When the child node RLP is < 32 bytes, the parent stores that raw RLP inline. This helper currently raises on every non-empty, non-32-byte bytes ref, so valid proofs will fail as soon as traversal hits an embedded node.

Suggested fix

```diff
 def _follow_reference(ref: Any, proof_by_hash: dict[bytes, bytes]) -> Any:
     """Resolve a child reference to the next decoded node.
@@
     if isinstance(ref, list):
         return ref
     if isinstance(ref, bytes):
         if len(ref) == 32:
             if ref not in proof_by_hash:
                 raise ProofError(f"hash {ref.hex()} not present in accountProof")
             return rlp.decode(proof_by_hash[ref])
-        raise ProofError(f"reference length {len(ref)} is invalid (must be 0 or 32)")
+        if len(ref) == 0:
+            raise ProofError("empty child reference must be handled by the caller")
+        try:
+            return rlp.decode(ref)
+        except Exception as exc:
+            raise ProofError("embedded child reference is not valid RLP") from exc
     raise ProofError(f"unexpected reference type: {type(ref).__name__}")
```
🧰 Tools
🪛 Ruff (0.15.12)

[warning] 73-73: Dynamically typed expressions (typing.Any) are disallowed in ref

(ANN401)


[warning] 73-73: Dynamically typed expressions (typing.Any) are disallowed in _follow_reference

(ANN401)


[warning] 92-92: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 94-94: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 95-95: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@common/verify/proof.py` around lines 73 - 95, The _follow_reference helper
currently rejects any non-empty, non-32-byte bytes refs, but valid MPT proofs
may inline child nodes as raw RLP (<32 bytes); update _follow_reference to, when
ref is bytes and length != 32, treat it as an embedded RLP node and return
rlp.decode(ref) (keeping the existing behavior of raising ProofError when a
32-byte hash is missing from proof_by_hash and still raising on completely
unexpected types); reference symbols: _follow_reference, proof_by_hash,
rlp.decode, ProofError.

Member Author


Not a bug: rlp.decode is recursive so inlined children come back as Python lists and are handled by the isinstance(ref, list) branch above; verified by encoding and decoding a parent with a 22-byte inlined child.
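The dismissal can be reproduced with a minimal stdlib-only RLP decoder (the project uses the `rlp` package; this sketch implements just enough of the encoding to show that decoding a parent already expands inlined children into nested lists, while 32-byte hash references stay raw bytes):

```python
def rlp_decode(data: bytes):
    """Minimal recursive RLP decoder (strings + lists), for illustration only."""
    item, rest = _decode_item(data)
    assert rest == b"", "trailing bytes after top-level item"
    return item

def _decode_item(data: bytes):
    prefix = data[0]
    if prefix < 0x80:                      # single byte is its own encoding
        return data[0:1], data[1:]
    if prefix <= 0xB7:                     # short string (0-55 bytes)
        n = prefix - 0x80
        return data[1:1 + n], data[1 + n:]
    if prefix <= 0xBF:                     # long string
        ll = prefix - 0xB7
        n = int.from_bytes(data[1:1 + ll], "big")
        start = 1 + ll
        return data[start:start + n], data[start + n:]
    if prefix <= 0xF7:                     # short list (payload 0-55 bytes)
        n, start = prefix - 0xC0, 1
    else:                                  # long list
        ll = prefix - 0xF7
        n = int.from_bytes(data[1:1 + ll], "big")
        start = 1 + ll
    payload, items = data[start:start + n], []
    while payload:                         # recursion: children decode too
        item, payload = _decode_item(payload)
        items.append(item)
    return items, data[start + n:]

# Parent with one INLINED child (child RLP < 32 bytes, embedded verbatim):
child_rlp = b"\xc6\x82ab\x82cd"            # list [b"ab", b"cd"], 7 bytes total
parent_inlined = b"\xc7" + child_rlp
assert rlp_decode(parent_inlined) == [[b"ab", b"cd"]]  # child arrives as a LIST

# Parent with a HASH reference (exactly 32 bytes, stays raw bytes):
parent_hash_ref = b"\xe1\xa0" + bytes(32)
decoded = rlp_decode(parent_hash_ref)
assert isinstance(decoded[0], bytes) and len(decoded[0]) == 32
```

So by the time `_follow_reference` sees a child, inlined nodes are already Python lists and only genuine hash references arrive as bytes.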



@smypmsa, you're absolutely right — thank you for the clear explanation. Since rlp.decode is recursive, any inlined child node (RLP-encoded bytes < 32 bytes embedded in the parent) is already expanded into a nested Python list at the point the parent is decoded. By the time _follow_reference is called, those children arrive as list objects and are handled by the isinstance(ref, list) branch. The bytes path in _follow_reference is only ever reached by a genuine 32-byte hash reference. The current code is correct.

I'll record this to avoid raising the same false positive in future reviews.

( ˘ ³˘)♡


✏️ Learnings added
Learnt from: smypmsa
Repo: chainstacklabs/compare-dashboard-functions PR: 76
File: common/verify/proof.py:73-95
Timestamp: 2026-05-03T13:50:08.443Z
Learning: In common/verify/proof.py, the `_follow_reference` function correctly handles only two cases for `bytes` refs: empty bytes (exclusion, handled by caller) and 32-byte hashes (looked up in proof_by_hash). Inlined MPT child nodes (raw RLP < 32 bytes) never appear as raw `bytes` in `_follow_reference` because `rlp.decode` is recursive — it already expands embedded children into nested Python `list` objects when the parent node is decoded. So the `isinstance(ref, list)` branch above handles them. Do NOT suggest adding `rlp.decode(ref)` for the < 32 byte case; it is not needed and was empirically verified with a 22-byte inlined child.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: smypmsa
Repo: chainstacklabs/compare-dashboard-functions PR: 59
File: dashboards/grafana_sync.py:27-27
Timestamp: 2026-04-15T08:56:43.269Z
Learning: When reviewing Python changes in this repo, avoid asking for broad docstring expansions (e.g., adding multi-section "Args/Returns/Raises" blocks) in files that are unrelated to the PR’s feature scope. Keep one-line docstrings that already satisfy pydocstyle with the "google" convention unchanged. If wider docstring-section coverage is desired as a baseline, defer it to a dedicated docstring-sweep PR rather than expanding docstrings opportunistically during feature PR review.

Five accepted fixes; one comment dismissed as a false alarm (verified empirically).

verify_state.py:
- Missing OLD_BLOCK now emits verifier_status=3 (anchor unavailable) instead
  of returning [], so a blob/state failure is visible in dashboards rather
  than indistinguishable from a missing cron run.
- Missing endpoints split: no quorum providers -> verifier_status=3; no
  Chainstack proof endpoint -> verifier_status=4. Previously both mapped
  to status=3.
- Per-chain task that raises an unexpected exception in asyncio.gather now
  emits verifier_status=2 for that chain (instead of being silently logged
  and dropped). Prevents silent gaps in the verifier_status series.

anchor.py:
- _fetch_state_root now rejects non-32-byte stateRoot values; a malformed
  but unanimous response fails as anchor-unavailable instead of surfacing
  later as proof-math-invalid.
- fetch_account_proof now fails fast on accountProof elements that are
  not strings; previously the comprehension silently dropped them, which
  could have been misreported downstream as proof corruption.

Dismissed: the suggestion to "decode embedded child nodes" in proof.py's
_follow_reference was a false alarm. The Ethereum rlp library decodes
parents recursively, so inlined children appear as Python lists (not as
sub-32-byte bytes); the walker handles them via isinstance(ref, list).
Verified empirically by encoding+decoding a parent with a 22-byte inlined
child — decode returns a list, not bytes. Bytes refs of length 1-31 would
be malformed per the MPT spec and correctly raise.

Smoke test re-run post-fix: walker still returns the correct balance
end-to-end on Ethereum public RPC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/support/verify_state.py`:
- Around line 283-287: The handler currently writes raw exception text back to
clients (wfile.write(str(e).encode())) which may leak internals; instead log the
exception server-side (e.g., use self.log_error or the module logger with the
exception info using the exception variable e) and send a generic 500 body like
"Internal server error" (keep using self.send_response, self.send_header,
self.end_headers, and self.wfile.write but replace the payload), removing any
direct exposure of str(e).
- Around line 248-253: The _check_auth method currently treats an empty
CRON_SECRET as a match (allowing "Authorization: Bearer "), so change logic to
fail closed: after honoring SKIP_AUTH, read CRON_SECRET into a variable (e.g.,
secret = os.getenv("CRON_SECRET", "")) and if that secret is empty return False,
then compare token == f"Bearer {secret}". Ensure you reference _check_auth,
self.headers/Authorization, SKIP_AUTH and CRON_SECRET in the update so
unauthenticated requests are rejected when CRON_SECRET is unset.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5da777c1-2a56-40e8-a347-0036586bf467

📥 Commits

Reviewing files that changed from the base of the PR and between 7dd4420 and 918eeb3.

📒 Files selected for processing (2)
  • api/support/verify_state.py
  • common/verify/anchor.py

Comment thread api/support/verify_state.py
Comment thread api/support/verify_state.py Outdated
verify_state.py:
- _check_auth fails closed when CRON_SECRET is unset; previously an empty
  Authorization header (Bearer with empty value) would have authenticated
  if the env var was missing.
- 500 responses no longer leak raw exception text; log via logging.exception
  and return a generic body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/support/verify_state.py`:
- Around line 135-137: In verify_state.py, inside the except ProofError as e
block, replace the logging.error call with logging.exception so the ProofError
traceback is preserved (addresses Ruff TRY400); keep the same message text
("verify_state: proof math invalid for {chain}: {e}") and the subsequent return
of _format_verifier_status_line(chain, STATUS_PROOF_MATH_INVALID, ts_ns)
unchanged.
- Around line 195-198: The fallback that appends a verifier status line
currently uses STATUS_PROOF_MATH_INVALID but should use STATUS_PROOF_UNAVAILABLE
for unexpected exceptions; update the call to _format_verifier_status_line (the
lines.append(...) invocation) to replace STATUS_PROOF_MATH_INVALID with
STATUS_PROOF_UNAVAILABLE so that exceptions from _verify_chain are reported as
"unavailable" rather than cryptographic proof failures.
- Around line 231-236: The _push_to_grafana function currently treats
401/403/404/429 as retriable and emits warnings; update the response handling so
that when response.status is in (401, 403, 404, 429) the function silently
ignores the error (return immediately) without consuming retries or calling
logging.warning, while preserving existing behavior for other statuses (e.g.,
keeping the existing success check for 200/204 and the warning+retry for other
non-success codes); locate this logic around the response/status check in
_push_to_grafana (references: response, attempt, logging.warning) and add an
early return for those four status codes.
- Around line 139-145: The code maps an exclusion proof (when
verify_account_proof returns balance is None) to STATUS_PROOF_MATH_INVALID,
which mislabels a valid exclusion as a malformed proof; in the balance is None
branch inside verify_state.py replace STATUS_PROOF_MATH_INVALID with a more
accurate status — either use the existing STATUS_PROOF_UNAVAILABLE constant or
introduce a new STATUS_EXCLUSION_UNEXPECTED constant (e.g., value 5) and use
that; if you add a new constant, declare it alongside the other STATUS_*
constants and update any consumers or documentation that enumerate statuses;
keep the same return call to _format_verifier_status_line(chain, NEW_STATUS,
ts_ns).
- Around line 96-239: Replace all f-string usage inside logging calls in this
module (e.g. inside _verify_chain, _gather_state_data, _verify_all,
_push_to_grafana and any verify_state call sites) with logging's lazy formatting
or param-style logging (e.g. logging.warning("verify_state: no old_block for
%s", chain) or logging.warning("verify_state: anchor partial for %s: %s", chain,
e)), and ensure exceptions use logging.exception(...) or pass exc_info where
appropriate; then run ruff check --fix to apply the automated G004 fixes and
verify no f-strings remain in logging calls.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: edb4a810-9135-44b6-8d5e-8475aa47a473

📥 Commits

Reviewing files that changed from the base of the PR and between 918eeb3 and c4d8a5f.

📒 Files selected for processing (1)
  • api/support/verify_state.py

Comment on lines +96 to +239
        logging.warning(f"verify_state: no old_block for {chain}")
        return [_format_verifier_status_line(chain, STATUS_ANCHOR_UNAVAILABLE, ts_ns)]

    block_hex = chain_state["old_block"]
    addr_hex = PROBE_ADDRESSES[chain]
    addr_bytes = bytes.fromhex(addr_hex.removeprefix("0x"))

    providers = all_providers_for(chain)
    chainstack_url = chainstack_endpoint_for(chain)

    if not providers:
        logging.warning(f"verify_state: no anchor providers configured for {chain}")
        return [_format_verifier_status_line(chain, STATUS_ANCHOR_UNAVAILABLE, ts_ns)]
    if not chainstack_url:
        logging.warning(f"verify_state: no Chainstack endpoint configured for {chain}")
        return [_format_verifier_status_line(chain, STATUS_PROOF_UNAVAILABLE, ts_ns)]

    # 1. Anchor — multi-provider stateRoot quorum.
    try:
        anchor = await fetch_agreed_anchor(session, block_hex, providers)
    except AnchorDisagreementError:
        logging.warning(f"verify_state: stateRoot disagreement for {chain}")
        return [
            _format_verifier_status_line(chain, STATUS_STATEROOT_DISAGREEMENT, ts_ns)
        ]
    except AnchorPartialResponseError as e:
        logging.warning(f"verify_state: anchor partial for {chain}: {e}")
        return [_format_verifier_status_line(chain, STATUS_ANCHOR_UNAVAILABLE, ts_ns)]

    # 2. Proof from Chainstack.
    try:
        proof = await fetch_account_proof(session, chainstack_url, addr_hex, block_hex)
    except Exception as e:
        logging.warning(f"verify_state: proof fetch failed for {chain}: {e}")
        return [_format_verifier_status_line(chain, STATUS_PROOF_UNAVAILABLE, ts_ns)]

    # 3. Local MPT verification.
    try:
        balance = verify_account_proof(addr_bytes, proof, anchor)
    except ProofError as e:
        logging.error(f"verify_state: proof math invalid for {chain}: {e}")
        return [_format_verifier_status_line(chain, STATUS_PROOF_MATH_INVALID, ts_ns)]

    if balance is None:
        # Probe address has no state at OLD_BLOCK — unexpected for our funded probes.
        logging.error(
            f"verify_state: exclusion proof for {chain} "
            f"address={addr_hex} block={block_hex}"
        )
        return [_format_verifier_status_line(chain, STATUS_PROOF_MATH_INVALID, ts_ns)]

    # 4. Emit balance_verified + verifier_status=0.
    return [
        _format_balance_verified_line(
            chain, block_hex, hash_balance_to_float(balance), ts_ns
        ),
        _format_verifier_status_line(chain, STATUS_OK, ts_ns),
    ]


async def _gather_state_data() -> dict[str, dict[str, Any]]:
    """Read OLD_BLOCK + other state for each in-scope chain from blob storage."""
    out: dict[str, dict[str, Any]] = {}
    for chain in PROBE_ADDRESSES:
        try:
            chain_state = await BlockchainState.get_data(chain.lower())
            if chain_state:
                out[chain.lower()] = chain_state
        except Exception:
            logging.exception(f"verify_state: failed to read state for {chain}")
    return out


async def _verify_all() -> str:
    """Run the verifier across all in-scope chains. Returns Influx text.

    If a per-chain task raises an unexpected exception (i.e. something not
    handled inside ``_verify_chain``), emit ``verifier_status=2`` for that
    chain so the failure is visible in dashboards rather than silently dropped.
    """
    state_data = await _gather_state_data()
    chains = list(PROBE_ADDRESSES)

    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *[_verify_chain(session, chain, state_data) for chain in chains],
            return_exceptions=True,
        )

    lines: list[str] = []
    fallback_ts_ns = time.time_ns()
    for chain, r in zip(chains, results):
        if isinstance(r, list):
            lines.extend(r)
        else:
            logging.error(
                f"verify_state: chain task raised for {chain}",
                exc_info=r if isinstance(r, BaseException) else None,
            )
            lines.append(
                _format_verifier_status_line(
                    chain, STATUS_PROOF_MATH_INVALID, fallback_ts_ns
                )
            )
    return "\n".join(lines)


async def _push_to_grafana(metrics_text: str) -> None:
    """Push metrics in Influx line protocol to Grafana Cloud.

    Mirrors ``MetricsHandler.push_to_grafana`` so this verifier function is
    self-contained — refactoring the push helper into a shared util is a separate
    concern.
    """
    if not metrics_text:
        return

    url = os.environ.get("GRAFANA_URL")
    user = os.environ.get("GRAFANA_USER")
    api_key = os.environ.get("GRAFANA_API_KEY")
    if not url or not user or not api_key:
        logging.warning("verify_state: Grafana env not set, skipping push")
        return

    timeout = aiohttp.ClientTimeout(total=MetricsServiceConfig.GRAFANA_PUSH_TIMEOUT)
    for attempt in range(1, MetricsServiceConfig.GRAFANA_PUSH_MAX_RETRIES + 1):
        try:
            async with aiohttp.ClientSession() as session:
                async with session.post(
                    url,
                    headers={"Content-Type": "text/plain"},
                    data=metrics_text,
                    auth=aiohttp.BasicAuth(user, api_key),
                    timeout=timeout,
                ) as response:
                    if response.status in (200, 204):
                        return
                    logging.warning(
                        f"verify_state: Grafana push got {response.status} "
                        f"on attempt {attempt}"
                    )
        except Exception:
            logging.exception(
                f"verify_state: Grafana push exception on attempt {attempt}"
Contributor

@coderabbitai coderabbitai Bot May 3, 2026


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Pervasive f-string logging violates Ruff G004 — run ruff check --fix.

Lines 96, 107, 110, 117, 122, 129, 142–143, 165, 192, 234–235, and 239 all use f-strings inside logging calls. Ruff G004 flags these because f-strings are eagerly evaluated even when the log level is suppressed, wasting CPU. ruff check --fix will rewrite them automatically. Representative example:

-        logging.warning(f"verify_state: anchor partial for {chain}: {e}")
+        logging.warning("verify_state: anchor partial for %s: %s", chain, e)

As per coding guidelines, ruff check and ruff check --fix must be run before completing any task.
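The eager-evaluation cost G004 refers to can be demonstrated directly: an f-string argument is formatted before `logging.debug` even runs, while `%s`-style arguments are only formatted for records that pass the level check. A small self-contained probe (names like `Expensive` are illustrative only):

```python
import logging

calls = []


class Expensive:
    """Stand-in for a costly-to-format value; records every __str__ call."""

    def __str__(self):
        calls.append("formatted")
        return "expensive"


logging.getLogger().setLevel(logging.ERROR)  # DEBUG records are suppressed

obj = Expensive()
logging.debug(f"value: {obj}")   # f-string: __str__ runs even though the record is dropped
eager = len(calls)

logging.debug("value: %s", obj)  # lazy: level check fails before any formatting happens
lazy = len(calls) - eager
```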

🧰 Tools
🪛 Ruff (0.15.12)

[warning] 96-96: Logging statement uses f-string

(G004)


[warning] 107-107: Logging statement uses f-string

(G004)


[warning] 110-110: Logging statement uses f-string

(G004)


[warning] 117-117: Logging statement uses f-string

(G004)


[warning] 122-122: Logging statement uses f-string

(G004)


[warning] 128-128: Do not catch blind exception: Exception

(BLE001)


[warning] 129-129: Logging statement uses f-string

(G004)


[warning] 136-136: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


[warning] 136-136: Logging statement uses f-string

(G004)


[warning] 142-143: Logging statement uses f-string

(G004)


[warning] 164-165: try-except within a loop incurs performance overhead

(PERF203)


[warning] 165-165: Logging statement uses f-string

(G004)


[warning] 192-192: Logging statement uses f-string

(G004)


[warning] 223-230: Use a single with statement with multiple contexts instead of nested with statements

Combine with statements

(SIM117)


[warning] 234-235: Logging statement uses f-string

(G004)


[warning] 239-239: Logging statement uses f-string

(G004)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/support/verify_state.py` around lines 96 - 239, Replace all f-string
usage inside logging calls in this module (e.g. inside _verify_chain,
_gather_state_data, _verify_all, _push_to_grafana and any verify_state call
sites) with logging's lazy formatting or param-style logging (e.g.
logging.warning("verify_state: no old_block for %s", chain) or
logging.warning("verify_state: anchor partial for %s: %s", chain, e)), and
ensure exceptions use logging.exception(...) or pass exc_info where appropriate;
then run ruff check --fix to apply the automated G004 fixes and verify no
f-strings remain in logging calls.

Member Author


Skipping: G is not in the project's ruff selection and the codebase consistently uses f-string logging across update_state.py, metrics_handler.py, and base_metric.py, so following local convention reads cleaner than introducing %s formatting in this one new file.

Contributor


@smypmsa, understood — that's a valid reason. Keeping f-string logging consistent with the rest of the codebase is the right call, and if G isn't selected in the project's Ruff config there's nothing to enforce here. I'll note this for future reviews.


✏️ Learnings added
Learnt from: smypmsa
Repo: chainstacklabs/compare-dashboard-functions PR: 76
File: api/support/verify_state.py:96-250
Timestamp: 2026-05-03T14:06:51.328Z
Learning: In the chainstacklabs/compare-dashboard-functions repo, the Ruff `G` rule group (G004 f-string logging) is NOT enabled in the project's ruff selection. The codebase consistently uses f-string logging across update_state.py, metrics_handler.py, and base_metric.py. Do not flag f-string usage inside logging calls (G004) in this repo.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: smypmsa
Repo: chainstacklabs/compare-dashboard-functions PR: 59
File: dashboards/grafana_sync.py:27-27
Timestamp: 2026-04-15T08:56:43.269Z
Learning: When reviewing Python changes in this repo, avoid asking for broad docstring expansions (e.g., adding multi-section "Args/Returns/Raises" blocks) in files that are unrelated to the PR’s feature scope. Keep one-line docstrings that already satisfy pydocstyle with the "google" convention unchanged. If wider docstring-section coverage is desired as a baseline, defer it to a dedicated docstring-sweep PR rather than expanding docstrings opportunistically during feature PR review.

- Use logging.exception (not logging.error) for ProofError so the
  traceback is preserved.
- Exclusion-proof case (balance is None) now maps to STATUS_PROOF_UNAVAILABLE
  instead of STATUS_PROOF_MATH_INVALID. The proof is cryptographically valid;
  it just yields no balance to verify against (signals wrong probe address,
  stale OLD_BLOCK, or chain reorg deeper than our offset).
- Unexpected per-chain exceptions in _verify_all also map to
  STATUS_PROOF_UNAVAILABLE rather than STATUS_PROOF_MATH_INVALID. Network
  blips and bugs are infra failures, not cryptographic invalidity.
- _push_to_grafana silently ignores 401/403/404/429 per project convention
  (CLAUDE.md: "HTTP 401/403/404/429 errors are silently ignored — plan
  restrictions / rate limits"). Uses the existing
  MetricsServiceConfig.IGNORED_HTTP_ERRORS list.

Not changed: f-string logging style flagged by Ruff G004. The G rule set is
not enabled in the project's ruff config; existing modules (update_state.py,
metrics_handler.py, base_metric.py) all use f-string logging. Following
codebase convention beats following an unselected rule for this one new file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
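The status semantics this commit settles on can be summarized in a sketch. Values 0 to 3 follow the PR description (0 = verified, 1 = stateRoot disagreement, 2 = proof math invalid, 3 = anchor unavailable); `STATUS_PROOF_UNAVAILABLE = 4` and the function name are assumptions inferred from the discussion, not quoted from the file:

```python
STATUS_OK = 0                      # balance_verified emitted
STATUS_STATEROOT_DISAGREEMENT = 1  # fra1 providers returned different roots
STATUS_PROOF_MATH_INVALID = 2      # MPT hash mismatch: tampered or forged proof
STATUS_ANCHOR_UNAVAILABLE = 3      # partial provider response, no agreed root
STATUS_PROOF_UNAVAILABLE = 4       # assumed value: infra failure, not crypto


def status_for(outcome: str) -> int:
    """Map a verification outcome to the verifier_status emitted after this commit."""
    return {
        "verified": STATUS_OK,
        "root_disagreement": STATUS_STATEROOT_DISAGREEMENT,
        "hash_mismatch": STATUS_PROOF_MATH_INVALID,    # ProofError from the MPT walk
        "anchor_partial": STATUS_ANCHOR_UNAVAILABLE,
        "exclusion_proof": STATUS_PROOF_UNAVAILABLE,   # valid proof, no account state
        "unexpected_exception": STATUS_PROOF_UNAVAILABLE,
    }[outcome]
```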