feat: emit balance_verified via multi-provider stateRoot quorum + MPT proof #76
… proof
New Vercel cron (fra1 only, every 15 min, offset 5 min from update_state)
that produces a second, independent answer to eth_getBalance for each
EVM chain we monitor.
Per round per chain (Ethereum, Base, Arbitrum, BNB):
1. Read OLD_BLOCK from shared blob storage (set by update_state).
2. Multi-provider stateRoot quorum at OLD_BLOCK — every fra1 RPC
provider must respond AND return the same stateRoot, otherwise
emit verifier_status=1 (disagreement) or =3 (partial response).
3. Fetch eth_getProof from Chainstack for the chain's probe address.
4. Walk the MPT proof locally against the agreed stateRoot
(rlp + pycryptodome only). Path-mismatch / empty-branch returns
None for canonical exclusion; hash-mismatch raises ProofError.
5. On full success: emit balance_verified (hashed to 52-bit float
using v1's common/balance_hash.py) + verifier_status=0.
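Step 4's local walk decodes RLP-encoded trie nodes. The PR uses the `rlp` package; the sketch below is a stdlib-only illustration of the RLP decoding rules themselves (single byte, short/long string, short/long list), not the production decoder:

```python
def rlp_decode(data: bytes):
    """Decode one top-level RLP item; strings stay bytes, lists become lists."""
    item, rest = _decode_item(data)
    if rest:
        raise ValueError("trailing bytes after RLP item")
    return item


def _decode_item(data: bytes):
    prefix = data[0]
    if prefix < 0x80:                      # single byte, encodes itself
        return data[:1], data[1:]
    if prefix < 0xB8:                      # short string, 0-55 bytes
        n = prefix - 0x80
        return data[1 : 1 + n], data[1 + n :]
    if prefix < 0xC0:                      # long string: length-of-length prefix
        ll = prefix - 0xB7
        n = int.from_bytes(data[1 : 1 + ll], "big")
        start = 1 + ll
        return data[start : start + n], data[start + n :]
    if prefix < 0xF8:                      # short list, payload 0-55 bytes
        n = prefix - 0xC0
        return _decode_list(data[1 : 1 + n]), data[1 + n :]
    ll = prefix - 0xF7                     # long list
    n = int.from_bytes(data[1 : 1 + ll], "big")
    start = 1 + ll
    return _decode_list(data[start : start + n]), data[start + n :]


def _decode_list(payload: bytes) -> list:
    items = []
    while payload:
        item, payload = _decode_item(payload)
        items.append(item)
    return items
```

Note that list decoding recurses into the payload, which is why inlined MPT child nodes surface as nested Python lists (a point that matters later in this review thread).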
The MPT walker was implemented per the spec algorithm and validated
end-to-end against four chains (800 proofs walked, 0 unexpected
failures, all balances match eth_getBalance) plus tamper resistance
(bit flip, wrong root, truncation all rejected). See
spec-verified-correctness-v2.md (local design doc) for the full
algorithm and empirical-validation table.
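Step 5's 52-bit hashing comes from v1's common/balance_hash.py, which is not shown in this thread. Purely as an illustration of the idea (not that file's algorithm), one way to map a balance onto a value that survives a float64 round-trip exactly:

```python
import hashlib


def hash_balance_to_float(balance_wei: int) -> float:
    """Illustrative stand-in for common/balance_hash.py (actual algorithm not shown).

    Keeps 52 bits of a SHA-256 digest so the result fits exactly in a
    float64 mantissa -- no precision loss when stored as a Grafana metric.
    """
    digest = hashlib.sha256(str(balance_wei).encode()).digest()
    return float(int.from_bytes(digest[:8], "big") >> 12)  # 64 -> 52 bits
```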
Trust narrative:
- Provider being measured: not trusted (audit target)
- Provider serving the proof (Chainstack): not trusted; MPT math
catches forged proofs regardless of who served them
- Anchor stateRoot: trusted only when ALL fra1 providers agreed
- Validator-level corruption: out of scope for v2 (v3's beacon-
anchor closes that gap)
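The unanimity rule above (anchor trusted only when every fra1 provider agreed) reduces to a small pure function. The exception names below appear later in this thread; the signature itself is an assumption:

```python
class AnchorDisagreementError(Exception):
    """Providers returned conflicting stateRoots."""


class AnchorPartialResponseError(Exception):
    """At least one provider failed to answer."""


def agreed_state_root(roots: "list[bytes | None]") -> bytes:
    # Quorum here means unanimity: every configured provider must
    # respond AND return the same stateRoot, otherwise no anchor.
    if not roots or any(r is None for r in roots):
        raise AnchorPartialResponseError("missing provider response")
    if len(set(roots)) != 1:
        raise AnchorDisagreementError("stateRoot mismatch across providers")
    return roots[0]
```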
Excluded from v2:
- Hyperliquid: no historical block parameter on eth_getBalance
- Monad: RPC implementation does not expose eth_getProof
(-32601 Method not found, empirically confirmed)
- Solana / TON: different state model, separate spec
Files:
- api/support/verify_state.py: HTTP handler + orchestration
- common/verify/proof.py: MPT walker (~150 LoC, hand-checked)
- common/verify/anchor.py: multi-provider quorum + proof fetch
- common/verify/providers.py: endpoint enumeration
- common/verify/__init__.py: package exports
- vercel.json + vercel.fra1.json: function entry + cron schedule
(only fra1 deploys this)
- pyproject.toml + requirements.txt: rlp + pycryptodome deps
(+27 MB bundle delta; well under Vercel Python's 500 MB limit)
NOT in this PR (intentional split):
- Dashboard panel updates. The Data agreement panels keep using
their existing modal-based query; the new balance_verified
series flows alongside, queryable but not yet displayed.
Panel swap is a separate follow-up PR after observation soak.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 Walkthrough
Adds a Vercel cron HTTP handler and verification library that fetches per-chain block roots from multiple RPC providers, obtains Chainstack account proofs, locally verifies MPT account proofs to extract balances, emits Influx line-protocol metrics, and optionally pushes them to Grafana Cloud.
Changes: Blockchain State Verification System
sequenceDiagram
participant Cron as Vercel Cron
participant Handler as verify_state.handler
participant BlockState as BlockchainState
participant RPC as RPC Providers
participant Chainstack as Chainstack Endpoint
participant Verifier as Local MPT Verifier
participant Grafana as Grafana Cloud
Cron->>Handler: GET /api/support/verify_state (Bearer token)
Handler->>Handler: Authenticate & region check
Handler->>BlockState: get_data(chain) for each probe
BlockState-->>Handler: old_block per chain
Handler->>RPC: fetch_agreed_anchor(block) (concurrent)
RPC-->>Handler: agreed stateRoot or error
Handler->>Chainstack: fetch_account_proof(address, block)
Chainstack-->>Handler: accountProof (MPT nodes)
Handler->>Verifier: verify_account_proof(address, accountProof, stateRoot)
Verifier-->>Handler: balance or ProofError
Handler->>Handler: format Influx lines (balance_verified, verifier_status)
Handler->>Grafana: POST metrics text (optional)
Grafana-->>Handler: 200/204
Handler-->>Cron: 200 + emitted metrics text
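The two Influx series in the diagram (balance_verified, verifier_status) are plain line-protocol strings. A sketch with hypothetical tag and field names (the real formatters are _format_balance_verified_line / _format_verifier_status_line in verify_state.py):

```python
def format_verifier_status_line(chain: str, status: int, ts_ns: int) -> str:
    # Influx line protocol: measurement,tag_set field_set timestamp(ns).
    # Tag/field names here are illustrative, not the PR's exact schema.
    return f"verifier_status,chain={chain} value={status}i {ts_ns}"


def format_balance_verified_line(
    chain: str, block_hex: str, hashed: float, ts_ns: int
) -> str:
    return f"balance_verified,chain={chain},block={block_hex} value={hashed} {ts_ns}"
```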
🎯 Complexity: 4 (Complex) | ⏱️ Estimated review effort: ~60 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/support/verify_state.py`:
- Around line 100-102: The current verify_state branch conflates missing anchor
providers and missing proof endpoint by returning STATUS_ANCHOR_UNAVAILABLE for
both; change the logic so if providers is empty (or falsy) you return
_format_verifier_status_line(chain, STATUS_ANCHOR_UNAVAILABLE, ts_ns), but if
providers exist and chainstack_url (from chainstack_endpoint_for()) is falsy you
instead return _format_verifier_status_line(chain, STATUS_PROOF_UNAVAILABLE,
ts_ns); update the conditional in verify_state to check providers first and then
chainstack_url so the correct STATUS_* constant is used.
- Around line 171-176: In the loop over results in verify_state.py (the block
that iterates "for r in results:" and currently only logs exceptions), append a
failure-status line into the "lines" list whenever a chain task raised instead
of dropping the sample; include identifying info (chain id/name or index), a
clear failure marker like "FAIL" or "ERROR", and the exception
message/timestamp, then return the joined lines as before; alternatively
implement the same mapping inside _verify_chain() so each task returns either
its normal list of lines or a single-line failure entry to preserve
verifier_status continuity.
- Around line 87-90: The code currently returns [] when chain_state is missing
or has no "old_block", which drops the chain from the round; instead, call the
existing verifier status emission routine to record a failure for this chain
before returning. In the verify_state block where you check chain_state =
state_data.get(chain_lower) and if not chain_state or not
chain_state.get("old_block"), invoke the project's verifier_status emitter (the
same function used elsewhere to report per-chain results) with the chain
identifier, round/epoch context (e.g., round_id or task id available in scope),
and a clear failure reason like "missing_old_block" or the underlying blob-read
error from _gather_state_data(), then return [] as before. Ensure the emitted
status includes enough context (chain, round, and reason) so dashboards can
distinguish "verifier failed" from "no sample."
In `@common/verify/anchor.py`:
- Around line 106-112: The code decodes block.get("stateRoot") into bytes but
doesn't reject values of the wrong byte length, allowing 31- or 33-byte values
to pass quorum and later fail as STATUS_PROOF_MATH_INVALID; update the logic
around state_root_hex and bytes.fromhex to validate size and reject non-32-byte
results by returning None. Specifically, in the block handling code that reads
state_root_hex (the block.get("stateRoot") path and the bytes.fromhex(...)
call), normalize the hex with removeprefix("0x"), ensure it's an even-length hex
string, decode it, and then check that len(state_root_bytes) == 32; if any check
fails (invalid hex or wrong length) return None so incorrect lengths are treated
as anchor-unavailable.
- Around line 157-167: The code currently filters non-string items out of
account_proof and decodes remaining items, which silently drops malformed
elements; update the logic in the block that handles account_proof (the variable
and the list comprehension using bytes.fromhex and .removeprefix) to first
validate that account_proof is a list of strings and that every element starts
with "0x" and contains valid hex characters, and if any element fails these
checks raise a RuntimeError describing the malformed RPC response instead of
silently skipping entries; perform the validation loop before attempting
bytes.fromhex so errors surface as RPC malformation (include the offending
element in the error message).
In `@common/verify/proof.py`:
- Around line 73-95: The _follow_reference helper currently rejects any
non-empty, non-32-byte bytes refs, but valid MPT proofs may inline child nodes
as raw RLP (<32 bytes); update _follow_reference to, when ref is bytes and
length != 32, treat it as an embedded RLP node and return rlp.decode(ref)
(keeping the existing behavior of raising ProofError when a 32-byte hash is
missing from proof_by_hash and still raising on completely unexpected types);
reference symbols: _follow_reference, proof_by_hash, rlp.decode, ProofError.
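The size check requested for anchor.py above (reject non-32-byte stateRoot values so they fail as anchor-unavailable) can be sketched as a small helper; the function name is hypothetical:

```python
def parse_state_root(state_root_hex: object) -> "bytes | None":
    """Return 32 raw stateRoot bytes, or None so callers treat the
    anchor as unavailable (hypothetical helper, not the PR's exact code)."""
    if not isinstance(state_root_hex, str):
        return None
    h = state_root_hex.removeprefix("0x")
    if len(h) % 2:                  # odd-length hex cannot decode
        return None
    try:
        raw = bytes.fromhex(h)
    except ValueError:              # non-hex characters in the RPC response
        return None
    return raw if len(raw) == 32 else None
```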
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6941e242-c09c-4d44-802a-f40b2e3a154b
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
api/support/verify_state.py, common/verify/__init__.py, common/verify/anchor.py, common/verify/proof.py, common/verify/providers.py, pyproject.toml, requirements.txt, vercel.fra1.json, vercel.json
def _follow_reference(ref: Any, proof_by_hash: dict[bytes, bytes]) -> Any:
    """Resolve a child reference to the next decoded node.

    A child reference is either:
    - A list (already-decoded RLP): an inlined node — used directly. The parent's
      hash already authenticated it, so no separate hash check is required.
    - 32 raw bytes: a hash reference — looked up in ``proof_by_hash``.

    Empty refs (``b""``) signal exclusion at the caller and must not be passed
    here.

    Raises:
        ProofError: If the reference is malformed or its hash isn't in the proof.
    """
    if isinstance(ref, list):
        return ref
    if isinstance(ref, bytes):
        if len(ref) == 32:
            if ref not in proof_by_hash:
                raise ProofError(f"hash {ref.hex()} not present in accountProof")
            return rlp.decode(proof_by_hash[ref])
        raise ProofError(f"reference length {len(ref)} is invalid (must be 0 or 32)")
    raise ProofError(f"unexpected reference type: {type(ref).__name__}")
Decode embedded child nodes instead of rejecting them.
MPT child references are not limited to b"" and 32-byte hashes. When the child node RLP is < 32 bytes, the parent stores that raw RLP inline. This helper currently raises on every non-empty, non-32-byte bytes ref, so valid proofs will fail as soon as traversal hits an embedded node.
Suggested fix
def _follow_reference(ref: Any, proof_by_hash: dict[bytes, bytes]) -> Any:
"""Resolve a child reference to the next decoded node.
@@
if isinstance(ref, list):
return ref
if isinstance(ref, bytes):
if len(ref) == 32:
if ref not in proof_by_hash:
raise ProofError(f"hash {ref.hex()} not present in accountProof")
return rlp.decode(proof_by_hash[ref])
- raise ProofError(f"reference length {len(ref)} is invalid (must be 0 or 32)")
+ if len(ref) == 0:
+ raise ProofError("empty child reference must be handled by the caller")
+ try:
+ return rlp.decode(ref)
+ except Exception as exc:
+ raise ProofError("embedded child reference is not valid RLP") from exc
    raise ProofError(f"unexpected reference type: {type(ref).__name__}")
🧰 Tools
🪛 Ruff (0.15.12)
[warning] 73-73: Dynamically typed expressions (typing.Any) are disallowed in ref
(ANN401)
[warning] 73-73: Dynamically typed expressions (typing.Any) are disallowed in _follow_reference
(ANN401)
[warning] 92-92: Avoid specifying long messages outside the exception class
(TRY003)
[warning] 94-94: Avoid specifying long messages outside the exception class
(TRY003)
[warning] 95-95: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@common/verify/proof.py` around lines 73 - 95, The _follow_reference helper
currently rejects any non-empty, non-32-byte bytes refs, but valid MPT proofs
may inline child nodes as raw RLP (<32 bytes); update _follow_reference to, when
ref is bytes and length != 32, treat it as an embedded RLP node and return
rlp.decode(ref) (keeping the existing behavior of raising ProofError when a
32-byte hash is missing from proof_by_hash and still raising on completely
unexpected types); reference symbols: _follow_reference, proof_by_hash,
rlp.decode, ProofError.
Not a bug: rlp.decode is recursive so inlined children come back as Python lists and are handled by the isinstance(ref, list) branch above; verified by encoding and decoding a parent with a 22-byte inlined child.
@smypmsa, you're absolutely right — thank you for the clear explanation. Since rlp.decode is recursive, any inlined child node (RLP-encoded bytes < 32 bytes embedded in the parent) is already expanded into a nested Python list at the point the parent is decoded. By the time _follow_reference is called, those children arrive as list objects and are handled by the isinstance(ref, list) branch. The bytes path in _follow_reference is only ever reached by a genuine 32-byte hash reference. The current code is correct.
I'll record this to avoid raising the same false positive in future reviews.
( ˘ ³˘)♡
✏️ Learnings added
Learnt from: smypmsa
Repo: chainstacklabs/compare-dashboard-functions PR: 76
File: common/verify/proof.py:73-95
Timestamp: 2026-05-03T13:50:08.443Z
Learning: In common/verify/proof.py, the `_follow_reference` function correctly handles only two cases for `bytes` refs: empty bytes (exclusion, handled by caller) and 32-byte hashes (looked up in proof_by_hash). Inlined MPT child nodes (raw RLP < 32 bytes) never appear as raw `bytes` in `_follow_reference` because `rlp.decode` is recursive — it already expands embedded children into nested Python `list` objects when the parent node is decoded. So the `isinstance(ref, list)` branch above handles them. Do NOT suggest adding `rlp.decode(ref)` for the < 32 byte case; it is not needed and was empirically verified with a 22-byte inlined child.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: smypmsa
Repo: chainstacklabs/compare-dashboard-functions PR: 59
File: dashboards/grafana_sync.py:27-27
Timestamp: 2026-04-15T08:56:43.269Z
Learning: When reviewing Python changes in this repo, avoid asking for broad docstring expansions (e.g., adding multi-section "Args/Returns/Raises" blocks) in files that are unrelated to the PR’s feature scope. Keep one-line docstrings that already satisfy pydocstyle with the "google" convention unchanged. If wider docstring-section coverage is desired as a baseline, defer it to a dedicated docstring-sweep PR rather than expanding docstrings opportunistically during feature PR review.
Five accepted fixes; one comment dismissed as a false alarm (verified empirically).

verify_state.py:
- Missing OLD_BLOCK now emits verifier_status=3 (anchor unavailable) instead of returning [], so a blob/state failure is visible in dashboards rather than indistinguishable from a missing cron run.
- Missing endpoints split: no quorum providers -> verifier_status=3; no Chainstack proof endpoint -> verifier_status=4. Previously both mapped to status=3.
- Per-chain task that raises an unexpected exception in asyncio.gather now emits verifier_status=2 for that chain (instead of being silently logged and dropped). Prevents silent gaps in the verifier_status series.

anchor.py:
- _fetch_state_root now rejects non-32-byte stateRoot values; a malformed but unanimous response fails as anchor-unavailable instead of surfacing later as proof-math-invalid.
- fetch_account_proof now fails fast on accountProof elements that are not strings; previously the comprehension silently dropped them, which could have been misreported downstream as proof corruption.

Dismissed: the suggestion to "decode embedded child nodes" in proof.py's _follow_reference was a false alarm. The Ethereum rlp library decodes parents recursively, so inlined children appear as Python lists (not as sub-32-byte bytes); the walker handles them via isinstance(ref, list). Verified empirically by encoding+decoding a parent with a 22-byte inlined child — decode returns a list, not bytes. Bytes refs of length 1-31 would be malformed per the MPT spec and correctly raise.

Smoke test re-run post-fix: walker still returns the correct balance end-to-end on Ethereum public RPC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/support/verify_state.py`:
- Around line 283-287: The handler currently writes raw exception text back to
clients (wfile.write(str(e).encode())) which may leak internals; instead log the
exception server-side (e.g., use self.log_error or the module logger with the
exception info using the exception variable e) and send a generic 500 body like
"Internal server error" (keep using self.send_response, self.send_header,
self.end_headers, and self.wfile.write but replace the payload), removing any
direct exposure of str(e).
- Around line 248-253: The _check_auth method currently treats an empty
CRON_SECRET as a match (allowing "Authorization: Bearer "), so change logic to
fail closed: after honoring SKIP_AUTH, read CRON_SECRET into a variable (e.g.,
secret = os.getenv("CRON_SECRET", "")) and if that secret is empty return False,
then compare token == f"Bearer {secret}". Ensure you reference _check_auth,
self.headers/Authorization, SKIP_AUTH and CRON_SECRET in the update so
unauthenticated requests are rejected when CRON_SECRET is unset.
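The fail-closed check described above can be sketched as a standalone function (hypothetical version of the handler's _check_auth; using hmac.compare_digest instead of the review's plain == additionally avoids timing leaks, a deliberate swap, not the PR's exact code):

```python
import hmac
import os


def check_auth(authorization_header: str) -> bool:
    """Fail closed: an unset or empty CRON_SECRET rejects every request."""
    secret = os.getenv("CRON_SECRET", "")
    if not secret:
        return False  # never let "Authorization: Bearer " match an empty secret
    return hmac.compare_digest(authorization_header, f"Bearer {secret}")
```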
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5da777c1-2a56-40e8-a347-0036586bf467
📒 Files selected for processing (2)
api/support/verify_state.py, common/verify/anchor.py
verify_state.py:
- _check_auth fails closed when CRON_SECRET is unset; previously an empty Authorization header (Bearer with empty value) would have authenticated if the env var was missing.
- 500 responses no longer leak raw exception text; log via logging.exception and return a generic body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/support/verify_state.py`:
- Around line 135-137: In verify_state.py, inside the except ProofError as e
block, replace the logging.error call with logging.exception so the ProofError
traceback is preserved (addresses Ruff TRY400); keep the same message text
("verify_state: proof math invalid for {chain}: {e}") and the subsequent return
of _format_verifier_status_line(chain, STATUS_PROOF_MATH_INVALID, ts_ns)
unchanged.
- Around line 195-198: The fallback that appends a verifier status line
currently uses STATUS_PROOF_MATH_INVALID but should use STATUS_PROOF_UNAVAILABLE
for unexpected exceptions; update the call to _format_verifier_status_line (the
lines.append(...) invocation) to replace STATUS_PROOF_MATH_INVALID with
STATUS_PROOF_UNAVAILABLE so that exceptions from _verify_chain are reported as
"unavailable" rather than cryptographic proof failures.
- Around line 231-236: The _push_to_grafana function currently treats
401/403/404/429 as retriable and emits warnings; update the response handling so
that when response.status is in (401, 403, 404, 429) the function silently
ignores the error (return immediately) without consuming retries or calling
logging.warning, while preserving existing behavior for other statuses (e.g.,
keeping the existing success check for 200/204 and the warning+retry for other
non-success codes); locate this logic around the response/status check in
_push_to_grafana (references: response, attempt, logging.warning) and add an
early return for those four status codes.
- Around line 139-145: The code maps an exclusion proof (when
verify_account_proof returns balance is None) to STATUS_PROOF_MATH_INVALID,
which mislabels a valid exclusion as a malformed proof; in the balance is None
branch inside verify_state.py replace STATUS_PROOF_MATH_INVALID with a more
accurate status — either use the existing STATUS_PROOF_UNAVAILABLE constant or
introduce a new STATUS_EXCLUSION_UNEXPECTED constant (e.g., value 5) and use
that; if you add a new constant, declare it alongside the other STATUS_*
constants and update any consumers or documentation that enumerate statuses;
keep the same return call to _format_verifier_status_line(chain, NEW_STATUS,
ts_ns).
- Around line 96-239: Replace all f-string usage inside logging calls in this
module (e.g. inside _verify_chain, _gather_state_data, _verify_all,
_push_to_grafana and any verify_state call sites) with logging's lazy formatting
or param-style logging (e.g. logging.warning("verify_state: no old_block for
%s", chain) or logging.warning("verify_state: anchor partial for %s: %s", chain,
e)), and ensure exceptions use logging.exception(...) or pass exc_info where
appropriate; then run ruff check --fix to apply the automated G004 fixes and
verify no f-strings remain in logging calls.
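The 401/403/404/429 handling requested above reduces to a status classifier for the retry loop; a sketch (hypothetical helper, mirroring the project's MetricsServiceConfig.IGNORED_HTTP_ERRORS convention):

```python
IGNORED_HTTP_ERRORS = (401, 403, 404, 429)  # plan restrictions / rate limits


def classify_push_status(status: int) -> str:
    """Decide how the Grafana push loop treats a response status."""
    if status in (200, 204):
        return "success"
    if status in IGNORED_HTTP_ERRORS:
        return "ignore"  # drop silently: no warning, no retry consumed
    return "retry"       # other non-success codes warn and retry as before
```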
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: edb4a810-9135-44b6-8d5e-8475aa47a473
📒 Files selected for processing (1)
api/support/verify_state.py
        logging.warning(f"verify_state: no old_block for {chain}")
        return [_format_verifier_status_line(chain, STATUS_ANCHOR_UNAVAILABLE, ts_ns)]

    block_hex = chain_state["old_block"]
    addr_hex = PROBE_ADDRESSES[chain]
    addr_bytes = bytes.fromhex(addr_hex.removeprefix("0x"))

    providers = all_providers_for(chain)
    chainstack_url = chainstack_endpoint_for(chain)

    if not providers:
        logging.warning(f"verify_state: no anchor providers configured for {chain}")
        return [_format_verifier_status_line(chain, STATUS_ANCHOR_UNAVAILABLE, ts_ns)]
    if not chainstack_url:
        logging.warning(f"verify_state: no Chainstack endpoint configured for {chain}")
        return [_format_verifier_status_line(chain, STATUS_PROOF_UNAVAILABLE, ts_ns)]

    # 1. Anchor — multi-provider stateRoot quorum.
    try:
        anchor = await fetch_agreed_anchor(session, block_hex, providers)
    except AnchorDisagreementError:
        logging.warning(f"verify_state: stateRoot disagreement for {chain}")
        return [
            _format_verifier_status_line(chain, STATUS_STATEROOT_DISAGREEMENT, ts_ns)
        ]
    except AnchorPartialResponseError as e:
        logging.warning(f"verify_state: anchor partial for {chain}: {e}")
        return [_format_verifier_status_line(chain, STATUS_ANCHOR_UNAVAILABLE, ts_ns)]

    # 2. Proof from Chainstack.
    try:
        proof = await fetch_account_proof(session, chainstack_url, addr_hex, block_hex)
    except Exception as e:
        logging.warning(f"verify_state: proof fetch failed for {chain}: {e}")
        return [_format_verifier_status_line(chain, STATUS_PROOF_UNAVAILABLE, ts_ns)]

    # 3. Local MPT verification.
    try:
        balance = verify_account_proof(addr_bytes, proof, anchor)
    except ProofError as e:
        logging.error(f"verify_state: proof math invalid for {chain}: {e}")
        return [_format_verifier_status_line(chain, STATUS_PROOF_MATH_INVALID, ts_ns)]

    if balance is None:
        # Probe address has no state at OLD_BLOCK — unexpected for our funded probes.
        logging.error(
            f"verify_state: exclusion proof for {chain} "
            f"address={addr_hex} block={block_hex}"
        )
        return [_format_verifier_status_line(chain, STATUS_PROOF_MATH_INVALID, ts_ns)]

    # 4. Emit balance_verified + verifier_status=0.
    return [
        _format_balance_verified_line(
            chain, block_hex, hash_balance_to_float(balance), ts_ns
        ),
        _format_verifier_status_line(chain, STATUS_OK, ts_ns),
    ]


async def _gather_state_data() -> dict[str, dict[str, Any]]:
    """Read OLD_BLOCK + other state for each in-scope chain from blob storage."""
    out: dict[str, dict[str, Any]] = {}
    for chain in PROBE_ADDRESSES:
        try:
            chain_state = await BlockchainState.get_data(chain.lower())
            if chain_state:
                out[chain.lower()] = chain_state
        except Exception:
            logging.exception(f"verify_state: failed to read state for {chain}")
    return out


async def _verify_all() -> str:
    """Run the verifier across all in-scope chains. Returns Influx text.

    If a per-chain task raises an unexpected exception (i.e. something not
    handled inside ``_verify_chain``), emit ``verifier_status=2`` for that
    chain so the failure is visible in dashboards rather than silently dropped.
    """
    state_data = await _gather_state_data()
    chains = list(PROBE_ADDRESSES)

    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *[_verify_chain(session, chain, state_data) for chain in chains],
            return_exceptions=True,
        )

    lines: list[str] = []
    fallback_ts_ns = time.time_ns()
    for chain, r in zip(chains, results):
        if isinstance(r, list):
            lines.extend(r)
        else:
            logging.error(
                f"verify_state: chain task raised for {chain}",
                exc_info=r if isinstance(r, BaseException) else None,
            )
            lines.append(
                _format_verifier_status_line(
                    chain, STATUS_PROOF_MATH_INVALID, fallback_ts_ns
                )
            )
    return "\n".join(lines)


async def _push_to_grafana(metrics_text: str) -> None:
    """Push metrics in Influx line protocol to Grafana Cloud.

    Mirrors ``MetricsHandler.push_to_grafana`` so this verifier function is
    self-contained — refactoring the push helper into a shared util is a separate
    concern.
    """
    if not metrics_text:
        return

    url = os.environ.get("GRAFANA_URL")
    user = os.environ.get("GRAFANA_USER")
    api_key = os.environ.get("GRAFANA_API_KEY")
    if not url or not user or not api_key:
        logging.warning("verify_state: Grafana env not set, skipping push")
        return

    timeout = aiohttp.ClientTimeout(total=MetricsServiceConfig.GRAFANA_PUSH_TIMEOUT)
    for attempt in range(1, MetricsServiceConfig.GRAFANA_PUSH_MAX_RETRIES + 1):
        try:
            async with aiohttp.ClientSession() as session:
                async with session.post(
                    url,
                    headers={"Content-Type": "text/plain"},
                    data=metrics_text,
                    auth=aiohttp.BasicAuth(user, api_key),
                    timeout=timeout,
                ) as response:
                    if response.status in (200, 204):
                        return
                    logging.warning(
                        f"verify_state: Grafana push got {response.status} "
                        f"on attempt {attempt}"
                    )
        except Exception:
            logging.exception(
                f"verify_state: Grafana push exception on attempt {attempt}"
            )
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Pervasive f-string logging violates Ruff G004 — run ruff check --fix.
Lines 96, 107, 110, 117, 122, 129, 142–143, 165, 192, 234–235, and 239 all use f-strings inside logging calls. Ruff G004 flags these because f-strings are eagerly evaluated even when the log level is suppressed, wasting CPU. ruff check --fix will rewrite them automatically. Representative example:
- logging.warning(f"verify_state: anchor partial for {chain}: {e}")
+ logging.warning("verify_state: anchor partial for %s: %s", chain, e)

As per coding guidelines, ruff check and ruff check --fix must be run before completing any task.
🧰 Tools
🪛 Ruff (0.15.12)
[warning] 96-96: Logging statement uses f-string
(G004)
[warning] 107-107: Logging statement uses f-string
(G004)
[warning] 110-110: Logging statement uses f-string
(G004)
[warning] 117-117: Logging statement uses f-string
(G004)
[warning] 122-122: Logging statement uses f-string
(G004)
[warning] 128-128: Do not catch blind exception: Exception
(BLE001)
[warning] 129-129: Logging statement uses f-string
(G004)
[warning] 136-136: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
[warning] 136-136: Logging statement uses f-string
(G004)
[warning] 142-143: Logging statement uses f-string
(G004)
[warning] 164-165: try-except within a loop incurs performance overhead
(PERF203)
[warning] 165-165: Logging statement uses f-string
(G004)
[warning] 192-192: Logging statement uses f-string
(G004)
[warning] 223-230: Use a single with statement with multiple contexts instead of nested with statements
Combine with statements
(SIM117)
[warning] 234-235: Logging statement uses f-string
(G004)
[warning] 239-239: Logging statement uses f-string
(G004)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/support/verify_state.py` around lines 96 - 239, Replace all f-string
usage inside logging calls in this module (e.g. inside _verify_chain,
_gather_state_data, _verify_all, _push_to_grafana and any verify_state call
sites) with logging's lazy formatting or param-style logging (e.g.
logging.warning("verify_state: no old_block for %s", chain) or
logging.warning("verify_state: anchor partial for %s: %s", chain, e)), and
ensure exceptions use logging.exception(...) or pass exc_info where appropriate;
then run ruff check --fix to apply the automated G004 fixes and verify no
f-strings remain in logging calls.
Skipping: G is not in the project's ruff selection and the codebase consistently uses f-string logging across update_state.py, metrics_handler.py, and base_metric.py, so following local convention reads cleaner than introducing %s formatting in this one new file.
@smypmsa, understood — that's a valid reason. Keeping f-string logging consistent with the rest of the codebase is the right call, and if G isn't selected in the project's Ruff config there's nothing to enforce here. I'll note this for future reviews.
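The trade-off under discussion fits in a few lines. A minimal sketch (the message text is illustrative, not a call site from this PR):

```python
import logging

chain = "base"

# Project convention kept by this PR: f-string logging. The message is
# formatted eagerly, even when the log level would discard the record.
eager = f"verify_state: no old_block for {chain}"

# What Ruff G004 prefers: lazy %-style formatting. Interpolation is
# deferred until the logging framework actually emits the record.
lazy_template, lazy_args = "verify_state: no old_block for %s", (chain,)

# Both styles render the same final message; they differ only in when
# the formatting work happens.
assert eager == lazy_template % lazy_args
logging.warning(lazy_template, *lazy_args)
```

Since the rendered output is identical, the rule is about formatting cost and consistency, not behavior, which is why deferring to local convention is defensible here.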
✏️ Learnings added
Learnt from: smypmsa
Repo: chainstacklabs/compare-dashboard-functions PR: 76
File: api/support/verify_state.py:96-250
Timestamp: 2026-05-03T14:06:51.328Z
Learning: In the chainstacklabs/compare-dashboard-functions repo, the Ruff `G` rule group (G004 f-string logging) is NOT enabled in the project's ruff selection. The codebase consistently uses f-string logging across update_state.py, metrics_handler.py, and base_metric.py. Do not flag f-string usage inside logging calls (G004) in this repo.
🧠 Learnings used
Learnt from: smypmsa
Repo: chainstacklabs/compare-dashboard-functions PR: 59
File: dashboards/grafana_sync.py:27-27
Timestamp: 2026-04-15T08:56:43.269Z
Learning: When reviewing Python changes in this repo, avoid asking for broad docstring expansions (e.g., adding multi-section "Args/Returns/Raises" blocks) in files that are unrelated to the PR’s feature scope. Keep one-line docstrings that already satisfy pydocstyle with the "google" convention unchanged. If wider docstring-section coverage is desired as a baseline, defer it to a dedicated docstring-sweep PR rather than expanding docstrings opportunistically during feature PR review.
- Use `logging.exception` (not `logging.error`) for `ProofError` so the traceback is preserved.
- Exclusion-proof case (balance is `None`) now maps to `STATUS_PROOF_UNAVAILABLE` instead of `STATUS_PROOF_MATH_INVALID`. The proof is cryptographically valid; it just yields no balance to verify against (signals a wrong probe address, stale `OLD_BLOCK`, or a chain reorg deeper than our offset).
- Unexpected per-chain exceptions in `_verify_all` also map to `STATUS_PROOF_UNAVAILABLE` rather than `STATUS_PROOF_MATH_INVALID`. Network blips and bugs are infra failures, not cryptographic invalidity.
- `_push_to_grafana` silently ignores 401/403/404/429 per project convention (CLAUDE.md: "HTTP 401/403/404/429 errors are silently ignored — plan restrictions / rate limits"). Uses the existing `MetricsServiceConfig.IGNORED_HTTP_ERRORS` list.

Not changed: f-string logging style flagged by Ruff G004. The `G` rule set is not enabled in the project's ruff config; existing modules (`update_state.py`, `metrics_handler.py`, `base_metric.py`) all use f-string logging. Following codebase convention beats following an unselected rule for this one new file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
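The status mapping above can be sketched as a small classifier. The `STATUS_*` names come from the PR; their values and the `classify` helper here are illustrative, not the actual code in `api/support/verify_state.py`:

```python
import logging

class ProofError(Exception):
    """Raised by the MPT walker on a hash mismatch (cryptographic invalidity)."""

# Illustrative values; the real constants live in api/support/verify_state.py.
STATUS_OK = "ok"
STATUS_PROOF_MATH_INVALID = "math_invalid"
STATUS_PROOF_UNAVAILABLE = "proof_unavailable"

def classify(walk):
    """Map one chain's proof-walk outcome to a status, per the revised rules."""
    try:
        balance = walk()
    except ProofError:
        # Cryptographically invalid proof; logging.exception keeps the traceback.
        logging.exception("verify_state: proof rejected")
        return STATUS_PROOF_MATH_INVALID
    except Exception:
        # Network blips and bugs are infra failures, not invalid cryptography.
        logging.exception("verify_state: unexpected failure")
        return STATUS_PROOF_UNAVAILABLE
    if balance is None:
        # Valid exclusion proof: nothing to verify (wrong probe address,
        # stale OLD_BLOCK, or a deep reorg).
        return STATUS_PROOF_UNAVAILABLE
    return STATUS_OK
```

The point of the split: only a failed hash check ever produces `STATUS_PROOF_MATH_INVALID`; every other failure mode is treated as "no answer this round".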
Summary
New `verify_state` Vercel cron (fra1 only, every 15 min, offset 5 min from `update_state`) that produces a second, independent answer to `eth_getBalance` for each EVM chain.

Per round, per chain (Ethereum, Base, Arbitrum, BNB):
1. Read `OLD_BLOCK` from shared blob storage (set by `update_state`).
2. Multi-provider stateRoot quorum at `OLD_BLOCK`: every fra1 RPC provider must respond and return the same stateRoot, otherwise emit `verifier_status=1` (disagreement) or `=3` (partial response).
3. Fetch `eth_getProof` from Chainstack for the chain's probe address.
4. Walk the MPT proof locally against the agreed stateRoot (`rlp` + `pycryptodome` only). Path-mismatch / empty-branch returns `None` (canonical exclusion); hash-mismatch raises `ProofError`.
5. On full success: emit `balance_verified` (hashed via v1's `common/balance_hash.py`) + `verifier_status=0`.

Trust narrative
Empirical validation done while writing the spec
- 800 proofs walked across the four chains; balances match `eth_getBalance` byte-for-byte on every probe. 0 unexpected failures.
- Tamper tests (bit flip, wrong root, truncation) all rejected (… `verifier_status=2`).
- … matches `eth_getBalance` (1722166635215000 wei).
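Why the tamper tests must fail: every proof node is referenced by its hash, so any mutation breaks the chain back to the agreed stateRoot. A toy illustration of that property, using `sha256` as a stand-in for the walker's keccak-256 and arbitrary bytes in place of a real RLP node:

```python
from hashlib import sha256

# Arbitrary bytes standing in for one RLP-encoded proof node.
node = bytes.fromhex("f851808601910fd5123c")
expected = sha256(node).digest()  # the "pointer" a parent node would store

tampered = bytes([node[0] ^ 0x01]) + node[1:]  # single bit flip
assert sha256(node).digest() == expected        # untouched node verifies
assert sha256(tampered).digest() != expected    # bit flip rejected
assert sha256(node[:-1]).digest() != expected   # truncation rejected
```

A wrong root fails the same way: the first node's hash simply never matches the anchor, so the walk rejects before decoding anything.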
Excluded from v2

- Hyperliquid: no historical block parameter on `eth_getBalance` (only `latest`).
- Monad: RPC implementation does not expose `eth_getProof` (`-32601 Method not found`, confirmed against `Monad/0.14.2` testnet).
- Solana / TON: different state model, separate spec.

Their dashboards keep using v1's modal-based comparison.
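The Monad confirmation above boils down to checking the JSON-RPC error object for code `-32601`. A minimal sketch; the response body here is hand-written for illustration, not a captured Monad reply:

```python
import json

def method_unsupported(response_body: str) -> bool:
    """True when a JSON-RPC reply reports -32601 (method does not exist)."""
    reply = json.loads(response_body)
    error = reply.get("error") or {}
    return error.get("code") == -32601

# Shape of the reply that confirms the exclusion:
missing = '{"jsonrpc":"2.0","id":1,"error":{"code":-32601,"message":"Method not found"}}'
# Shape of a normal success reply, for contrast:
ok = '{"jsonrpc":"2.0","id":1,"result":"0x0"}'
```

Distinguishing `-32601` from transport errors matters: a timeout means "try again", while `-32601` means the chain can never be covered by this verifier.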
Files
- `api/support/verify_state.py`: HTTP handler + orchestration
- `common/verify/proof.py`
- `common/verify/anchor.py`
- `common/verify/providers.py` (… Chainstack-only filter)
- `common/verify/__init__.py`
- `vercel.json` + `vercel.fra1.json`
- `pyproject.toml` + `requirements.txt`: `rlp==4.1.0` + `pycryptodome==3.21.0`

Total: ~720 LoC of new Python + small config edits.
NOT in this PR (intentional split)
Dashboard panel updates. The Data agreement panels we shipped in PR #75 keep using their existing modal-based query; the new `balance_verified` series flows in alongside, queryable but not yet displayed by panels. The panel swap to `balance_verified` lives in a separate follow-up PR, after this verifier soaks for ~1 week and we confirm `verifier_status` is healthy.

Rationale: keep concerns separate. The verifier PR is Vercel-function code + cron + a new metric series. The dashboard PR is JSON in `dashboards/` synced via `grafana_sync.py`. They're independent: the verifier emits regardless of whether panels query it, and existing modal panels keep working until the swap lands.

Test plan
- `uvx black .` clean
- `uvx ruff check .` clean
- `uvx mypy .` 238 errors (= master baseline, no new errors from this PR)
- `eth_getBalance` / `eth_getProof` on each Chainstack production endpoint (Ethereum / Base / Arbitrum / BNB). Single curl per chain. Most likely to surface: `base-rpc.publicnode.com`-style "tight proof window"; production Chainstack archive should not have this.
- `vercel inspect <preview-deployment>`. Local measurement says ~27 MB delta; expect under the 500 MB Python limit.
- `dev_balance_verified` series appears for all 4 chains within 30 min (~2 cron rounds). Confirm `dev_verifier_status=0` in steady state.

Cron + region gating
- `5-59/15 * * * *` → minutes 5, 20, 35, 50. `update_state` stays on `*/15` → minutes 0, 15, 30, 45.
- … `OLD_BLOCK` from blob.
- `vercel.fra1.json` (sin1 / sfo1 / hnd1 don't get it).
- `VERCEL_REGION == "fra1"` short-circuits if a non-fra1 deploy somehow registered the cron.
- `Bearer ${CRON_SECRET}` matching `update_state.py`.

What v3 will be (separate spec, separate PR)
v3 swaps the anchor source from "multi-RPC-provider stateRoot quorum" to "multi-beacon-API stateRoot quorum", closing the residual blind spot of "all RPC providers share an upstream wrong stateRoot". Walker, schema, panels, cron, and region gate all carry over verbatim. ~150 LoC of `anchor.py` rewrite + `beacon_providers.py`. Spec file: `spec-verified-correctness-v3.md`.

🤖 Generated with Claude Code
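The quorum rule itself is anchor-agnostic, which is what lets it carry from v2's RPC providers to v3's beacon APIs unchanged. A minimal sketch; the provider mapping and `fetch_state_root` callable are illustrative, not the PR's actual API in `common/verify/anchor.py`:

```python
# Status values follow the PR's verifier_status convention:
# 0 = success, 1 = disagreement, 3 = partial response.
VERIFIER_OK, VERIFIER_DISAGREEMENT, VERIFIER_PARTIAL = 0, 1, 3

def quorum_state_root(providers: dict, old_block: int, fetch_state_root):
    """Every provider must respond AND return the same stateRoot at old_block."""
    roots = {}
    for name, endpoint in providers.items():
        try:
            roots[name] = fetch_state_root(endpoint, old_block)
        except Exception:
            roots[name] = None  # unreachable provider counts as partial
    if any(root is None for root in roots.values()):
        return None, VERIFIER_PARTIAL       # someone did not answer
    if len(set(roots.values())) != 1:
        return None, VERIFIER_DISAGREEMENT  # answers differ
    return next(iter(roots.values())), VERIFIER_OK
```

Requiring unanimity (rather than majority) is the conservative choice: a single dissenting or silent provider withholds the anchor for that round instead of risking a forged one.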