Problem
Production-grade testing of seid at real-world state size — pacific-1's IAVL depth, contract cardinality, account distribution, in-flight gov state — is structurally hard. Existing options fall short:
- State-bloat seeder on harbor (synthesized state) leaves a "your shape isn't really pacific-1" gap that defeats the point.
- seid export + state surgery on x/staking + x/distribution + x/slashing + bank balances was scoped at 1–2 weeks with a high probability of "boots and halts at block 1–10."
The team needs a build of seid that boots from any chain export, runs all prod code paths with full validation work performed, and converts halting mechanisms (consensus failures, cryptographic state mismatches, module-genesis invariant panics) into log+counter events. Operators see what diverged (actual hash values, actual invariant mismatch magnitudes), not just that something did. The chain "continues happily" so unique testing scenarios can run against real prod state with diagnostic signal preserved.
Impact
Eliminates the "your synthesized shape isn't really pacific-1" objection that has been blocking real-state load testing. Unlocks reproducible chaos scenarios against actual mainnet state for performance evaluation, behavior probing, and pre-release validation — at the cost of one well-isolated build target.
Relevant experts
- sei-tendermint owners: Layer 1 refactor of the ConsensusPolicy interface and validation paths (M1)
- sei-cosmos x/staking + x/distribution module owners: Layer 2 panic-site signoff (M3)
- sei-chain x/evm + x/oracle module owners: Layer 3 module audit (M4)
- Build system / CI owners: Makefile + Dockerfile + image-tag work (M5)
- Platform team: first lab smoke test orchestration (M6)
Proposed approach
Six milestones, deliverable as separate reviewable PRs. Pattern: per-package //go:build mock_chain_validation variant files in every layer (matches the existing mock_block_validation precedent in sei-tendermint; compile-time gate, no API signature changes).
M1 — sei-tendermint policy injection refactor (no behavior change).
- M1.0: Audit halting checks in sei-tendermint validation paths. Output: a markdown enumeration of every site where validation can return an error that halts the chain (file:line, error type, what it gates).
- M1.0: Settle the Swallow*Failure() bool interface shape. New methods on ConsensusPolicy returning false by default.
- M1.1: Refactor each audited site from "if policy says skip, return early" to "compute the result; if failure, log detail + counter; conditionally return error based on policy.Swallow*Failure()."
- M1.1: Update consensus_policy_default.go and consensus_policy_mock_block_validation.go to return false for the new Swallow* methods (preserves existing tag semantics).
- M1.1: Tests covering production behavior (unchanged) and a mock policy variant flipping Swallow*Failure() == true.
Deliverable: one PR against sei-chain landing the refactor without changing any runtime behavior.
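The M1.1 refactor pattern can be sketched roughly as follows. This is a minimal illustration, not the sei-tendermint code: the method name SwallowAppHashFailure, the validateAppHash site, and the in-memory failures map are all hypothetical stand-ins for whatever the M1.0 audit and interface decision produce. The point it shows is that the comparison always runs and the detail is always logged and counted; only the final return depends on the policy.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// ConsensusPolicy is a hypothetical stand-in for the interface named in the
// proposal. The method name is illustrative; final naming is an open question.
type ConsensusPolicy interface {
	SwallowAppHashFailure() bool
}

// defaultPolicy mirrors production behavior: failures are never swallowed.
type defaultPolicy struct{}

func (defaultPolicy) SwallowAppHashFailure() bool { return false }

// swallowPolicy mirrors what a mock_chain_validation variant would return.
type swallowPolicy struct{}

func (swallowPolicy) SwallowAppHashFailure() bool { return true }

// failures counts validation failures per "site/kind" key. A real build
// would use a metrics counter with labels instead of a map (assumption).
var failures = map[string]int{}

// ErrAppHashMismatch is a hypothetical error for this sketch.
var ErrAppHashMismatch = errors.New("app hash mismatch")

// validateAppHash always performs the comparison. On mismatch it logs both
// values and bumps the counter, then returns the error only if the policy
// does not swallow it.
func validateAppHash(p ConsensusPolicy, expected, actual string) error {
	if expected == actual {
		return nil
	}
	log.Printf("validation failure site=validateAppHash kind=app_hash expected=%s actual=%s", expected, actual)
	failures["validateAppHash/app_hash"]++
	if p.SwallowAppHashFailure() {
		return nil
	}
	return fmt.Errorf("%w: expected %s, got %s", ErrAppHashMismatch, expected, actual)
}

func main() {
	// Production policy: the mismatch is returned as an error.
	fmt.Println("default returns error:", validateAppHash(defaultPolicy{}, "AA", "BB") != nil)
	// Swallowing policy: the mismatch is logged and counted, then ignored.
	fmt.Println("swallow returns nil:", validateAppHash(swallowPolicy{}, "AA", "BB") == nil)
	fmt.Println("failures counted:", failures["validateAppHash/app_hash"])
}
```

Because both policies share one code path, the production build keeps its current return values while the diagnostic output stays identical across builds.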
M2 — sei-tendermint mock_chain_validation variant.
- Add sei-tendermint/types/consensus_policy_mock_chain_validation.go with //go:build mock_chain_validation returning true for every Swallow*Failure() method.
- Verify go build -tags mock_chain_validation produces a working binary.
Deliverable: one small additive PR.
M3 — sei-cosmos module-genesis panic guards.
- //go:build mock_chain_validation variants for sei-cosmos/x/staking/genesis.go converting lines 113, 126, 139 from panic(...) to log+counter+continue. Log payload includes both sides of the failed comparison.
- Same for sei-cosmos/x/distribution/keeper/genesis.go (8 sites, lines 28-95).
- Optional: sei-cosmos/x/slashing/genesis.go, if the audit reveals panic sites.
Deliverable: one PR against sei-chain (sei-cosmos subtree).
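A rough sketch of what a converted panic site might look like is given here. It is an assumption-laden illustration: checkInvariant, the site string, and the int64 amounts are hypothetical (the real genesis code compares richer types), and the swallow variable stands in for the compile-time choice that the proposal makes through //go:build variant files rather than a runtime flag.

```go
package main

import (
	"fmt"
	"log"
)

// invariantFailures counts converted panic sites, keyed by site name.
// A real build would use a labeled metrics counter (assumption).
var invariantFailures = map[string]int{}

// swallow stands in for the compile-time gate. In the proposal this choice
// is made by which build-tagged variant file is compiled, not at runtime.
var swallow = true

// checkInvariant compares two amounts. On mismatch it either panics
// (default behavior) or logs both sides plus the delta and continues.
func checkInvariant(site string, expected, actual int64) {
	if expected == actual {
		return
	}
	msg := fmt.Sprintf("genesis invariant mismatch site=%s expected=%d actual=%d delta=%d",
		site, expected, actual, actual-expected)
	if !swallow {
		panic(msg)
	}
	log.Print(msg)
	invariantFailures[site]++
}

func main() {
	// With swallow enabled the mismatch is logged and counted, not fatal.
	checkInvariant("staking/bonded_pool", 100, 90)
	fmt.Println("failures:", invariantFailures["staking/bonded_pool"])
}
```

Building the message before the branch keeps the panic text and the log line identical, so the divergence detail is the same in both builds.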
M4 — sei-chain module audit + variants.
- 30-min audit of x/evm/genesis.go, x/oracle/genesis.go, and other sei-specific modules (tokenfactory, epoch, dex if present) for cross-state invariant panics.
- Apply the same variant pattern where needed.
Deliverable: one PR against sei-chain.
M5 — Build system + image.
- Makefile target: make build-unsafe injecting GO_BUILD_TAGS=mock_chain_validation and producing an unsafe-vX.Y.Z image.
- Dockerfile / GitHub Actions matrix entry pushing the image to ECR/GHCR with the unsafe- prefix.
- CI sanity check: confirm the build tag is in the version.BuildTags ldflag.
Deliverable: one PR against sei-chain.
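The CI sanity check could be as small as an exact-match test over the build-tags string. The sketch below assumes the string is a comma- or whitespace-separated list; hasBuildTag and the sample value are hypothetical, and how CI obtains the string from the binary is left open.

```go
package main

import (
	"fmt"
	"strings"
)

// hasBuildTag reports whether tag appears as a whole entry in buildTags.
// Entries may be separated by commas or whitespace (assumed format).
func hasBuildTag(buildTags, tag string) bool {
	fields := strings.FieldsFunc(buildTags, func(r rune) bool {
		return r == ',' || r == ' ' || r == '\t' || r == '\n'
	})
	for _, f := range fields {
		if f == tag {
			return true
		}
	}
	return false
}

func main() {
	// Sample value only; a real check would read the string the binary reports.
	tags := "netgo,ledger,mock_chain_validation"
	fmt.Println(hasBuildTag(tags, "mock_chain_validation"))
}
```

Matching whole entries rather than substrings avoids a false pass on a tag that merely shares the prefix.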
M6 — First lab smoke test.
- Use a recent pacific-1 export.
- Boot the binary with new validators.
- Verify the chain survives block 1 (success criterion: sei_unsafe_validation_skipped_total is non-zero, chain produces blocks).
- Verify the structured log lines fire with divergence detail.
- Capture results in a follow-up document.
Deliverable: a manifest update in platform-shadow + a brief writeup.
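The success criterion can be checked mechanically. The sketch below assumes the counter is exposed in Prometheus text exposition format and that block heights are sampled at two points in time; sumCounter, smokeTestPassed, and the label values in the sample are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumCounter adds up every sample of the named metric across all label sets.
// It skips comments, other metrics, and lines it cannot parse.
func sumCounter(exposition, name string) float64 {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || !strings.HasPrefix(line, name) {
			continue
		}
		rest := line[len(name):]
		if rest == "" || (rest[0] != '{' && rest[0] != ' ') {
			continue // a different metric that merely shares the prefix
		}
		if rest[0] == '{' {
			end := strings.LastIndex(rest, "}")
			if end < 0 {
				continue
			}
			rest = rest[end+1:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[0], 64); err == nil {
			total += v
		}
	}
	return total
}

// smokeTestPassed encodes the M6 criterion: the counter is non-zero and the
// chain height advanced between two observations.
func smokeTestPassed(skipped float64, startHeight, endHeight int64) bool {
	return skipped > 0 && endHeight > startHeight
}

func main() {
	sample := "sei_unsafe_validation_skipped_total{site=\"genesis\",kind=\"invariant\"} 4\n"
	skipped := sumCounter(sample, "sei_unsafe_validation_skipped_total")
	fmt.Println(smokeTestPassed(skipped, 1, 12))
}
```

Summing across label sets keeps the check independent of the final {site, kind} label scheme.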
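Acceptance criteria
- ConsensusPolicy interface extended; validation paths refactored; production behavior unchanged; tests pass.
- mock_chain_validation variant exists; go build -tags mock_chain_validation succeeds.
- unsafe- prefixed image published from CI.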
Out of scope (with un-defer triggers)
- Runtime startup guards refusing prod chain-ids. Un-defer trigger: a near-miss where someone tries to deploy the unsafe binary to a prod cluster.
- Separate ECR repository for unsafe builds. Un-defer trigger: same.
- Cosign provenance / signing separation. Un-defer trigger: any compliance/audit requirement.
- Out-of-band AppHash-diff sidecar to detect silent divergence in the lab. Un-defer trigger: lab results start producing numbers that seem too good to be true.
- Combining mock_chain_validation with mock_block_validation into an umbrella tag. Explicitly rejected by the security specialist: separate tags isolate blast radius.
- Parameter-passing approach instead of per-package build tags. Decided in favor of per-package variant files for compile-time gate parity with the mock_block_validation precedent.
References
- sei-protocol/platform:docs/designs/mock-chain-validation-build-tag.md (platform PR #505)
Open questions (decisions wait for implementation review)
- Layer 1 audit scope: which sei-tendermint validation paths need routing through ConsensusPolicy
- Swallow*Failure() interface shape: final naming at implementation discretion
- Which sei-cosmos panic sites convert vs. which stay as halts: needs x/staking + x/distribution depth signoff at M3 PR-time review
- Telemetry granularity (design defaults to one counter with {site, kind} labels)
- Image tag prefix specifics: design proposes unsafe-
- Scope of the cosmos-sdk-upgrade audit going forward
🤖 Generated with Claude Code