Implement `mock_chain_validation` build tag (compute-and-swallow consensus + module-genesis failures) #3427

## Problem

Production-grade testing of seid at real-world state size — pacific-1's IAVL depth, contract cardinality, account distribution, in-flight gov state — is structurally hard. Existing options fall short:

- The state-bloat seeder on harbor produces synthesized state, which leaves a "your shape isn't really pacific-1" gap that defeats the point.
- `seid export` + state surgery on x/staking + x/distribution + x/slashing + bank balances was scoped at 1–2 weeks, with a high probability of ending in "boots and halts at block 1–10."

The team needs a build of seid that boots from any chain export, runs all prod code paths with full validation work performed, and converts halting mechanisms (consensus failures, cryptographic state mismatches, module-genesis invariant panics) into log+counter events. Operators see *what* diverged (actual hash values, actual invariant mismatch magnitudes), not just that something did. The chain "continues happily" so unique testing scenarios can run against real prod state with diagnostic signal preserved.

## Impact

Eliminates the "your synthesized shape isn't really pacific-1" objection that has been blocking real-state load testing. Unlocks reproducible chaos scenarios against actual mainnet state for performance evaluation, behavior probing, and pre-release validation — at the cost of one well-isolated build target.

## Relevant experts

- sei-tendermint owners — Layer 1 refactor of `ConsensusPolicy` interface and validation paths (M1)
- sei-cosmos x/staking + x/distribution module owners — Layer 2 panic-site signoff (M3)
- sei-chain x/evm + x/oracle module owners — Layer 3 module audit (M4)
- Build system / CI owners — Makefile + Dockerfile + image-tag work (M5)
- Platform team — first lab smoke test orchestration (M6)

## Proposed approach

Six milestones, deliverable as separate reviewable PRs. **Pattern: per-package `//go:build mock_chain_validation` variant files in every layer** (matches the existing `mock_block_validation` precedent in sei-tendermint; compile-time gate, no API signature changes).
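
A minimal sketch of the default half of one such file pair, assuming a hypothetical package-level constant named `swallowValidationFailures` (real file and identifier names are up to the implementer). The default file carries the negated constraint so it compiles into every normal build; its sibling `*_mock_chain_validation.go` file would carry `//go:build mock_chain_validation` and declare the same constant as `true`, so exactly one definition is ever compiled in and no function signatures change.

```go
//go:build !mock_chain_validation

// Default (production) half of a hypothetical per-package variant pair.
// The sibling file would start with a mock_chain_validation build constraint
// and declare the same identifiers with swallowing enabled.
package main

import "fmt"

// swallowValidationFailures is a compile-time constant, so the compiler can
// drop the swallow branches entirely from production binaries.
const swallowValidationFailures = false

// buildMode reports which variant was compiled in, e.g. for a startup log line.
func buildMode() string {
	if swallowValidationFailures {
		return "mock_chain_validation"
	}
	return "production"
}

func main() {
	fmt.Println("validation build mode:", buildMode())
}
```

Because the gate is a constant rather than a runtime parameter, an operator cannot flip it with a flag or config file; only a rebuild with the tag changes behavior.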

**M1 — sei-tendermint policy injection refactor (no behavior change).**
- M1.0: Audit halting checks in sei-tendermint validation paths. Output: a markdown enumeration of every site where validation can return an error that halts the chain (file:line, error type, what it gates).
- M1.0: Settle the `Swallow*Failure() bool` interface shape: new methods on `ConsensusPolicy` that return `false` by default.
- M1.1: Refactor each audited site from "if policy says skip, return early" to "compute the result; if failure, log detail + counter; conditionally return error based on `policy.Swallow*Failure()`."
- M1.1: Update `consensus_policy_default.go` and `consensus_policy_mock_block_validation.go` to return `false` for new `Swallow*` methods (preserves existing tag semantics).
- M1.1: Tests covering production behavior (unchanged) and a mock policy variant that flips `Swallow*Failure()` to return `true`.
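
The compute-and-swallow shape described in M1.1 can be sketched as follows. The method names `SwallowAppHashFailure` and `SwallowCommitSigFailure`, the `validateAppHash` helper, and the stdlib logging and counter are hypothetical placeholders (final naming is open question 2). The point of the sketch is that the comparison, the detailed log line, and the counter run in every build; only the final decision to return the error depends on the policy.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"sync/atomic"
)

// ConsensusPolicy is a trimmed, hypothetical version of the interface with
// the new Swallow*Failure methods added.
type ConsensusPolicy interface {
	SwallowAppHashFailure() bool
	SwallowCommitSigFailure() bool
}

// defaultPolicy mirrors production: never swallow.
type defaultPolicy struct{}

func (defaultPolicy) SwallowAppHashFailure() bool   { return false }
func (defaultPolicy) SwallowCommitSigFailure() bool { return false }

// swallowAllPolicy is the mock variant used in tests.
type swallowAllPolicy struct{}

func (swallowAllPolicy) SwallowAppHashFailure() bool   { return true }
func (swallowAllPolicy) SwallowCommitSigFailure() bool { return true }

// validationFailures counts every detected failure, swallowed or not.
var validationFailures atomic.Int64

// validateAppHash always performs the comparison; on mismatch it builds an
// error carrying both hash values, logs it, and counts it in every build.
func validateAppHash(p ConsensusPolicy, height int64, expected, got []byte) error {
	if bytes.Equal(expected, got) {
		return nil
	}
	err := fmt.Errorf("wrong AppHash at height %d: expected %X, got %X", height, expected, got)
	swallow := p.SwallowAppHashFailure()
	validationFailures.Add(1)
	log.Printf("validation failure site=app_hash swallowed=%t: %v", swallow, err)
	// Only this decision differs between the production and mock builds.
	if swallow {
		return nil
	}
	return err
}

func main() {
	a, b := []byte{0xAA, 0x01}, []byte{0xBB, 0x02}
	fmt.Println(validateAppHash(defaultPolicy{}, 10, a, b) != nil) // production: error returned
	fmt.Println(validateAppHash(swallowAllPolicy{}, 10, a, b))     // mock: nil returned, failure still logged
	fmt.Println("failures seen:", validationFailures.Load())
}
```

Computing before consulting the policy is what distinguishes this from the early-return style: the mock build still does the full validation work, so the log shows the actual divergent hashes.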

Deliverable: one PR against sei-chain landing the refactor without changing any runtime behavior.

**M2 — sei-tendermint `mock_chain_validation` variant.**
- Add `sei-tendermint/types/consensus_policy_mock_chain_validation.go` with `//go:build mock_chain_validation` returning `true` for every `Swallow*Failure()` method.
- Verify `go build -tags mock_chain_validation` produces a working binary.

Deliverable: one small additive PR.

**M3 — sei-cosmos module-genesis panic guards.**
- `//go:build mock_chain_validation` variants for `sei-cosmos/x/staking/genesis.go` converting lines 113, 126, 139 from `panic(...)` to log+counter+continue. Log payload includes both sides of the failed comparison.
- Same for `sei-cosmos/x/distribution/keeper/genesis.go` (8 sites, lines 28-95).
- Optionally `sei-cosmos/x/slashing/genesis.go`, if the audit reveals panic sites there.
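
A sketch of what one converted genesis check could look like, using a bonded-pool balance comparison (the kind of cross-state invariant upstream Cosmos SDK staking genesis panics on) as the example. `checkGenesisInvariant`, its `swallow` parameter, and the plain `int64` amounts are hypothetical simplifications: in a real variant file the flag would be a compile-time constant and the amounts would be SDK coin types. The production path still panics with the same message; the swallowed path logs both sides plus the signed difference and lets genesis continue.

```go
package main

import (
	"fmt"
	"log"
)

// genesisInvariantFailures counts swallowed violations. InitGenesis runs on
// a single goroutine, so a plain int is sufficient for this sketch.
var genesisInvariantFailures int

// checkGenesisInvariant returns true when expected == actual. On mismatch it
// panics (production) or logs both sides and the delta and returns false
// (mock_chain_validation), letting the caller continue.
func checkGenesisInvariant(swallow bool, site string, expected, actual int64) bool {
	if expected == actual {
		return true
	}
	msg := fmt.Sprintf("%s: expected %d, actual %d (delta %+d)", site, expected, actual, actual-expected)
	if !swallow {
		panic(msg)
	}
	genesisInvariantFailures++
	log.Printf("genesis invariant swallowed: %s", msg)
	return false
}

func main() {
	ok := checkGenesisInvariant(true, "x/staking bonded pool balance", 1_000_000, 999_250)
	fmt.Println(ok, genesisInvariantFailures)
}
```

Including the delta in the log line is what lets operators see the magnitude of a mismatch rather than only that one occurred.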

Deliverable: one PR against sei-chain (sei-cosmos subtree).

**M4 — sei-chain module audit + variants.**
- 30-min audit of `x/evm/genesis.go`, `x/oracle/genesis.go`, and other sei-specific modules (tokenfactory, epoch, dex if present) for cross-state invariant panics.
- Apply the same variant pattern where needed.

Deliverable: one PR against sei-chain.

**M5 — Build system + image.**
- Makefile target: `make build-unsafe` injecting `GO_BUILD_TAGS=mock_chain_validation` and producing an `unsafe-vX.Y.Z` image.
- Dockerfile / GitHub Actions matrix entry pushing the image to ECR/GHCR with the `unsafe-` prefix.
- CI sanity check: confirm the build tag is in the `version.BuildTags` ldflag.
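
The CI check amounts to an exact-match lookup in the tag list. A minimal sketch, assuming the ldflag value is comma-separated (as Cosmos SDK-style Makefiles commonly produce); the `BuildTags` variable here is a stand-in with an example value, not the real `version.BuildTags`. As a cross-check that does not depend on ldflags, the standard library's `runtime/debug.ReadBuildInfo` exposes the `-tags` setting the Go toolchain recorded in the binary.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"strings"
)

// BuildTags stands in for version.BuildTags; the value is an example only.
var BuildTags = "netgo,ledger,mock_chain_validation"

// hasBuildTag reports whether want appears as a whole entry in a
// comma-separated tag list (so mock_block_validation never matches).
func hasBuildTag(tags, want string) bool {
	for _, t := range strings.Split(tags, ",") {
		if strings.TrimSpace(t) == want {
			return true
		}
	}
	return false
}

// toolchainTags returns the -tags value recorded by the Go toolchain, or ""
// if build info or the setting is unavailable.
func toolchainTags() string {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return ""
	}
	for _, s := range info.Settings {
		if s.Key == "-tags" {
			return s.Value
		}
	}
	return ""
}

func main() {
	fmt.Println("ldflag tags include mock_chain_validation:", hasBuildTag(BuildTags, "mock_chain_validation"))
	fmt.Printf("toolchain -tags: %q\n", toolchainTags())
}
```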

Deliverable: one PR against sei-chain.

**M6 — First lab smoke test.**
- Use a recent pacific-1 export.
- Boot the binary with new validators.
- Verify the chain survives block 1 (success criteria: `sei_unsafe_validation_skipped_total` is non-zero and the chain keeps producing blocks).
- Verify the structured log lines fire with divergence detail.
- Capture results in a follow-up document.

Deliverable: a manifest update in platform-shadow + a brief writeup.

## Acceptance criteria

- [ ] M1 PR merged: `ConsensusPolicy` interface extended; validation paths refactored; production behavior unchanged; tests pass.
- [ ] M2 PR merged: `mock_chain_validation` variant exists; `go build -tags mock_chain_validation` succeeds.
- [ ] M3 PR merged: x/staking + x/distribution panic sites have variants; module owners signed off on which sites convert vs stay as halts.
- [ ] M4 PR merged: x/evm + x/oracle + sei-specific module audit complete; any found panic sites have variants.
- [ ] M5 PR merged: `unsafe-`-prefixed image published from CI.
- [ ] M6 smoke test: a binary built with the tag boots a pacific-1 fork past block 1 with new validators.

## Out of scope (with un-defer triggers)

- Runtime startup guards refusing prod chain-ids. Un-defer trigger: a near-miss where someone tries to deploy the unsafe binary to a prod cluster.
- Separate ECR repository for unsafe builds. Un-defer trigger: same.
- Cosign provenance / signing separation. Un-defer trigger: any compliance/audit requirement.
- Out-of-band AppHash-diff sidecar to detect silent divergence in the lab. Un-defer trigger: lab results start producing numbers that seem too good to be true.
- Combining `mock_chain_validation` with `mock_block_validation` into an umbrella tag. Explicitly rejected by the security-specialist: separate tags isolate blast radius.
- Parameter-passing approach instead of per-package build tags. Decided in favor of per-package variant files for compile-time gate parity with `mock_block_validation` precedent.

## References

- Design doc: `sei-protocol/platform:docs/designs/mock-chain-validation-build-tag.md` (platform PR #505)
- The Coral session that synthesized the design ran with sei-network-specialist, security-specialist, and product-engineer.
- Sibling lineage: pacific-1 Giga shadow replayer work — platform PRs #479, #480, #483.

## Open questions (decisions wait for implementation review)

1. Layer 1 audit scope — which sei-tendermint validation paths need routing through `ConsensusPolicy`
2. `Swallow*Failure()` interface shape — final naming at implementation discretion
3. Which sei-cosmos panic sites convert vs which stay as halts — needs x/staking + x/distribution depth signoff at M3 PR-time review
4. Telemetry granularity (design defaults to one counter with `{site, kind}` labels)
5. Image tag prefix specifics — design proposes `unsafe-`
6. Scope of the cosmos-sdk-upgrade audit going forward

🤖 Generated with [Claude Code](https://claude.com/claude-code)
