Persist scoring window and voted swaps across validator restarts#17

Open
bitloi wants to merge 7 commits into entrius:test from
bitloi:fix/scoring-window-persistence

Conversation

@bitloi bitloi commented Apr 7, 2026

Closes #16

Summary

  • Add ScoringWindowStore for atomic JSON persistence of the scoring window and voted set, following the existing tmp-file-then-rename pattern used by SwapFulfiller._save_sent_cache.
  • Wire persistence into SwapTracker: load on initialize(), and persist on swap resolution, window pruning, and mark_voted().
  • Remove duplicate _resolved_block helper by sharing a single resolved_block() from the store module.
  • Create store in validator init with cache path at <neuron.full_path>/scoring_window.json.
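
As an illustration of the tmp-file-then-rename pattern named in the summary, here is a minimal sketch. It is not the PR's ScoringWindowStore: the function names save_atomic and load_or_empty, and the JSON layout with "window" and "voted_ids" keys, are assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


def save_atomic(path: Path, state: dict) -> None:
    """Write state as JSON to a temp file, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_or_empty(path: Path) -> dict:
    """Return the saved state, or an empty state if the file is missing or corrupt."""
    empty = {"window": [], "voted_ids": []}
    try:
        with open(path) as fh:
            data = json.load(fh)
    except (OSError, ValueError):  # missing file, unreadable file, or invalid JSON
        return empty
    if not isinstance(data, dict):
        return empty
    window = data.get("window", [])
    voted = data.get("voted_ids", [])
    if not isinstance(window, list) or not isinstance(voted, list):
        return empty
    return {"window": window, "voted_ids": voted}
```

Because the rename is the last step, a crash mid-write leaves the previous cache file untouched, and a corrupt or missing file degrades to an empty state instead of raising.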

Design

Persistence hooks are placed at every state mutation point, and only there. The store is optional (passing None disables persistence), so existing tests and standalone usage remain unaffected. On cold start, stale voted IDs are pruned against the active swap set to prevent unbounded growth.
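
The design above can be sketched roughly as follows. MemoryStore and TrackerSketch are hypothetical stand-ins, not the PR's classes, and the signature initialize(active_swap_ids) is an assumption. The sketch only shows the optional store and the cold-start intersection of restored voted IDs with the active swap set.

```python
from typing import Optional


class MemoryStore:
    """Hypothetical stand-in for the real store; keeps the last saved state in memory."""

    def __init__(self, window=None, voted_ids=None):
        self.window = list(window or [])
        self.voted_ids = set(voted_ids or [])
        self.saves = 0

    def load(self):
        return list(self.window), set(self.voted_ids)

    def save(self, window, voted_ids):
        self.window, self.voted_ids = list(window), set(voted_ids)
        self.saves += 1


class TrackerSketch:
    def __init__(self, store: Optional[MemoryStore] = None):
        self.store = store  # None disables persistence entirely
        self.window = []
        self.voted_ids = set()

    def _persist(self):
        if self.store is not None:
            self.store.save(self.window, self.voted_ids)

    def initialize(self, active_swap_ids):
        if self.store is None:
            return
        self.window, restored = self.store.load()
        # Keep only voted IDs that still refer to an active swap.
        self.voted_ids = restored & set(active_swap_ids)
        if self.voted_ids != restored:
            self._persist()  # write the compacted set back

    def mark_voted(self, swap_id):
        self.voted_ids.add(swap_id)
        self._persist()
```

With a store holding voted IDs {1, 2, 3} and active swaps [2, 9], initialize() keeps only {2} and re-saves once. A tracker built with no store behaves the same in memory and never touches disk.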

Test plan

  • ruff check allways/validator/scoring_store.py allways/validator/swap_tracker.py neurons/validator.py passes
  • python3 -m pytest tests/test_rate.py tests/test_chains.py -q --tb=short passes
  • python3 -m pytest tests/test_scoring_store.py -q --tb=short passes (local)
  • Roundtrip serialization preserves window and voted IDs
  • Corrupt/missing cache file falls back to empty state
  • Stale voted IDs pruned on restart when no active swaps exist
  • Window entries older than SCORING_WINDOW_BLOCKS pruned on load
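
For the last test-plan item, a minimal sketch of age-based pruning. The entry shape (a dict with a resolved_block key), the inclusive cutoff, and passing the window length as a parameter are all assumptions; the real value of SCORING_WINDOW_BLOCKS is not stated in this thread.

```python
def prune_window(window, current_block, window_blocks):
    """Keep entries resolved within the last `window_blocks` blocks.

    Assumption: an entry exactly at the cutoff is kept (inclusive bound).
    """
    cutoff = current_block - window_blocks
    return [entry for entry in window if entry["resolved_block"] >= cutoff]
```

A load path could call this once with the current chain height and re-save only if the returned list is shorter than the loaded one.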

@bitloi
Author

bitloi commented Apr 7, 2026

@LandynDev @anderdc Ready for review!

@LandynDev
Collaborator

Thanks, but this was solved and merged just before this PR was opened. If there are any necessary tweaks to the implementation, feel free to open another PR.

Solution that was merged for this: #11

@LandynDev LandynDev closed this Apr 9, 2026
@bitloi
Author

bitloi commented Apr 9, 2026

Thanks for the clarification. I re-checked the mapping and I think there’s a mismatch:

  • #11 (merged) persists PendingConfirmQueue in SQLite (pending_confirms.py + axon_handlers.py changes).
  • Issue #16 is specifically about restart loss of SwapTracker.window and voted swap IDs, i.e. scoring-state persistence.

So while #11 addresses validator restart durability for pending confirms, it doesn’t appear to implement the scoring-window persistence described in #16.

Would you please reopen and review this PR again?

@LandynDev LandynDev reopened this Apr 9, 2026
Collaborator

@LandynDev LandynDev left a comment

  1. Can we add tests? Can you also post the results of the unit tests passing afterwards?

For example:

  • round-trip save/load preserves all swap fields
  • corrupted cache file → load returns empty, no crash
  • compaction re-save when stale entries dropped (regression for the af191a0 bug)

Plus a few more covering window-age pruning and the initialize voted_ids intersection.

  2. Can we move resolved_block back?

  3. Can you update the stale class docstring to keep it current?

swap_tracker.py:14–23 still says:

"The scoring window populates naturally as swaps complete."
That is now outdated; we can update it to mention store-backed restoration.


Other:

  • Cache path uses config.neuron.full_path, whereas PendingConfirmQueue uses ~/.allways/validator/.

Good to go after that.

@bitloi bitloi requested a review from LandynDev April 10, 2026 01:07
@bitloi
Author

bitloi commented Apr 10, 2026

Implemented ✅

  • Added tests for:

    • round-trip save/load preserves all swap fields
    • corrupted cache file returns empty state without crashing
    • compaction re-save when stale entries are dropped (regression for af191a0)
    • window-age pruning behavior
    • initialize-time voted_ids intersection with active swaps
  • Moved the resolved-block helper back into the tracker module (_resolved_block).

  • Updated the stale SwapTracker class docstring to mention store-backed restoration on cold start.

  • Kept cache path as <neuron.full_path>/scoring_window.json (neuron-local runtime state).

Validation results:

  • python3 -m pytest tests/test_scoring_store.py -q --tb=short
    • 17 passed
  • python3 -m pytest tests/test_rate.py tests/test_chains.py -q --tb=short
    • 37 passed
  • ruff check allways/validator/scoring_store.py allways/validator/swap_tracker.py tests/test_scoring_store.py
    • All checks passed

@PunchTheDev PunchTheDev left a comment

hey, I think there's still a gap in the persistence wiring that defeats the stated invariant:

  • resolve() doesn't persist: SwapTracker.resolve() at swap_tracker.py:70 is the primary "move a swap into the scoring window" path, called from forward.py:250 and :322 on every confirm and timeout vote. It mutates window and voted_ids but never calls _persist(). A crash between resolve() and the next prune_window/_poll_inner call that happens to persist for another reason will lose the resolved swap, which is the exact scenario this PR exists to prevent. The docstring currently says "persistence hooks at every state mutation point," so this should be one, right?
  • resolved_block is still duplicated — my last review asked to consolidate this. _resolved_block is back in swap_tracker.py:194 (good) but resolved_block also still lives in scoring_store.py:148 doing the same thing. Please pick one home and have the other import it.
  • missing the load-bearing regression test — the store tests are solid, but none of them cover resolve() → new tracker → window restored. That's the single most valuable test for this PR's invariant and would have caught the bullet above.
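
The regression test asked for in the last bullet could look roughly like this. It is a self-contained sketch: FileStoreSketch and PersistingTracker are hypothetical stand-ins for ScoringWindowStore and SwapTracker, and the resolve(swap_id, resolved_block) signature is assumed, not taken from the PR.

```python
import json
import os
from pathlib import Path


class FileStoreSketch:
    """Hypothetical stand-in for ScoringWindowStore (JSON, tmp-then-rename)."""

    def __init__(self, path):
        self.path = Path(path)

    def save(self, window, voted_ids):
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps({"window": window, "voted_ids": sorted(voted_ids)}))
        os.replace(tmp, self.path)

    def load(self):
        try:
            data = json.loads(self.path.read_text())
            return list(data["window"]), set(data["voted_ids"])
        except (OSError, ValueError, KeyError, TypeError):
            return [], set()  # missing or corrupt cache falls back to empty state


class PersistingTracker:
    """Hypothetical stand-in for SwapTracker with the resolve() hook in place."""

    def __init__(self, store=None):
        self.store = store
        self.window = []
        self.voted_ids = set()

    def initialize(self):
        if self.store is not None:
            self.window, self.voted_ids = self.store.load()

    def resolve(self, swap_id, resolved_block):
        self.window.append({"swap_id": swap_id, "resolved_block": resolved_block})
        self.voted_ids.add(swap_id)
        # The hook under discussion: persist before returning.
        if self.store is not None:
            self.store.save(self.window, self.voted_ids)


def test_resolve_survives_restart(tmp_path):
    path = Path(tmp_path) / "scoring_window.json"
    first = PersistingTracker(FileStoreSketch(path))
    first.resolve(swap_id=7, resolved_block=100)

    # Simulated crash: a fresh tracker reads the same file, with no
    # prune or poll call in between.
    second = PersistingTracker(FileStoreSketch(path))
    second.initialize()
    assert second.window == [{"swap_id": 7, "resolved_block": 100}]
```

Removing the save call from resolve() in this sketch makes the test fail, which is the behaviour the first bullet describes.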

Collaborator

@LandynDev LandynDev left a comment

Can you address the changes requested by @PunchTheDev? And let me know your thoughts on point 1.

@bitloi
Author

bitloi commented Apr 13, 2026

Can you address requested changes left by @PunchTheDev ? and let me know thoughts on point 1

Valid catch. There was a real crash window between voting and the next incidental persist. I fixed it.

Development

Successfully merging this pull request may close these issues.

Validator restart loses scoring window and can wipe miner weights

3 participants