
Fuzz Network Test Framework#326

Open
samliok wants to merge 5 commits into main from random-network

Conversation

@samliok
Collaborator

@samliok samliok commented Jan 14, 2026

No description provided.

@samliok samliok changed the base branch from main to refactor-testutil January 20, 2026 22:21
@samliok samliok marked this pull request as ready for review January 21, 2026 20:05
@samliok samliok linked an issue Jan 22, 2026 that may be closed by this pull request
@yacovm yacovm mentioned this pull request Feb 2, 2026

# tmp folder

/tmp
Collaborator

What's the use of adding /tmp?

Collaborator Author

The logs are written to /tmp, and I would like to save them for debugging but not commit them to GitHub.

TimeUpdateFrequency: 100 * time.Millisecond,
TimeUpdateAmount: 500 * time.Millisecond,
CrashInterval: 800 * time.Millisecond,
LogDirectory: "tmp",
Collaborator

I think we need to create a temporary folder and if the test succeeds, delete the logs.

Collaborator

This way you don't need to delete the directory at the beginning of the test

Collaborator Author

Added deleting the tmp directory if the test fails, but I still think we should check whether the directory exists at the beginning of the test. That way, if the test fails and we want to re-run, we don't append to an already existing log file.

}

func (n *Network) StartInstances() {
panic("Call Run() Instead")
Collaborator

This method isn't used. Why is it needed?

Collaborator Author

Because BasicNetwork is embedded in Network, I don't want the caller accidentally calling StartInstances thinking it's a valid method.
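That pattern, shadowing a method promoted from an embedded type so accidental calls fail loudly, can be sketched like this; the method bodies are illustrative, only the shadowing trick mirrors the PR:

```go
package main

import "fmt"

type BasicNetwork struct{}

// On the embedded type, StartInstances is a real operation.
func (b *BasicNetwork) StartInstances() { fmt.Println("starting instances") }

type Network struct {
	BasicNetwork // embedded: StartInstances would otherwise be promoted
}

// Shadow the promoted method so misuse fails loudly at the call site.
func (n *Network) StartInstances() { panic("Call Run() Instead") }

// Run is still free to use the embedded implementation explicitly.
func (n *Network) Run() {
	n.BasicNetwork.StartInstances()
}

func main() {
	n := &Network{}
	n.Run() // prints "starting instances"

	defer func() { fmt.Println("recovered:", recover()) }()
	n.StartInstances() // panics: the shadow intercepts the promoted method
}
```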

}

func (m *Mempool) isTxInChain(txID txID, parentDigest simplex.Digest) bool {
block, exists := m.verifiedButNotAcceptedTXs[parentDigest]
Collaborator

I'm not sure this works for a case where we have a leader failover that creates a sibling block.

Suppose block a is notarized, and then a block b is built on top of it, broadcast, and verified by a node, but then the majority notarize the empty block.

In the next round, block b' is proposed on top of block a.
We need a way to rollback the verified but not accepted map for the parent a because both blocks b and b' are built on top of it.

One way of doing it is to move this book-keeping to the blocks themselves.
Specifically, have the block instances maintain pointers to the blocks they are built upon, and make the transaction list inside a block a map; when we encode the bytes of the block, we do it in an order that is fixed at the time we build the block.
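The suggested book-keeping might look like the following sketch: each block points to its parent and keeps its transactions in a map, with a separate slice freezing the encoding order at build time (Go map iteration order is randomized). All type and field names here are illustrative stand-ins, not the PR's actual types:

```go
package main

import "fmt"

type txID string

type Block struct {
	parent  *Block
	txs     map[txID]struct{} // fast membership lookup
	txOrder []txID            // encoding order, fixed when the block is built
}

func NewBlock(parent *Block, txs []txID) *Block {
	b := &Block{parent: parent, txs: make(map[txID]struct{}), txOrder: txs}
	for _, id := range txs {
		b.txs[id] = struct{}{}
	}
	return b
}

// isTxInChain walks parent pointers instead of a shared per-mempool map,
// so sibling blocks built on the same parent cannot corrupt each other.
func (b *Block) isTxInChain(id txID) bool {
	for blk := b; blk != nil; blk = blk.parent {
		if _, ok := blk.txs[id]; ok {
			return true
		}
	}
	return false
}

func main() {
	a := NewBlock(nil, []txID{"tx1"})
	b := NewBlock(a, []txID{"tx2"})
	bPrime := NewBlock(a, []txID{"tx3"}) // sibling of b, same parent a

	fmt.Println(b.isTxInChain("tx1"))      // true: inherited from a
	fmt.Println(bPrime.isTxInChain("tx2")) // false: tx2 lives only in b
}
```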

Collaborator Author

We need a way to rollback the verified but not accepted map for the parent a because both blocks b and b' are built on top of it.

Why do we need this? We aren't checking whether the block was verified before accepting it anyway. We can do so if we want.

Secondly, this should still pass verification in your example. If we want to make the mempool more rigorous, we could check whether any two verified blocks have the same parent and delete the earlier block before verifying the later one. In your example, when block b initially gets verified we do nothing. Then, if block b' gets verified, we delete block b from the verification map before marking b' as verified. Then we can adjust AcceptBlock to ensure the block has been verified beforehand.
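The eviction rule proposed here could be sketched as below, with Digest, Block, and the map field as hypothetical stand-ins for the real simplex types:

```go
package main

import "fmt"

type Digest string

type Block struct {
	name   string
	parent Digest
}

type Mempool struct {
	verifiedButNotAccepted map[Digest]*Block
}

// markVerified records blk as verified; a previously verified sibling
// (same parent digest) is evicted first and returned, mirroring the
// proposed "delete the earlier block before verifying the later" rule.
func (m *Mempool) markVerified(blk *Block) (evicted *Block) {
	evicted = m.verifiedButNotAccepted[blk.parent]
	m.verifiedButNotAccepted[blk.parent] = blk
	return evicted
}

func main() {
	m := &Mempool{verifiedButNotAccepted: map[Digest]*Block{}}
	b := &Block{name: "b", parent: "a"}
	bPrime := &Block{name: "b'", parent: "a"}

	m.markVerified(b)             // nothing evicted yet
	old := m.markVerified(bPrime) // sibling b' evicts b
	fmt.Println(old.name)                           // b
	fmt.Println(m.verifiedButNotAccepted["a"].name) // b'
}
```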

Collaborator Author

Hmm, actually I think you are right; I've been trying to solve a bug related to this. I don't think we can have the blocks point to other blocks, however, since we need to be able to serialize/deserialize blocks and this would cause some recursive headaches. Maybe it's best to have the mempool keep a lookup table of built blocks and just fetch from this table using the parent digest when needed.
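A sketch of that lookup-table idea: blocks store only their parent's digest (trivially serializable, no recursive pointers), and the mempool resolves ancestry through a map keyed by digest. Names are illustrative:

```go
package main

import "fmt"

type Digest string

type Block struct {
	digest Digest
	parent Digest // just the digest: serializes without recursion
	txs    map[string]struct{}
}

type Mempool struct {
	builtBlocks map[Digest]*Block // lookup table of built blocks
}

// isTxInChain walks ancestry through the lookup table, stopping when a
// parent digest is unknown (e.g. the genesis/accepted boundary).
func (m *Mempool) isTxInChain(tx string, parent Digest) bool {
	for blk, ok := m.builtBlocks[parent]; ok; blk, ok = m.builtBlocks[blk.parent] {
		if _, in := blk.txs[tx]; in {
			return true
		}
	}
	return false
}

func main() {
	m := &Mempool{builtBlocks: map[Digest]*Block{}}
	m.builtBlocks["a"] = &Block{digest: "a", txs: map[string]struct{}{"tx1": {}}}
	m.builtBlocks["b"] = &Block{digest: "b", parent: "a", txs: map[string]struct{}{"tx2": {}}}

	fmt.Println(m.isTxInChain("tx1", "b")) // true: found in ancestor a
	fmt.Println(m.isTxInChain("tx9", "b")) // false: nowhere in the chain
}
```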

Collaborator

why do we need this? we aren't checking if the block was verified before accepting it anyways. We can do so if we want.

If you have verified a block but haven't accepted it, and instead accepted a sibling, you need to put back into the mempool all the transactions you removed for the losing block (unless a transaction also ended up in the sibling block).
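A minimal sketch of that rollback, using plain sets of transaction IDs in place of the real mempool types (rollbackLosingBlock is an illustrative name):

```go
package main

import "fmt"

// rollbackLosingBlock returns transactions removed for the losing block
// to the mempool, skipping any that the accepted sibling also contains
// (those really were accepted and must not be re-issued).
func rollbackLosingBlock(mempool, losing, accepted map[string]struct{}) {
	for tx := range losing {
		if _, inSibling := accepted[tx]; !inSibling {
			mempool[tx] = struct{}{} // re-add: this tx was never accepted
		}
	}
}

func main() {
	mempool := map[string]struct{}{}
	losing := map[string]struct{}{"tx1": {}, "tx2": {}}   // verified block b
	accepted := map[string]struct{}{"tx2": {}, "tx3": {}} // accepted sibling b'

	rollbackLosingBlock(mempool, losing, accepted)
	_, tx1Back := mempool["tx1"]
	_, tx2Back := mempool["tx2"]
	fmt.Println(tx1Back, tx2Back) // true false: tx2 made it into the sibling
}
```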

@yacovm
Collaborator

yacovm commented Feb 2, 2026

Can we rebase on top of main?

Base automatically changed from refactor-testutil to main February 3, 2026 16:07
@yacovm
Collaborator

yacovm commented Feb 25, 2026

This is still non-deterministic to some degree:

When I create a digest of the crashed and recovered node IDs in the schedule I get two possible outcomes: 9b9336fc010a3ccf10d41bb4344e36f2b2ec5c1c2f3940c76a4994005c1560c8 and 753a40743c43bf352998aad8d5a12df4df01fcd001d0baff1d513f3e85741a20

Patch is below:

diff --git a/testutil/random_network/network.go b/testutil/random_network/network.go
index e381fc0..188c7e9 100644
--- a/testutil/random_network/network.go
+++ b/testutil/random_network/network.go
@@ -4,6 +4,8 @@
 package random_network
 
 import (
+       "crypto/sha256"
+       "encoding/hex"
        "math/rand"
        "sync"
        "testing"
@@ -27,6 +29,9 @@ type Network struct {
 
        // tx stats
        allIssuedTxs int
+
+       recoveredNodes []simplex.NodeID
+       crashedNodes   []simplex.NodeID
 }
 
 func NewNetwork(config *FuzzConfig, t *testing.T) *Network {
@@ -224,6 +229,7 @@ func (n *Network) crashAndRecoverNodes() {
                        // randomly decide to recover based on NodeRecoverPercentage
                        if n.randomness.Float64() < n.config.NodeRecoverPercentage {
                                n.startNode(i)
+                               n.recoveredNodes = append(n.recoveredNodes, node.E.ID)
                                recoveredNodes = append(recoveredNodes, node.E.ID)
                                maxLeftToCrash++
                        }
@@ -294,6 +300,18 @@ func (n *Network) PrintStatus() {
                node.PrintMessageTypesSent()
        }
 
+       digest := sha256.New()
+
+       for _, node := range n.crashedNodes {
+               digest.Write([]byte(node.String()))
+       }
+
+       for _, node := range n.recoveredNodes {
+               digest.Write([]byte(node.String()))
+       }
+
+       n.logger.Info("Crashed and Recovered Nodes Digest", zap.String("digest", hex.EncodeToString(digest.Sum(nil))))
+
        // prints total issued txs and failed txs
        n.logger.Info("Transaction Stats", zap.Int("total issued txs", n.allIssuedTxs))
 }
@@ -302,6 +320,7 @@ func (n *Network) crashNode(idx int) {
        n.logger.Debug("Crashing node", zap.Stringer("nodeID", n.nodes[idx].E.ID))
        n.nodes[idx].isCrashed.Store(true)
        n.nodes[idx].Stop()
+       n.crashedNodes = append(n.crashedNodes, n.nodes[idx].E.ID)
 }
 
 func (n *Network) startNode(idx int) {

When we have 9b9336fc010a3ccf10d41bb4344e36f2b2ec5c1c2f3940c76a4994005c1560c8 we have 50 or 51 iterations of the loop.

When we have 753a40743c43bf352998aad8d5a12df4df01fcd001d0baff1d513f3e85741a20 we have 49 iterations of the loop.

Can we make this fully deterministic? 🙏

It looks like we're very close to having it deterministic.


}

func (m *Mempool) moveTxsToUnaccepted(block *Block) {
Collaborator

Why do we need to do this? We only delete from unacceptedTxs in AcceptBlock, so if a transaction is in unacceptedTxs in a child or grandchild block, it won't be removed from it anyway.


func TestNetworkSimpleFuzz(t *testing.T) {
config := random_network.DefaultFuzzConfig()
config.RandomSeed = 1770220909588
Collaborator

can we initialize a random seed here?
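One way to do that while keeping failures reproducible: derive the seed from the clock when config.RandomSeed is unset, and always log it so a failing run can be replayed by pasting the printed seed back in. initSeed is a hypothetical helper, not from the PR:

```go
package main

import (
	"fmt"
	"time"
)

// initSeed returns a fixed seed when one is supplied (to reproduce a
// specific failure), otherwise derives one from the clock and prints it
// so any failure can be replayed later.
func initSeed(fixed int64) int64 {
	if fixed != 0 {
		return fixed
	}
	seed := time.Now().UnixMilli()
	fmt.Println("fuzz seed:", seed) // always log the seed for replayability
	return seed
}

func main() {
	_ = initSeed(0)                    // fresh seed, logged
	fmt.Println(initSeed(1770220909588)) // 1770220909588: replay a known run
}
```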

node.mempool.lock.Lock()
n.logger.Debug("Not all txs accepted yet by node", zap.Stringer("nodeID", node.E.ID), zap.Int("unaccepted txs in mempool", len(node.mempool.unacceptedTxs)))
node.mempool.lock.Unlock()
allAccepted = false
Collaborator

we can just break here
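The suggested early exit can be sketched as a helper that returns as soon as one node still has unaccepted transactions; allTxsAccepted and the per-node counts are illustrative stand-ins for the per-node mempools:

```go
package main

import "fmt"

// allTxsAccepted reports whether every node's mempool is drained.
// Returning on the first busy node is the loop-free equivalent of
// setting allAccepted = false and breaking out of the scan.
func allTxsAccepted(unacceptedPerNode []int) bool {
	for _, n := range unacceptedPerNode {
		if n > 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allTxsAccepted([]int{0, 0, 3})) // false: one node still busy
	fmt.Println(allTxsAccepted([]int{0, 0, 0})) // true: everything accepted
}
```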

return minHeightNodeID
}

func (n *Network) recoverToHeight(height uint64) {
Collaborator

The way below is more deterministic (written by me, not thanks to CC):

func (n *Network) recoverToHeight(height uint64) {
	// Decide which crashed nodes to recover and in what order.
	// This keeps all randomness calls deterministic (fixed count per invocation),
	// independent of the non-deterministic AdvanceTime loop below.
	var nodesToRecover []int
	for i, node := range n.nodes {
		if node.isCrashed.Load() {
			nodesToRecover = append(nodesToRecover, i)
		}
	}
	n.randomness.Shuffle(len(nodesToRecover), func(i, j int) {
		nodesToRecover[i], nodesToRecover[j] = nodesToRecover[j], nodesToRecover[i]
	})

	// Recover nodes one at a time in the chosen order, advancing time until
	// all nodes have caught up before recovering the next one.
	for _, idx := range nodesToRecover {
		n.logger.Debug("Recovering node", zap.Stringer("nodeID", n.nodes[idx].E.ID))
		n.startNode(idx)

		for n.nodes[idx].storage.NumBlocks() < height {
			n.logger.Debug("Advancing network time", zap.Uint64("num crashed nodes", n.numCrashedNodes()),
				zap.Uint64("min height", n.getMinHeight()),
				zap.Uint64("max height", n.getMaxHeight()),
				zap.Stringer("Smallest node ID", n.getMinHeightNodeID()),
				zap.Uint64("target height", height),
			)

			for _, node := range n.nodes {
				n.lock.Lock()
				node.AdvanceTime(n.config.AdvanceTimeTickAmount)
				n.lock.Unlock()
			}
		}
	}

	// Advance time until all remaining nodes reach the target height.
	for n.getMinHeight() < height {
		n.logger.Debug("Advancing network time", zap.Uint64("num crashed nodes", n.numCrashedNodes()),
			zap.Uint64("min height", n.getMinHeight()),
			zap.Uint64("max height", n.getMaxHeight()),
			zap.Stringer("Smallest node ID", n.getMinHeightNodeID()),
			zap.Uint64("target height", height),
		)

		n.lock.Lock()
		n.BasicInMemoryNetwork.AdvanceTime(n.config.AdvanceTimeTickAmount)
		n.lock.Unlock()

	}
}

@yacovm
Collaborator

yacovm commented Feb 25, 2026

will continue the review tomorrow



Development

Successfully merging this pull request may close these issues.

Create a testing framework that generates random test cases
