A distributed system implementation that uses a gossip protocol for membership management, service discovery, and failure detection in containerized environments.
This project implements a fully decentralized gossip-based system in which nodes communicate over UDP to maintain an eventually consistent view of cluster membership. Key features include:
- Push-Pull Gossip Protocol: Bi-directional membership synchronization with configurable fanout
- Failure Detection: Three-state model (ALIVE → SUSPECT → DEAD) with anti-flapping mechanisms
- Rumor Engine: Fast subset-based propagation for critical events (failures, suspicions)
- Multi-Phase Bootstrap: Automatic node discovery with configured seeds and broadcast fallback
- Graceful Leave: Clean shutdown with LEAVE message propagation
- Service Discovery: Decentralized service discovery and registration
- Docker 20.10+
- Docker Compose 1.29+
# Build images
docker-compose build
# Start all nodes
docker-compose up -d
# View logs
docker-compose logs -f
# Stop cluster
docker-compose down
# Add a new node
docker-compose up -d node8
# Simulate graceful LEAVE
docker-compose stop node3
# Simulate failure
docker-compose kill node3
# Restart a node
docker-compose start node3
# Monitor specific node
docker-compose logs -f node1 | grep STATS
Make sure that the image has been built before running the test.
# If you are in the /gossip folder
bash test/test_failure.sh
# If you are in the /test folder
bash test_failure.sh
Configure each node in docker-compose.yml:
environment:
- NODE_ID=node1 # Unique node identifier
- NODE_IP=node1 # Node hostname/IP
- NODE_PORT=8001 # UDP port for gossip
- SEED_NODES=node2:8002,node3:8003 # Bootstrap seed nodes (comma-separated)
- SERVICES=web,api,cache # Services offered by node (optional)
/Gossip
│
├── cmd/
│   └── node/
│       └── main.go             # Application entry point
│
├── config/
│   ├── config.yml              # YAML configuration
│   └── config.go               # Configuration loader
│
├── internal/
│   │
│   ├── membership/
│   │   └── membership.go       # Membership List management
│   │
│   ├── gossip/
│   │   ├── gossip.go           # Push-Pull Gossip protocol
│   │   └── roumor.go           # Fast rumor propagation engine
│   │
│   ├── failure/
│   │   └── failure.go          # Failure detection subsystem
│   │
│   ├── leave/
│   │   └── leave.go            # Graceful shutdown protocol
│   │
│   ├── service/
│   │   └── service.go          # Distributed service registry
│   │
│   └── util/
│       ├── types.go            # Shared data structures
│       └── logger.go           # Logging utilities
│
├── test/
│   ├── test_join.sh            # Test node joining the system
│   ├── test_leave.sh           # Test node leaving the system
│   └── test_failure.sh         # Test failure detection
│
├── demo.sh                     # Live demo of the entire system
├── docker-compose.yml          # Multi-node cluster configuration
├── Dockerfile                  # Container image definition
├── go.mod                      # Go module dependencies
├── go.sum                      # Dependency checksums
└── README.md                   # Project documentation
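
The per-node environment variables from the docker-compose.yml configuration (NODE_ID, NODE_IP, NODE_PORT, SEED_NODES, SERVICES) could be read at startup roughly as follows. This is a hedged sketch, not the contents of config/config.go: the `NodeConfig` type and the `LoadConfig` and `splitList` helpers are hypothetical names, and the validation rules are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// NodeConfig mirrors the environment variables set for each node.
// The type and its field names are assumptions for this example.
type NodeConfig struct {
	ID       string
	IP       string
	Port     int
	Seeds    []string // "host:port" entries from SEED_NODES
	Services []string // optional, from SERVICES
}

// splitList splits a comma-separated value and drops empty items,
// so an unset variable yields an empty list.
func splitList(s string) []string {
	var out []string
	for _, item := range strings.Split(s, ",") {
		if item = strings.TrimSpace(item); item != "" {
			out = append(out, item)
		}
	}
	return out
}

// LoadConfig builds a NodeConfig through getenv, which is os.Getenv
// in normal use and can be replaced by a stub in tests.
func LoadConfig(getenv func(string) string) (NodeConfig, error) {
	rawPort := getenv("NODE_PORT")
	port, err := strconv.Atoi(rawPort)
	if err != nil || port < 1 || port > 65535 {
		return NodeConfig{}, fmt.Errorf("invalid NODE_PORT %q", rawPort)
	}
	cfg := NodeConfig{
		ID:       getenv("NODE_ID"),
		IP:       getenv("NODE_IP"),
		Port:     port,
		Seeds:    splitList(getenv("SEED_NODES")),
		Services: splitList(getenv("SERVICES")),
	}
	if cfg.ID == "" {
		return NodeConfig{}, fmt.Errorf("NODE_ID is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function as a parameter keeps the loader testable without touching the process environment; an empty SEED_NODES simply produces an empty seed list, which in this project would correspond to relying on the broadcast fallback during bootstrap.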