A carrier-grade edge-core-cloud distributed telecom system implementing advanced fault tolerance, load balancing, transaction management, and performance optimization across heterogeneous nodes.
This project implements a comprehensive distributed telecom system that interconnects five heterogeneous nodes across three architectural layers (Edge, Core, Cloud). The system is designed to handle diverse failure modes (crash, omission, Byzantine), optimize performance across multiple dimensions, and provide strong consistency guarantees under concurrent operations.
- Multi-Layer Architecture: Edge-Core-Cloud topology with optimized service placement
- Advanced Fault Tolerance: Handles crash, omission, and Byzantine failures with automated recovery
- Dynamic Load Balancing: Resource-aware allocation with adaptive migration
- Distributed Transactions: 2PC/3PC protocols with deadlock detection and resolution
- Performance Optimization: Real-time bottleneck analysis and throughput maximization
- Redundancy & Failover: Risk-based replication strategies with automated failover
- Property-Based Testing: Comprehensive validation using QuickCheck for Java
- Real-Time Dashboard: React-based monitoring UI with live metrics visualization
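The README does not show the TransactionManager's interface, so the following is only a minimal Python sketch of a textbook two-phase commit coordinator. It uses a hypothetical `InMemoryParticipant` class and leaves out what a real implementation needs: a durable decision log, timeouts, and recovery for participants left in doubt. 3PC extends this by adding a pre-commit phase between voting and commit, which narrows the window in which participants block if the coordinator crashes.

```python
from enum import Enum
from typing import List


class Vote(Enum):
    YES = "yes"
    NO = "no"


class InMemoryParticipant:
    """Hypothetical participant used only for illustration."""

    def __init__(self, name: str, will_vote_yes: bool = True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "init"

    def prepare(self, tx_id: str) -> Vote:
        # A YES vote is a promise that the participant can commit if asked to.
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return Vote.YES if self.will_vote_yes else Vote.NO

    def commit(self, tx_id: str) -> None:
        self.state = "committed"

    def abort(self, tx_id: str) -> None:
        self.state = "aborted"


def two_phase_commit(tx_id: str, participants: List[InMemoryParticipant]) -> str:
    """Coordinator side of 2PC: commit only if every participant votes YES."""
    votes: List[Vote] = []
    # Phase 1 (prepare): collect votes, stopping at the first NO or failure.
    for p in participants:
        try:
            vote = p.prepare(tx_id)
        except Exception:
            vote = Vote.NO  # a participant that fails to answer counts as NO
        votes.append(vote)
        if vote is Vote.NO:
            break
    all_yes = len(votes) == len(participants) and all(v is Vote.YES for v in votes)
    # Phase 2 (commit/abort): send the single decision to every participant.
    for p in participants:
        if all_yes:
            p.commit(tx_id)
        else:
            p.abort(tx_id)
    return "commit" if all_yes else "abort"
```

Participants that were never asked to prepare still receive the abort, which is harmless: they simply discard any local work for that transaction.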
```mermaid
graph TB
    subgraph Cloud["Cloud Layer"]
        Cloud1["Cloud1<br/>Analytics, DSM<br/>22ms, 1250Mbps, 16GB<br/>Omission Failures"]
    end
    subgraph Core["Core Layer"]
        Core1["Core1<br/>Transaction Commit<br/>8ms, 1000Mbps, 12GB<br/>Byzantine Failures"]
        Core2["Core2<br/>Load Balancing<br/>10ms, 950Mbps, 10GB<br/>Crash Failures"]
    end
    subgraph Edge["Edge Layer"]
        Edge1["Edge1<br/>RPC, Replication<br/>12ms, 500Mbps, 8GB<br/>Crash Failures"]
        Edge2["Edge2<br/>Migration, Recovery<br/>15ms, 470Mbps, 4.5GB<br/>Omission Failures"]
    end
    Edge1 <--> Core1
    Edge1 <--> Core2
    Edge2 <--> Core1
    Edge2 <--> Core2
    Core1 <--> Cloud1
    Core2 <--> Cloud1
    Edge1 -.-> Edge2
    Core1 -.-> Core2
```
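To illustrate how this topology could drive routing decisions, here is a Dijkstra sketch over the diagram's links. The diagram gives a latency per node rather than per link, so the sketch assumes that entering a node costs that node's latency and treats every link (including the dashed peer links) as bidirectional. The project's actual routing logic is not described in this README.

```python
import heapq
from typing import Dict, List, Tuple

# Per-node latency in ms from the node table; using it as a traversal cost is an assumption.
NODE_LATENCY_MS: Dict[str, float] = {
    "Edge1": 12, "Edge2": 15, "Core1": 8, "Core2": 10, "Cloud1": 22,
}

# Links from the topology diagram, treated as undirected.
LINKS: List[Tuple[str, str]] = [
    ("Edge1", "Core1"), ("Edge1", "Core2"),
    ("Edge2", "Core1"), ("Edge2", "Core2"),
    ("Core1", "Cloud1"), ("Core2", "Cloud1"),
    ("Edge1", "Edge2"), ("Core1", "Core2"),
]


def build_adjacency(links: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    adj: Dict[str, List[str]] = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def lowest_latency_path(src: str, dst: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm where entering a node costs that node's latency."""
    adj = build_adjacency(LINKS)
    dist: Dict[str, float] = {src: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry superseded by a shorter path
        for nxt in adj.get(node, []):
            nd = d + NODE_LATENCY_MS[nxt]
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if dst not in dist:
        raise ValueError(f"{dst} is unreachable from {src}")
    # Walk predecessors back from the destination to rebuild the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]
```

Under this cost model, traffic from either edge node to Cloud1 goes through Core1, the lowest-latency core node.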
| Node | Layer | Latency | Throughput | CPU | Memory | Tx/sec | Failure Type |
|---|---|---|---|---|---|---|---|
| Edge1 | Edge | 12ms | 500 Mbps | 45% | 8.0GB | 150 | Crash |
| Edge2 | Edge | 15ms | 470 Mbps | 50% | 4.5GB | 100 | Omission |
| Core1 | Core | 8ms | 1000 Mbps | 60% | 12.0GB | 250 | Byzantine |
| Core2 | Core | 10ms | 950 Mbps | 55% | 10.0GB | 200 | Crash |
| Cloud1 | Cloud | 22ms | 1250 Mbps | 72% | 16.0GB | 300 | Omission |
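The table assigns each node a failure type. One way to tell crash failures from omission failures is a heartbeat monitor, and Byzantine replies can be masked by voting across replicas. The class, its thresholds, and the `masked_value` helper below are hypothetical illustrations, not the project's FaultToleranceManager API.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence


class HeartbeatMonitor:
    """Classifies nodes from heartbeat arrival times (hypothetical thresholds).

    crash:    no heartbeat for more than `crash_timeout` seconds
    omission: heartbeats still arrive, but more than `max_missed` of the
              expected ones are missing from the last `window` seconds
    healthy:  otherwise
    """

    def __init__(self, interval: float = 1.0, crash_timeout: float = 5.0,
                 window: float = 10.0, max_missed: int = 2):
        self.interval = interval
        self.crash_timeout = crash_timeout
        self.window = window
        self.max_missed = max_missed
        self._beats: Dict[str, List[float]] = {}

    def record(self, node: str, t: float) -> None:
        # Heartbeats are assumed to be recorded in increasing time order.
        self._beats.setdefault(node, []).append(t)

    def status(self, node: str, now: float) -> str:
        beats = self._beats.get(node, [])
        if not beats or now - beats[-1] > self.crash_timeout:
            return "crash"
        # Only count the part of the window the node has actually been observed for.
        span = min(self.window, now - beats[0])
        expected = int(span / self.interval)
        received = sum(1 for t in beats if now - t <= self.window)
        return "omission" if expected - received > self.max_missed else "healthy"


def masked_value(responses: Sequence[Hashable], f: int) -> Optional[Hashable]:
    """Vote over replica replies, assuming at most f of them are Byzantine.

    A value backed by at least f + 1 replies includes at least one correct
    replica, so if correct replicas agree it is the correct value. With 2f + 1
    replies the correct replicas alone reach that bar.
    """
    if len(responses) < 2 * f + 1:
        raise ValueError("need at least 2f + 1 responses to mask f faulty replicas")
    value, count = Counter(responses).most_common(1)[0]
    return value if count >= f + 1 else None
```

Voting only helps when the affected service is replicated on at least 2f + 1 nodes; a single Byzantine node without replicas cannot be masked this way.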
The easiest way to run the entire system is using Docker:
Prerequisites:
- Docker Engine 20.10+ or Docker Desktop
- Docker Compose 2.0+
Quick Start:

```
# Linux/macOS
./docker-start.sh

# Windows
docker-start.bat

# Or manually
docker-compose up --build
```

Access the dashboard at http://localhost:5173.
See DOCKER.md for detailed documentation.
Prerequisites:
- Java 11 or higher
- Maven 3.6 or higher
- Python 3.8 or higher
- Node.js 18+ (for dashboard)
- Git
On Ubuntu, the prerequisites can be installed as follows.

- Update and prepare the system:

```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget git build-essential software-properties-common apt-transport-https ca-certificates
```

- Core languages and build tool (Maven manages the Java multi-module project):

```bash
sudo apt install -y openjdk-21-jdk
java -version      # verify installation

sudo apt install -y maven
mvn -version       # verify installation
```

- Python:

```bash
sudo apt install -y python3 python3-pip python3-venv
```

- Networking and communication (Protobuf):

```bash
sudo apt install -y protobuf-compiler
protoc --version   # verify installation
```

- Containerization (Docker and Docker Compose):

```bash
# Add Docker's official GPG key:
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Post-install: run docker without sudo
sudo usermod -aG docker $USER
newgrp docker
```

- Python simulation libraries:

```bash
# Create and activate a virtual environment
python3 -m venv telecom_env
source telecom_env/bin/activate

# Install simulation and optimization libraries
pip install simpy numpy pandas scipy scikit-learn
```
- Clone the repository.
- Build the Java components:

```bash
mvn clean install
```

- Install Python dependencies (from the repository root):

```bash
cd python_simulation
pip install -r requirements.txt
```

- Install Dashboard dependencies (optional, from the repository root):

```bash
cd dashboard
npm install
```
The easiest way to build and run everything is using the provided scripts:
On Windows, run `start.bat`: double-click the file or run it from Command Prompt. The script will:
- Clean previous builds
- Build the Java project
- Run Java tests
- Run the Python simulation demo
- Build the Dashboard (if Node.js is available)
- Create a Python virtual environment and start the dashboard (backend + frontend)
On Linux/macOS:

```bash
chmod +x build.sh
./build.sh
```

The script performs the same steps as the Windows script and opens the dashboard in your browser automatically.
If you prefer to run components individually:
Java System:

```bash
# Build the project
mvn clean install

# Run tests
mvn test

# Run property-based tests
mvn test -Dtest="*PropertyTest"
```

Python Simulation:
```bash
cd python_simulation

# Run load balancing simulation
python3 demo.py

# Run redundancy and failover demo
python3 redundancy_demo.py

# Run all tests
python3 run_tests.py
```

Dashboard (React UI):
The dashboard requires a Python virtual environment for the backend:
```bash
# Set up the backend (first time only)
cd dashboard/backend
python3 -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate
pip install -r requirements.txt

# Start the backend
python main.py
```

In a separate terminal:
```bash
# Start the frontend
cd dashboard
npm install   # First time only
npm run dev
```

The dashboard will be available at:
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
The real-time monitoring dashboard provides visualization of the distributed telecom system.
- Node Status: Live status cards for all 5 nodes (Edge1, Edge2, Core1, Core2, Cloud1)
- System Topology: Interactive visualization of the 3-tier architecture
- Metrics Charts: CPU, Memory, and Latency bar charts
- Load Balance Gauge: Real-time load distribution index
- Transaction Monitor: Recent transactions with status tracking
- Failover Events: Fault tolerance and recovery event log
- Dark/Light Mode: Theme toggle support
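The README does not say how the Load Balance Gauge computes its load distribution index. Jain's fairness index is one common choice; this sketch applies it to the CPU utilization figures from the node table, treating CPU utilization as the load signal (an assumption).

```python
from typing import Sequence


def jain_fairness_index(loads: Sequence[float]) -> float:
    """Jain's index: (sum x)^2 / (n * sum x^2).

    1.0 means load is spread perfectly evenly; 1/n means one node carries it all.
    """
    if not loads:
        raise ValueError("need at least one load value")
    total = sum(loads)
    squares = sum(x * x for x in loads)
    if squares == 0:
        return 1.0  # all-zero load is trivially balanced
    return total * total / (len(loads) * squares)


# CPU utilization (%) per node, from the node table.
cpu = {"Edge1": 45, "Edge2": 50, "Core1": 60, "Core2": 55, "Cloud1": 72}
index = jain_fairness_index(list(cpu.values()))
print(f"Load distribution index: {index:.3f}")
```

With the table's CPU figures the index comes out at roughly 0.97, meaning load is fairly even across the five nodes.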
The dashboard consists of a FastAPI backend (Python) and a React frontend.
Quick Start (using build scripts):
```
# Linux/macOS
./build.sh

# Windows
start.bat
```

Manual Setup:

- Set up the Python backend (first time only):

```bash
cd dashboard/backend
python3 -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

- Start the backend:

```bash
cd dashboard/backend
source venv/bin/activate   # Windows: venv\Scripts\activate
python main.py
```

- Start the frontend (in a new terminal):

```bash
cd dashboard
npm install   # First time only
npm run dev
```

- Open http://localhost:5173 in your browser.
Note: The dashboard will display mock data if the backend is not running, but for live simulation data, both backend and frontend must be running.
| Endpoint | Method | Description |
|---|---|---|
| `/api/metrics` | GET | Current system metrics for all nodes |
| `/api/nodes/{id}` | GET | Specific node details |
| `/api/transactions` | GET | Recent transaction list |
| `/api/failover-events` | GET | Recent failover events |
| `/api/simulation/start` | POST | Start the simulation |
| `/api/simulation/stop` | POST | Stop the simulation |
| `/ws/metrics` | WebSocket | Real-time metrics stream |
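For scripting against the backend, a small standard-library client for the REST endpoints could look like the sketch below. It assumes the endpoints return JSON bodies; their schemas are not documented here, so the client returns the parsed JSON unchanged. The `/ws/metrics` stream is left out because it needs a third-party WebSocket library, and the `fetch` parameter is a hypothetical seam that lets the client be exercised without a running server.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

# A fetch function takes (url, method) and returns the raw response body.
Fetch = Callable[[str, str], bytes]


def urllib_fetch(url: str, method: str) -> bytes:
    """Default transport using only the standard library."""
    data = b"" if method == "POST" else None  # empty body for the POST control endpoints
    request = urllib.request.Request(url, data=data, method=method)
    with urllib.request.urlopen(request, timeout=5.0) as response:
        return response.read()


class DashboardClient:
    """Thin wrapper over the dashboard backend's REST endpoints."""

    def __init__(self, base_url: str = "http://localhost:8000", fetch: Fetch = urllib_fetch):
        self.base_url = base_url.rstrip("/")
        self._fetch = fetch

    def _call(self, method: str, path: str) -> Optional[Any]:
        body = self._fetch(self.base_url + path, method)
        # Schemas are undocumented, so parsed JSON is returned as-is.
        return json.loads(body) if body else None

    def metrics(self) -> Optional[Any]:
        return self._call("GET", "/api/metrics")

    def node(self, node_id: str) -> Optional[Any]:
        return self._call("GET", f"/api/nodes/{node_id}")

    def transactions(self) -> Optional[Any]:
        return self._call("GET", "/api/transactions")

    def failover_events(self) -> Optional[Any]:
        return self._call("GET", "/api/failover-events")

    def start_simulation(self) -> Optional[Any]:
        return self._call("POST", "/api/simulation/start")

    def stop_simulation(self) -> Optional[Any]:
        return self._call("POST", "/api/simulation/stop")
```

With the backend running on port 8000, `client = DashboardClient()` followed by `client.start_simulation()` and `client.metrics()` starts a run and reads the current metrics.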
Java components:
- DistributedTelecomSystem: Main orchestrator integrating all system components
- NodeManager: Manages individual node lifecycle and metrics
- CommunicationManager: Handles inter-node RPC and messaging
- TransactionManager: Implements 2PC/3PC distributed transaction protocols
- FaultToleranceManager: Detects and recovers from various failure types
- LoadBalancer: Dynamic resource-aware load distribution
- ReplicationManager: Data replication and migration strategies
- PerformanceAnalyzer: Real-time bottleneck identification and ranking
- SystemOptimizer: Multi-objective performance optimization
Python simulation components:
- LoadBalancerSimulation: Dynamic load balancing with failure injection
- RedundancyFailoverManager: Risk-based redundancy and automated failover
- FailureInjector: Simulates crash, omission, and Byzantine failures
- NetworkDelaySimulator: Realistic network latency and jitter simulation
- AdaptiveMigrationEngine: Intelligent service migration decisions
- Frontend: React 18 + TypeScript + Tailwind CSS v4 + shadcn/ui design system
- Backend: FastAPI with WebSocket support for real-time updates
- Charts: Recharts for data visualization
- State: Custom hooks with automatic polling and WebSocket fallback
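The TransactionManager is described as detecting and resolving deadlocks, but the method is not given. A standard approach, sketched here, builds a wait-for graph (each transaction points to the transactions holding locks it needs), searches it for a cycle with depth-first search, and aborts one transaction in that cycle. Choosing the youngest transaction as the victim is an assumed policy.

```python
from typing import Dict, List, Optional, Set


def find_deadlock_cycle(wait_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph (tx -> txs it waits on), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color: Dict[str, int] = {}
    stack: List[str] = []

    def dfs(tx: str) -> Optional[List[str]]:
        color[tx] = GRAY
        stack.append(tx)
        for nxt in sorted(wait_for.get(tx, ())):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the stack suffix starting at nxt.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[tx] = BLACK
        return None

    for tx in sorted(wait_for):
        if color.get(tx, WHITE) == WHITE:
            cycle = dfs(tx)
            if cycle:
                return cycle
    return None


def choose_victim(cycle: List[str], start_times: Dict[str, float]) -> str:
    """Abort the youngest transaction in the cycle (latest start time)."""
    return max(cycle[:-1], key=lambda tx: start_times[tx])
```

Aborting the youngest transaction tends to waste the least completed work, though a production system would also guard against the same transaction being chosen repeatedly.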
The system employs a dual testing approach:
- Unit tests: specific scenarios with known inputs/outputs, edge-case validation, and component integration testing
- Property-based tests: 30 universal properties checked against large numbers of automatically generated inputs, which also helps prevent regressions
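The Java suite uses QuickCheck; the same idea can be illustrated in plain Python by generating many random inputs from a fixed seed and checking that an invariant holds for each one. The failover policy here (`select_failover_target`, which picks the least-loaded live node) is hypothetical, and this minimal runner does not shrink counterexamples the way a library such as Hypothesis does.

```python
import random
from typing import Callable, Dict, Optional

NODES = ["Edge1", "Edge2", "Core1", "Core2", "Cloud1"]


def select_failover_target(primary: str, alive: Dict[str, bool],
                           cpu: Dict[str, float]) -> Optional[str]:
    """Hypothetical policy: fail over to the least-loaded live node other than the primary."""
    candidates = [n for n in NODES if n != primary and alive[n]]
    return min(candidates, key=lambda n: cpu[n]) if candidates else None


def check_property(prop: Callable[[random.Random], bool],
                   trials: int = 500, seed: int = 42) -> Optional[int]:
    """Run `prop` on `trials` random cases; return the first failing trial index, or None."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for i in range(trials):
        if not prop(rng):
            return i
    return None


def failover_property(rng: random.Random) -> bool:
    """Invariant: pick a live non-primary node with minimal CPU, or None if there is none."""
    primary = rng.choice(NODES)
    alive = {n: rng.random() < 0.5 for n in NODES}
    cpu = {n: rng.uniform(0, 100) for n in NODES}
    target = select_failover_target(primary, alive, cpu)
    live_others = [n for n in NODES if n != primary and alive[n]]
    if not live_others:
        return target is None
    return target in live_others and cpu[target] == min(cpu[n] for n in live_others)


failing_trial = check_property(failover_property)
```

A `None` result means every generated case satisfied the invariant; a trial index points at a reproducible counterexample for the given seed.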
Run all tests:

```bash
# Java tests
mvn test

# Python tests
cd python_simulation
python3 run_tests.py
```

The system tracks and optimizes:
- Latency: 8-22ms range across nodes
- Throughput: 470-1250 Mbps capacity
- CPU Utilization: 45-72% operational range
- Memory: 4.5-16.0 GB per node
- Transaction Rate: 100-300 tx/sec
- Lock Contention: 5-15% typical range
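The PerformanceAnalyzer is said to rank bottlenecks, but its scoring is not described. The sketch below assumes an equal-weight score of min-max-normalized CPU utilization and latency, using the node table's values, plus a hypothetical 70% CPU alert threshold.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NodeMetrics:
    name: str
    latency_ms: float
    cpu_pct: float


# Values taken from the node table.
NODES = [
    NodeMetrics("Edge1", 12, 45),
    NodeMetrics("Edge2", 15, 50),
    NodeMetrics("Core1", 8, 60),
    NodeMetrics("Core2", 10, 55),
    NodeMetrics("Cloud1", 22, 72),
]


def _normalize(values: List[float]) -> List[float]:
    """Min-max scale to [0, 1]; if all values are equal, every entry maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def rank_bottlenecks(nodes: List[NodeMetrics],
                     cpu_alert_pct: float = 70.0) -> List[Tuple[str, float, bool]]:
    """Return (name, score, cpu_alert) sorted from most to least likely bottleneck."""
    cpu = _normalize([n.cpu_pct for n in nodes])
    lat = _normalize([n.latency_ms for n in nodes])
    scored = [
        (n.name, (c + lt) / 2, n.cpu_pct > cpu_alert_pct)
        for n, c, lt in zip(nodes, cpu, lat)
    ]
    return sorted(scored, key=lambda row: row[1], reverse=True)


for name, score, alert in rank_bottlenecks(NODES):
    print(f"{name:7s} score={score:.2f}{'  CPU alert' if alert else ''}")
```

With the table values, Cloud1 ranks first because it has both the highest CPU utilization and the highest latency, and it is the only node above the 70% alert threshold.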
System configuration is managed through:
- Java: the `SystemConfiguration` class, with node-specific settings
- Python: the `SimulationConfig` dataclass, for simulation parameters
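The fields of the Python `SimulationConfig` dataclass are not listed in this README. The sketch below assumes fields that mirror the Java `SystemConfiguration` builder settings and validates them at construction time; every field name, default, and the millisecond units are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class SimulationConfig:
    """Hypothetical fields mirroring the Java SystemConfiguration builder; units assumed to be ms."""

    max_concurrent_transactions: int = 1000
    transaction_timeout_ms: int = 5000
    health_check_interval_ms: int = 1000
    nodes: Tuple[str, ...] = ("Edge1", "Edge2", "Core1", "Core2", "Cloud1")

    def __post_init__(self) -> None:
        # Reject obviously invalid settings as soon as the config is built.
        if self.max_concurrent_transactions <= 0:
            raise ValueError("max_concurrent_transactions must be positive")
        if self.transaction_timeout_ms <= 0 or self.health_check_interval_ms <= 0:
            raise ValueError("timeouts and intervals must be positive")
        if not self.nodes:
            raise ValueError("at least one node is required")


default_config = SimulationConfig()
# Frozen dataclasses are "modified" by copying with dataclasses.replace(),
# which re-runs __post_init__ validation on the new instance.
slow_network = replace(default_config, transaction_timeout_ms=8000)
```

Making the dataclass frozen keeps a simulation run's parameters from changing mid-run, in the same spirit as the immutable object produced by the Java builder.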
Example configuration:
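```java
SystemConfiguration config = new SystemConfiguration.Builder()
        .withMaxConcurrentTransactions(1000) // cap on in-flight distributed transactions
        .withTransactionTimeout(5000)        // transaction timeout (units assumed to be ms)
        .withHealthCheckInterval(1000)       // node health-check interval (units assumed to be ms)
        .build();
```

- DOCUMENTATION.md: Comprehensive technical documentation with detailed architecture, algorithms, and implementation details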
- Dashboard README: Dashboard setup and development guide
- Requirements: Complete requirements specification
- Design: System design and architecture
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Inspired by carrier-grade telecom systems
- Built with modern distributed systems principles
- Implements formal verification and property-based testing methodologies
For questions or support, please open an issue on GitHub.
Built for distributed systems excellence