Skip to content

Latest commit

Β 

History

History
427 lines (342 loc) Β· 12 KB

File metadata and controls

427 lines (342 loc) Β· 12 KB

Distributed Task Queue - Advanced Implementation

πŸ“‹ Overview

This is a production-ready distributed task queue system built with Python, Redis, and Kubernetes. It includes enterprise-grade features for reliability, observability, and monitoring.

✨ Key Features

Core Features

  • βœ… Distributed Task Processing - Multiple workers processing tasks in parallel
  • βœ… Task Retry with Exponential Backoff - Automatic retries with configurable delays
  • βœ… Dead Letter Queue (DLQ) - Failed tasks moved to DLQ for manual handling
  • βœ… Task Priority - High and normal priority task queues
  • βœ… Prometheus Metrics - Comprehensive metrics for monitoring
  • βœ… Jaeger Distributed Tracing - End-to-end request tracing
  • βœ… Web Dashboard - Real-time monitoring interface
  • βœ… Kubernetes Ready - Production deployment manifests included
  • βœ… Enhanced Logging - Visual task tracking with emojis

πŸš€ Quick Start

Prerequisites

  • Docker & Docker Compose
  • Python 3.11+ (for local development)
  • kubectl (for Kubernetes deployment)

Start in 30 seconds

# Clone and navigate
cd distributed-task-queue-python

# Start all services
docker-compose up -d --build

# Open dashboard
open http://localhost:5001

Services Available

πŸ“š Documentation

Document Purpose
FEATURES_SUMMARY.md Complete feature overview
ADVANCED_FEATURES.md Detailed feature documentation & deployment
QUICK_START_FEATURES.md Quick start guide with examples
COMMAND_REFERENCE.md CLI commands & cheat sheet

🎯 What's Implemented

1. Task Retry with Exponential Backoff

Failed tasks automatically retry with increasing delays:

  • Initial delay: 5 seconds (configurable)
  • Backoff multiplier: 2.0x (configurable)
  • Max delay: 1 hour (configurable)
  • Max retries: 3 (configurable)
# Configure in docker-compose.yml
MAX_RETRIES=3
INITIAL_RETRY_DELAY=5
RETRY_BACKOFF_MULTIPLIER=2.0
MAX_RETRY_DELAY=3600

2. Dead Letter Queue

Tasks that fail after max retries go to DLQ for analysis and manual retry:

# View failed tasks
curl http://localhost:8000/dlq/failed-tasks

# Retry a failed task
curl -X POST http://localhost:8000/dlq/failed-tasks/{TASK_ID}/retry

# Remove from DLQ
curl -X DELETE http://localhost:8000/dlq/failed-tasks/{TASK_ID}

3. Prometheus Metrics

Comprehensive metrics tracking all operations:

  • Task submissions/completions/failures
  • Processing duration
  • Retry attempts
  • Queue sizes
  • Worker status
  • DLQ size

Access: http://localhost:8000/metrics

4. Jaeger Distributed Tracing

Full request tracing across all services:

  • Request flow visualization
  • Operation timing
  • Error tracking
  • Redis operation tracing

Access: http://localhost:16686

5. Web Dashboard

Real-time monitoring interface showing:

  • Live statistics
  • Active workers
  • Recent tasks
  • Dead Letter Queue
  • Auto-refresh every 5 seconds

Access: http://localhost:5001

6. Kubernetes Deployment

Complete K8s manifests for production:

  • Redis with persistent storage
  • Gateway with load balancer
  • Multiple workers with scaling
  • Prometheus & Jaeger for monitoring
  • Dashboard for visualization

Deploy: kubectl apply -f k8s/deployment.yaml

πŸ“Š Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   Client Applications                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β–Ό            β–Ό             β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚Gateway β”‚  β”‚Metrics β”‚  β”‚Dashboard β”‚
    β”‚:8000   β”‚  β”‚:8000   β”‚  β”‚:5001     β”‚
    β””β”€β”€β”€β”€β”¬β”€β”€β”€β”˜  β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
         β”‚          β”‚            β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚      Redis         β”‚
          β”‚ (Queues, Storage)  β”‚
          β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”¬β”€β”€β”€β”˜
               β”‚       β”‚   β”‚
         β”Œβ”€β”€β”€β”€β”€β”˜       β”‚   └──────────┐
         β”‚             β”‚              β”‚
    β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”
    β”‚ Worker1 β”‚  β”‚ Worker2 β”‚  β”‚ Worker3 β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Prometheus  β”‚      β”‚  Jaeger  β”‚
β”‚  Metrics    β”‚      β”‚ Tracing  β”‚
β”‚  :9090      β”‚      β”‚  :16686  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ”„ Task Lifecycle

1. SUBMIT      β†’ Task submitted via API
                 πŸ“€ SUBMITTED log message
                 Stored in Redis
                 Added to pending queue

2. ASSIGN      β†’ Worker picks up task
                 πŸ“‹ ASSIGNED log message
                 Status β†’ "processing"

3. PROCESS     β†’ Task is executing
                 βš™οΈ STARTED log message
                 ⏱️ Simulated processing
                 
4. COMPLETE    β†’ Task finished
                 βœ… COMPLETED log message
                 Status β†’ "completed"
                 OR
4. FAIL        β†’ Task failed
                 ❌ ERROR log message
                 Status β†’ "retrying"
                 Scheduled for retry
                 
5. RETRY       β†’ Retry after delay
                 πŸ”„ REQUEUED log message
                 Status β†’ "pending"
                 (repeats from step 2)
                 
6. DLQ         β†’ Max retries exhausted
                 πŸ’€ DLQ log message
                 Status β†’ "failed"
                 Stored in Dead Letter Queue

πŸ’» Usage Examples

Submit a Task

curl -X POST http://localhost:8000/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "type": "process",
    "data": "hello-world",
    "priority": "high"
  }'

Get Task Status

curl http://localhost:8000/tasks/{TASK_ID}

Monitor System Health

curl http://localhost:8000/health

View Metrics

# Raw Prometheus format
curl http://localhost:8000/metrics

# In Prometheus UI
open http://localhost:9090
# Query: task_submissions_total

Trace Requests

open http://localhost:16686
# Select service: "gateway"
# View traces showing request flow

Run Stress Test

python tests/stress_test.py
# Submits 100 tasks and monitors completion

πŸ”§ Configuration

Environment Variables

Variable Default Purpose
MAX_RETRIES 3 Maximum retry attempts
INITIAL_RETRY_DELAY 5 Initial retry delay (seconds)
RETRY_BACKOFF_MULTIPLIER 2.0 Backoff multiplier per retry
MAX_RETRY_DELAY 3600 Maximum retry delay (seconds)
REDIS_HOST localhost Redis server hostname
REDIS_PORT 6379 Redis server port
JAEGER_HOST localhost Jaeger agent hostname
JAEGER_PORT 6831 Jaeger agent port

Customize Retry Strategy

Edit docker-compose.yml:

environment:
  - MAX_RETRIES=5
  - INITIAL_RETRY_DELAY=10
  - RETRY_BACKOFF_MULTIPLIER=1.5
  - MAX_RETRY_DELAY=7200

πŸ“ˆ Monitoring

Prometheus Metrics

  • task_submissions_total - Total tasks submitted
  • task_completions_total - Completed tasks
  • task_duration_seconds - Processing time (histogram)
  • task_retries_total - Total retries
  • tasks_in_dlq_total - Failed tasks
  • active_workers_total - Active workers
  • pending_tasks_total - Pending tasks

Jaeger Traces

  • Request flow visualization
  • Service dependencies
  • Operation timing
  • Error tracking
  • Redis operation tracing

Dashboard Metrics

  • Real-time task counts
  • Worker status
  • Queue depths
  • DLQ management

πŸš€ Deployment

Docker Compose (Development/Testing)

docker-compose up -d --build

Kubernetes (Production)

# Apply all manifests
kubectl apply -f k8s/deployment.yaml

# Scale workers
kubectl scale deployment worker -n task-queue --replicas=5

# Port forward services
kubectl port-forward -n task-queue svc/gateway 8000:5000
kubectl port-forward -n task-queue svc/dashboard 5001:5001

πŸ” Troubleshooting

Services not starting?

# Check logs
docker logs task-queue-gateway
docker logs task-queue-worker-1

# Verify Redis
redis-cli PING

Metrics not appearing?

# Verify endpoint
curl http://localhost:8000/metrics

# Check Prometheus targets
open http://localhost:9090/targets

Tasks not processing?

# Check worker logs
docker logs -f task-queue-worker-1

# Check queue status
curl http://localhost:8000/health

# Verify Redis
redis-cli LLEN tasks:pending

πŸ“ Project Structure

distributed-task-queue-python/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ gateway/
β”‚   β”‚   └── server.py              # REST API gateway
β”‚   β”œβ”€β”€ worker/
β”‚   β”‚   └── worker.py              # Task worker with retry logic
β”‚   β”œβ”€β”€ dashboard/
β”‚   β”‚   β”œβ”€β”€ app.py                 # Dashboard API
β”‚   β”‚   └── templates/index.html   # Dashboard UI
β”‚   └── shared/
β”‚       β”œβ”€β”€ metrics.py             # Prometheus metrics
β”‚       └── tracing.py             # Jaeger tracing
β”œβ”€β”€ k8s/
β”‚   └── deployment.yaml            # Kubernetes manifests
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ stress_test.py             # Load test
β”‚   └── test_client.py             # Interactive CLI client
β”œβ”€β”€ docker-compose.yml             # Docker Compose config
β”œβ”€β”€ Dockerfile.*                   # Container definitions
β”œβ”€β”€ prometheus.yml                 # Prometheus config
β”œβ”€β”€ requirements.txt               # Python dependencies
└── *.md                           # Documentation files

πŸ“š Additional Resources

Documentation Files

  • FEATURES_SUMMARY.md - Complete feature overview
  • ADVANCED_FEATURES.md - Detailed deployment guide
  • QUICK_START_FEATURES.md - Quick examples
  • COMMAND_REFERENCE.md - CLI command cheat sheet

Key Files

  • docker-compose.yml - Full stack configuration
  • k8s/deployment.yaml - Kubernetes manifests
  • prometheus.yml - Prometheus configuration

🎯 Next Steps

  1. Explore the Dashboard

  2. Check the Metrics

  3. Trace a Request

  4. Run Stress Test

    • python tests/stress_test.py
    • Watch workers process 100 tasks
  5. Deploy to Kubernetes

    • Build images for your cluster
    • kubectl apply -f k8s/deployment.yaml
    • Scale workers as needed

🀝 Contributing

Feel free to extend this system with:

  • Custom task types
  • Additional metrics
  • Custom dashboard widgets
  • Integration with other systems

πŸ“ License

This project is provided as-is for educational and production use.


Last Updated: December 24, 2025

Status: βœ… All Features Implemented & Tested

For detailed information, see the documentation files listed above.