PyHellen provides Docker images for both CPU and GPU environments.
```bash
# CPU version
docker pull ghcr.io/grand-siecle/pyhellen:latest

# Run
docker run -p 8000:8000 ghcr.io/grand-siecle/pyhellen:latest
```

With docker-compose:

```bash
# CPU version
docker-compose -f docker/docker-compose.yml up -d pyhellen

# GPU version (requires NVIDIA Container Toolkit)
docker-compose -f docker/docker-compose.yml up -d pyhellen-gpu
```

Or use the helper script:

```bash
chmod +x docker/scripts/run.sh
./docker/scripts/run.sh
```

The script automatically detects GPU availability and runs the appropriate container.
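The detection logic can be approximated like this (an illustrative sketch, not the actual contents of `run.sh`):

```bash
# Pick the compose service based on whether nvidia-smi is present and working
# (illustrative sketch, not the actual contents of run.sh).
has_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1
}

if has_gpu; then
  service=pyhellen-gpu
else
  service=pyhellen
fi
echo "Would run: docker-compose -f docker/docker-compose.yml up -d $service"
```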
To build the images yourself:

```bash
# CPU image
docker build -f docker/Dockerfile -t pyhellen:latest .

# GPU image
docker build -f docker/Dockerfile.gpu -t pyhellen:gpu .
```

Pass environment variables via `-e` or docker-compose:
```bash
docker run -p 8000:8000 \
  -e AUTH_ENABLED=true \
  -e SECRET_KEY="your-secret-key" \
  -e PRELOAD_MODELS="lasla,grc" \
  ghcr.io/grand-siecle/pyhellen:latest
```

Models are stored in `/data/models` inside the container. Mount a volume to persist them:

```bash
docker run -p 8000:8000 \
  -v pyhellen_models:/data/models \
  ghcr.io/grand-siecle/pyhellen:latest
```

Or with a host directory:

```bash
docker run -p 8000:8000 \
  -v /path/on/host:/data/models \
  ghcr.io/grand-siecle/pyhellen:latest
```

For authentication, persist the token database:
```bash
docker run -p 8000:8000 \
  -e AUTH_ENABLED=true \
  -e SECRET_KEY="your-secret-key" \
  -e TOKEN_DB_PATH=/data/tokens.db \
  -v pyhellen_data:/data \
  ghcr.io/grand-siecle/pyhellen:latest
```

A minimal docker-compose configuration:

```yaml
version: '3.8'

services:
  pyhellen:
    image: ghcr.io/grand-siecle/pyhellen:latest
    ports:
      - "8000:8000"
    volumes:
      - model_data:/data/models
    restart: unless-stopped

volumes:
  model_data:
```

A production configuration with authentication, resource limits, and a health check:

```yaml
version: '3.8'

services:
  pyhellen:
    image: ghcr.io/grand-siecle/pyhellen:latest
    ports:
      - "8000:8000"
    volumes:
      - model_data:/data/models
      - token_data:/data/db
    environment:
      - AUTH_ENABLED=true
      - SECRET_KEY=${SECRET_KEY}
      - TOKEN_DB_PATH=/data/db/tokens.db
      - PRELOAD_MODELS=lasla,grc
      - LOG_LEVEL=INFO
      - LOG_FORMAT=json
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/service/health"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  model_data:
  token_data:
```

A GPU configuration:

```yaml
version: '3.8'

services:
  pyhellen-gpu:
    image: ghcr.io/grand-siecle/pyhellen:gpu
    ports:
      - "8000:8000"
    volumes:
      - model_data:/data/models
    environment:
      - USE_CUDA=1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  model_data:
```

The docker-compose file includes optional monitoring services for production observability.
| Service | Description | Port | Usage |
|---|---|---|---|
| `pyhellen` | API (CPU) | 8000 | Choose ONE |
| `pyhellen-gpu` | API (GPU) | 8000 | Choose ONE |
| `prometheus` | Metrics collection | 9090 | Optional |
| `grafana` | Visualization | 3000 | Optional |

Note: `pyhellen` and `pyhellen-gpu` are alternatives. Use one OR the other, not both.
```bash
cd docker

# Copy and configure environment
cp .env.example .env
nano .env  # Edit credentials

# CPU version with monitoring
sudo docker-compose -p pyhellen -f docker-compose.yml up -d pyhellen prometheus grafana

# GPU version with monitoring
sudo docker-compose -p pyhellen -f docker-compose.yml up -d pyhellen-gpu prometheus grafana
```

Create a `.env` file in the `docker/` directory:
```bash
# Grafana credentials
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=your_secure_password
GRAFANA_PORT=3000
GRAFANA_ROOT_URL=http://localhost:3000

# Prometheus
PROMETHEUS_PORT=9090
```

The services are then available at:

- PyHellen API: http://localhost:8000
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000
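The production compose example above reads `SECRET_KEY` from the environment. One way to generate a strong value, assuming `openssl` is available on the host:

```bash
# Generate a 64-character random hex string suitable for SECRET_KEY
# (requires openssl; store the result in .env rather than in the compose file).
openssl rand -hex 32
```

Keeping the secret in `.env` avoids committing it to version control alongside the compose file.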
A pre-configured "PyHellen API" dashboard is automatically loaded with:
- Total requests / errors
- Models loaded count
- Cache hit rate
- Request rate over time
- Requests per model
- Average processing time per model
PyHellen exposes Prometheus metrics at `/service/metrics`:

```bash
curl http://localhost:8000/service/metrics
```

Example output:
```
# HELP pyhellen_requests_total Total number of requests
# TYPE pyhellen_requests_total counter
pyhellen_requests_total 42

# HELP pyhellen_models_loaded Number of models currently loaded
# TYPE pyhellen_models_loaded gauge
pyhellen_models_loaded 2
```
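If you run your own Prometheus instead of the bundled service, a scrape job along these lines would collect these metrics (a sketch; the job name and target are placeholders to adjust for your setup):

```yaml
# prometheus.yml fragment (hypothetical): scrape PyHellen's metrics endpoint.
scrape_configs:
  - job_name: pyhellen
    metrics_path: /service/metrics
    static_configs:
      - targets: ['localhost:8000']
```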
If you don't need monitoring, simply omit those services:

```bash
# API only (CPU)
sudo docker-compose -p pyhellen -f docker-compose.yml up -d pyhellen

# API only (GPU)
sudo docker-compose -p pyhellen -f docker-compose.yml up -d pyhellen-gpu
```

To use GPU acceleration, you need:
- NVIDIA GPU with CUDA support
- NVIDIA Driver installed on host
- NVIDIA Container Toolkit (nvidia-docker)
```bash
# Ubuntu/Debian
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

Verify the installation:

```bash
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

The container includes a health check:
```bash
curl http://localhost:8000/service/health
```

Docker will mark the container as unhealthy if this check fails.
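In deployment scripts it is often useful to wait until this endpoint responds before sending traffic. A small helper sketch (the `/service/health` path comes from above; the retry count and sleep interval are arbitrary choices):

```bash
# Poll a health URL until it responds successfully or attempts run out.
wait_for_health() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  return 1
}

# Usage:
# wait_for_health http://localhost:8000/service/health && echo "ready"
```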
```bash
docker logs pyhellen-api
docker logs -f pyhellen-api  # Follow
```

By default, Docker containers use JSON logging:

```json
{"timestamp": "2024-01-15T10:00:00", "level": "INFO", "message": "..."}
```

Recommended limits for production:
| Resource | CPU deployment | GPU deployment |
|---|---|---|
| CPU | 2 cores | 4 cores |
| Memory | 4 GB | 8 GB |
| Disk (models) | 10 GB | 10 GB |
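The JSON log lines shown above pair well with `jq` for filtering, for example to pull out only error entries (field names are taken from the example log line; `jq` must be installed):

```bash
# Keep only ERROR-level entries; in practice, feed the filter from
# `docker logs pyhellen-api 2>&1` instead of echo.
echo '{"timestamp": "2024-01-15T10:00:00", "level": "ERROR", "message": "model load failed"}' |
  jq -r 'select(.level == "ERROR") | "\(.timestamp) \(.message)"'
# → 2024-01-15T10:00:00 model load failed
```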
```nginx
upstream pyhellen {
    server localhost:8000;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    location / {
        proxy_pass http://pyhellen;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

To run on a dedicated Docker network:

```bash
docker network create pyhellen-network
docker run --network pyhellen-network ...
```

Check the logs:
```bash
docker logs pyhellen-api
```

If the container runs out of memory, increase the memory limit or reduce the batch size:

```bash
-e BATCH_SIZE=128
```

If model downloads fail, check the network and increase the timeout:

```bash
-e DOWNLOAD_TIMEOUT_SECONDS=600
-e DOWNLOAD_MAX_RETRIES=5
```

If the GPU is not detected, verify the NVIDIA Container Toolkit:

```bash
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```