PyHellen provides Docker images for both CPU and GPU environments.
```bash
# CPU version
docker pull ghcr.io/grand-siecle/pyhellen:latest

# Run
docker run -p 8000:8000 ghcr.io/grand-siecle/pyhellen:latest
```

With docker-compose:

```bash
# CPU version
docker-compose -f docker/docker-compose.yml up -d pyhellen

# GPU version (requires NVIDIA Container Toolkit)
docker-compose -f docker/docker-compose.yml up -d pyhellen-gpu
```

Or use the helper script:

```bash
chmod +x docker/scripts/run.sh
./docker/scripts/run.sh
```

The script automatically detects GPU availability and runs the appropriate container.
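The detection logic can be approximated like this (an illustrative sketch, not the actual contents of `run.sh`):

```bash
# Pick the compose service based on whether nvidia-smi is present and working
# (illustrative sketch, not the actual contents of run.sh).
has_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1
}

if has_gpu; then
  service=pyhellen-gpu
else
  service=pyhellen
fi
echo "Would run: docker-compose -f docker/docker-compose.yml up -d $service"
```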
To build the images yourself:

```bash
# CPU image
docker build -f docker/Dockerfile -t pyhellen:latest .

# GPU image
docker build -f docker/Dockerfile.gpu -t pyhellen:gpu .
```

Pass environment variables via `-e` or docker-compose:
```bash
docker run -p 8000:8000 \
  -e AUTH_ENABLED=true \
  -e SECRET_KEY="your-secret-key" \
  -e PRELOAD_MODELS="lasla,grc" \
  ghcr.io/grand-siecle/pyhellen:latest
```

Models are stored in `/data/models` inside the container. Mount a volume to persist them:

```bash
docker run -p 8000:8000 \
  -v pyhellen_models:/data/models \
  ghcr.io/grand-siecle/pyhellen:latest
```

Or with a host directory:

```bash
docker run -p 8000:8000 \
  -v /path/on/host:/data/models \
  ghcr.io/grand-siecle/pyhellen:latest
```

For authentication, persist the token database:
```bash
docker run -p 8000:8000 \
  -e AUTH_ENABLED=true \
  -e SECRET_KEY="your-secret-key" \
  -e TOKEN_DB_PATH=/data/tokens.db \
  -v pyhellen_data:/data \
  ghcr.io/grand-siecle/pyhellen:latest
```

A minimal docker-compose configuration:

```yaml
version: '3.8'

services:
  pyhellen:
    image: ghcr.io/grand-siecle/pyhellen:latest
    ports:
      - "8000:8000"
    volumes:
      - model_data:/data/models
    restart: unless-stopped

volumes:
  model_data:
```

A production configuration with authentication, resource limits, and a health check:

```yaml
version: '3.8'

services:
  pyhellen:
    image: ghcr.io/grand-siecle/pyhellen:latest
    ports:
      - "8000:8000"
    volumes:
      - model_data:/data/models
      - token_data:/data/db
    environment:
      - AUTH_ENABLED=true
      - SECRET_KEY=${SECRET_KEY}
      - TOKEN_DB_PATH=/data/db/tokens.db
      - PRELOAD_MODELS=lasla,grc
      - LOG_LEVEL=INFO
      - LOG_FORMAT=json
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/service/health"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  model_data:
  token_data:
```

A GPU configuration:

```yaml
version: '3.8'

services:
  pyhellen-gpu:
    image: ghcr.io/grand-siecle/pyhellen:gpu
    ports:
      - "8000:8000"
    volumes:
      - model_data:/data/models
    environment:
      - USE_CUDA=1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  model_data:
```

The docker-compose file includes optional monitoring services for production observability.
| Service | Description | Port | Usage |
|---|---|---|---|
| `pyhellen` | API (CPU) | 8000 | Choose ONE |
| `pyhellen-gpu` | API (GPU) | 8000 | Choose ONE |
| `prometheus` | Metrics collection | 9090 | Optional |
| `grafana` | Visualization | 3000 | Optional |

Note: `pyhellen` and `pyhellen-gpu` are alternatives. Use one OR the other, not both.
```bash
cd docker

# Copy and configure environment
cp .env.example .env
nano .env  # Edit credentials

# CPU version with monitoring
sudo docker-compose -p pyhellen -f docker-compose.yml up -d pyhellen prometheus grafana

# GPU version with monitoring
sudo docker-compose -p pyhellen -f docker-compose.yml up -d pyhellen-gpu prometheus grafana
```

Create a `.env` file in the `docker/` directory:
```bash
# Grafana credentials
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=your_secure_password
GRAFANA_PORT=3000
GRAFANA_ROOT_URL=http://localhost:3000

# Prometheus
PROMETHEUS_PORT=9090
```

The services are then available at:

- PyHellen API: http://localhost:8000
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000
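The production compose example above reads `SECRET_KEY` from the environment. One way to generate a strong value, assuming `openssl` is available on the host:

```bash
# Generate a 64-character random hex string suitable for SECRET_KEY
# (requires openssl; store the result in .env rather than in the compose file).
openssl rand -hex 32
```

Keeping the secret in `.env` avoids committing it to version control alongside the compose file.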
A pre-configured "PyHellen API" dashboard is automatically loaded with:
- Total requests / errors
- Models loaded count
- Cache hit rate
- Request rate over time
- Requests per model
- Average processing time per model
PyHellen exposes Prometheus metrics at `/service/metrics`:

```bash
curl http://localhost:8000/service/metrics
```

Example output:
```
# HELP pyhellen_requests_total Total number of requests
# TYPE pyhellen_requests_total counter
pyhellen_requests_total 42

# HELP pyhellen_models_loaded Number of models currently loaded
# TYPE pyhellen_models_loaded gauge
pyhellen_models_loaded 2
```
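If you run your own Prometheus instead of the bundled service, a scrape job along these lines would collect these metrics (a sketch; the job name and target are placeholders to adjust for your setup):

```yaml
# prometheus.yml fragment (hypothetical): scrape PyHellen's metrics endpoint.
scrape_configs:
  - job_name: pyhellen
    metrics_path: /service/metrics
    static_configs:
      - targets: ['localhost:8000']
```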
If you don't need monitoring, simply omit those services:

```bash
# API only (CPU)
sudo docker-compose -p pyhellen -f docker-compose.yml up -d pyhellen

# API only (GPU)
sudo docker-compose -p pyhellen -f docker-compose.yml up -d pyhellen-gpu
```

To use GPU acceleration, you need:
- NVIDIA GPU with CUDA support
- NVIDIA Driver installed on host
- NVIDIA Container Toolkit (nvidia-docker)
```bash
# Ubuntu/Debian
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

Verify the installation:

```bash
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

The container includes a health check:
```bash
curl http://localhost:8000/service/health
```

Docker will mark the container as unhealthy if this check fails.
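In deployment scripts it is often useful to wait until this endpoint responds before sending traffic. A small helper sketch (the `/service/health` path comes from above; the retry count and sleep interval are arbitrary choices):

```bash
# Poll a health URL until it responds successfully or attempts run out.
wait_for_health() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  return 1
}

# Usage:
# wait_for_health http://localhost:8000/service/health && echo "ready"
```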
```bash
docker logs pyhellen-api
docker logs -f pyhellen-api  # Follow
```

By default, Docker containers use JSON logging:

```json
{"timestamp": "2024-01-15T10:00:00", "level": "INFO", "message": "..."}
```

Recommended limits for production:
| Resource | CPU deployment | GPU deployment |
|---|---|---|
| CPU | 2 cores | 4 cores |
| Memory | 4 GB | 8 GB |
| Disk (models) | 10 GB | 10 GB |
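The JSON log lines shown above pair well with `jq` for filtering, for example to pull out only error entries (field names are taken from the example log line; `jq` must be installed):

```bash
# Keep only ERROR-level entries; in practice, feed the filter from
# `docker logs pyhellen-api 2>&1` instead of echo.
echo '{"timestamp": "2024-01-15T10:00:00", "level": "ERROR", "message": "model load failed"}' |
  jq -r 'select(.level == "ERROR") | "\(.timestamp) \(.message)"'
# → 2024-01-15T10:00:00 model load failed
```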
```nginx
upstream pyhellen {
    server localhost:8000;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    location / {
        proxy_pass http://pyhellen;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

To run on a dedicated Docker network:

```bash
docker network create pyhellen-network
docker run --network pyhellen-network ...
```

Check the logs:
```bash
docker logs pyhellen-api
```

If the container runs out of memory, increase the memory limit or reduce the batch size:

```bash
-e BATCH_SIZE=128
```

If model downloads fail, check the network and increase the timeout:

```bash
-e DOWNLOAD_TIMEOUT_SECONDS=600
-e DOWNLOAD_MAX_RETRIES=5
```

If the GPU is not detected, verify the NVIDIA Container Toolkit:

```bash
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```