This guide provides instructions for running the MAX Agentic Cookbook in Docker with integrated MAX model serving.
The Docker container bundles two services orchestrated by PM2:
- MAX model serving - Serves models via OpenAI-compatible API on port 8000
- Web app - FastAPI backend + React frontend on port 8010
- GPU support - Works with both NVIDIA and AMD GPUs
See Dockerfile and ecosystem.config.js for implementation details.
Build the container image:
```bash
docker build -t max-recipes .
```

Customize the build to reduce container size or select specific MAX versions:
The `MAX_GPU` build argument selects the base MAX image (default: `universal`):

| Value | Image | Description |
|---|---|---|
| `universal` | `modular/max-full` | Larger image supporting all GPU types (NVIDIA, AMD) |
| `nvidia` | `modular/max-nvidia-full` | Smaller NVIDIA-specific image |
| `amd` | `modular/max-amd` | Smaller AMD-specific image |
The `MAX_TAG` build argument selects the MAX version (default: `latest`):

| Value | Description |
|---|---|
| `latest` | Latest stable release |
| `nightly` | Nightly development builds |
AMD-specific container:
```bash
docker build --build-arg MAX_GPU=amd -t max-recipes:amd .
```

NVIDIA-specific container with nightly builds:
```bash
docker build --build-arg MAX_GPU=nvidia --build-arg MAX_TAG=nightly -t max-recipes:nvidia-nightly .
```

Run the container on NVIDIA GPUs:

```bash
docker run --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-e "HF_HUB_ENABLE_HF_TRANSFER=1" \
-e "HF_TOKEN=your-huggingface-token" \
-e "MAX_MODEL=mistral-community/pixtral-12b" \
-p 8000:8000 \
-p 8010:8010 \
max-recipes
```

Run the container on AMD GPUs:

```bash
docker run \
--group-add keep-groups \
--device /dev/kfd \
--device /dev/dri \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-e "HF_HUB_ENABLE_HF_TRANSFER=1" \
-e "HF_TOKEN=your-huggingface-token" \
-e "MAX_MODEL=mistral-community/pixtral-12b" \
-p 8000:8000 \
-p 8010:8010 \
max-recipes
```

Environment variables:

| Variable | Required | Description | Default |
|---|---|---|---|
| `MAX_MODEL` | Yes | HuggingFace model to serve | - |
| `HF_TOKEN` | Yes* | HuggingFace API token | - |
| `HF_HUB_ENABLE_HF_TRANSFER` | No | Enable faster downloads | `1` |
* Required for gated models or private repositories
Exposed ports:

| Port | Service | Description |
|---|---|---|
| 8000 | MAX Serve | OpenAI-compatible LLM API endpoint |
| 8010 | Web App | FastAPI backend + React frontend |
HuggingFace cache (recommended):
```bash
-v ~/.cache/huggingface:/root/.cache/huggingface
```

Caches downloaded models between container restarts, significantly speeding up subsequent launches.
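If you want to warm this cache before the first container launch, one option is to pull the model on the host with the huggingface_hub CLI. This is a sketch, not part of the container workflow: it assumes a Python environment on the host and uses the default cache location that the volume mount above points at.

```bash
# Assumption: Python available on the host
pip install huggingface_hub

# Downloads the example model into ~/.cache/huggingface, where the container will find it
huggingface-cli download mistral-community/pixtral-12b
```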
PM2 manages service startup order (see ecosystem.config.js):
- MAX serving starts on port 8000
- Web app waits for MAX health check, then starts on port 8010
All services restart automatically if they crash.
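If you need to see what PM2 is doing inside a running container, you can call its CLI through `docker exec`. A minimal sketch, assuming the container is named `max-recipes` (as in the detached-mode example below) and `pm2` is on the container's PATH:

```bash
# Show the PM2 process list; both services should report "online"
docker exec -it max-recipes pm2 status

# Tail the combined logs of all PM2-managed processes
docker exec -it max-recipes pm2 logs
```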
The cookbook works with any model supported by MAX. Popular choices:
- `mistral-community/pixtral-12b`
- `OpenGVLab/InternVL3-14B-Instruct`
- `meta-llama/Llama-3.2-11B-Vision-Instruct`
- `google/gemma-3-27b-it`
- `meta-llama/Llama-3.1-8B-Instruct`
- `mistralai/Mistral-Small-24B-Instruct-2501`
See MAX Builds for the full list of supported models.
Once the container is running:
- Wait for startup - Watch logs for both services to start (MAX → web app)
- Open the cookbook - Navigate to http://localhost:8010
- Select endpoint - The cookbook auto-detects http://localhost:8000 as available
- Choose model - Select from models detected at the MAX endpoint
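As a quick sanity check before opening the browser, you can query the model listing directly, assuming the standard OpenAI-compatible `/v1/models` route on port 8000:

```bash
# Should return a JSON list that includes the model set via MAX_MODEL
curl http://localhost:8000/v1/models
```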
Pass additional arguments to `max serve`:

```bash
docker run --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-e "HF_TOKEN=your-token" \
-e "MAX_MODEL=mistral-community/pixtral-12b" \
-e "MAX_ARGS=--max-batch-size 32 --max-cache-size 8192" \
-p 8000:8000 -p 8010:8010 \
max-recipes
```

Run the container in the background:

```bash
docker run -d \
--name max-recipes \
--gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-e "HF_TOKEN=your-token" \
-e "MAX_MODEL=mistral-community/pixtral-12b" \
-p 8000:8000 -p 8010:8010 \
max-recipes
```

View logs:

```bash
docker logs -f max-recipes
```

Stop container:

```bash
docker stop max-recipes
```

To serve multiple models, run separate MAX containers on different ports and configure via .env.local:

```bash
# Model 1 on port 8000
docker run -d --name max-model-1 --gpus=1 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-v ~/.cache/max_cache:/opt/venv/share/max/.max_cache \
--env "HF_TOKEN=${HF_TOKEN}" \
-p 8000:8000 \
modular/max-full:latest \
--model-path google/gemma-3-27b-it
# Model 2 on port 8002
docker run -d --name max-model-2 --gpus=1 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-v ~/.cache/max_cache:/opt/venv/share/max/.max_cache \
--env "HF_TOKEN=${HF_TOKEN}" \
-p 8002:8000 \
modular/max-full:latest \
--model-path mistral-community/pixtral-12b
```

Configure in backend/.env.local:

```
COOKBOOK_ENDPOINTS='[
{"id": "gemma", "baseUrl": "http://localhost:8000/v1", "apiKey": "EMPTY"},
{"id": "pixtral", "baseUrl": "http://localhost:8002/v1", "apiKey": "EMPTY"}
]'
```
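To confirm both containers are serving before wiring them into the cookbook, you can hit each endpoint's model listing (again assuming the standard OpenAI-compatible `/v1/models` route):

```bash
# Each should list its own model: gemma on 8000, pixtral on 8002
curl http://localhost:8000/v1/models
curl http://localhost:8002/v1/models
```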
Check GPU access:

```bash
# NVIDIA
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
# AMD
docker run --rm --device /dev/kfd --device /dev/dri rocm/rocm-terminal rocm-smi
```

Check Docker resource limits (see the sketch after this list):
- Ensure sufficient memory allocation (8GB+ recommended)
- Verify GPU is not in use by another process
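A minimal sketch of both checks; the memory figure is only an example, so adjust it to your host:

```bash
# Confirm no other process is holding the GPU (NVIDIA shown; use rocm-smi on AMD)
nvidia-smi

# Re-run the container with an explicit memory limit
docker run --gpus all --memory=16g \
  -e "HF_TOKEN=your-token" \
  -e "MAX_MODEL=mistral-community/pixtral-12b" \
  -p 8000:8000 -p 8010:8010 \
  max-recipes
```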
Check service logs:

```bash
docker logs max-recipes
# Look for PM2 startup messages and any errors
```

Verify HuggingFace token (a quick check follows this list):
- Check token has read access to the model
- For gated models, ensure you've accepted the license agreement
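One way to verify the token outside the container — a sketch that assumes the huggingface_hub CLI is installed on the host:

```bash
# Log in with the token, then confirm which account it belongs to;
# whoami fails if the token is invalid
huggingface-cli login --token your-huggingface-token
huggingface-cli whoami
```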
Check disk space (see the commands after this list):
- Models can be 10GB+ in size
- Verify sufficient space in Docker volumes
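For example, with standard tooling (nothing MAX-specific):

```bash
# Free space on the filesystem backing the HuggingFace cache mount
df -h ~/.cache/huggingface

# Space used by Docker images, containers, and volumes
docker system df
```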
Check port conflicts:

```bash
# Verify ports are not in use
lsof -i :8010
lsof -i :8000
```

Check container logs:
```bash
docker logs max-recipes
# Look for PM2 errors or service crashes
```

Optimize for your GPU:

- Use GPU-specific images (`--build-arg MAX_GPU=nvidia` or `--build-arg MAX_GPU=amd`)
- Adjust batch size and cache size via `MAX_ARGS`
- Ensure GPU drivers are up to date
Enable HuggingFace transfer acceleration:
-e "HF_HUB_ENABLE_HF_TRANSFER=1"- API keys: Never commit
.envfiles with real tokens - Network exposure: Consider using a reverse proxy for production
- GPU isolation: Use Docker resource limits to prevent GPU exhaustion
- Model access: Validate model licenses for your use case
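A sketch of pinning the container to a single GPU so other workloads keep the rest; NVIDIA device syntax is shown and the device index is only an example:

```bash
# Expose only GPU 0 to this container instead of --gpus all
docker run --gpus device=0 \
  -e "HF_TOKEN=your-token" \
  -e "MAX_MODEL=mistral-community/pixtral-12b" \
  -p 8000:8000 -p 8010:8010 \
  max-recipes
```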
- API Reference - Complete endpoint specifications and request/response formats
- Contributing Guide - Add your own recipes
- MAX Documentation - Learn more about MAX
- Project Context - Comprehensive architecture reference