This guide helps you convert an existing docker-compose.yaml into an hpc-compose spec for Slurm clusters using Pyxis/Enroot, Apptainer, Singularity, or host runtimes.
| Docker Compose feature | hpc-compose equivalent |
|---|---|
| `image` | `image` (same syntax, auto-prefixed with `docker://`) |
| `command` | `command` (string or list, same syntax) |
| `entrypoint` | `entrypoint` (string or list, same syntax) |
| `environment` | `environment` (map or list, same syntax) |
| `volumes` | `volumes` (host:container bind mounts, same syntax) |
| `depends_on` | `depends_on` (list or map with `condition: service_started` / `service_healthy`) |
| `working_dir` | `working_dir` (requires explicit `command` or `entrypoint`) |
| `build` | Not supported. Use `image` + `x-runtime.prepare.commands` instead. |
| `ports` | Not supported. Use host networking semantics instead; `127.0.0.1` works only when both sides run on the same node. |
| `networks` / `network_mode` | Not supported. There is no Docker-style overlay network or service-name DNS layer. |
| `restart` | Not supported as a Compose key. Use `services.<name>.x-slurm.failure_policy`. |
| `deploy` | Not supported. Use `x-slurm` for resource allocation. |
| `healthcheck` | Supported for a constrained TCP/HTTP subset and normalized into `readiness`; use explicit `readiness` for anything more complex. |
| Resource limits (`cpus`, `mem_limit`) | Use `x-slurm.cpus_per_task`, `x-slurm.mem`, `x-slurm.gpus` |
```yaml
# docker-compose.yaml (before)
version: "3.9"
services:
  redis:
    image: redis:7
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
  app:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      redis:
        condition: service_healthy
    environment:
      REDIS_HOST: redis
    volumes:
      - ./app:/workspace
    working_dir: /workspace
    command: python -m main
```

```yaml
# compose.yaml (after, for hpc-compose)
name: my-app
x-slurm:
  job_name: my-app
  time: "01:00:00"
  mem: 8G
  cpus_per_task: 4
  cache_dir: /cluster/shared/hpc-compose-cache
services:
  redis:
    image: redis:7
    command: redis-server --save "" --appendonly no
    readiness:
      type: tcp
      host: 127.0.0.1
      port: 6379
      timeout_seconds: 30
  app:
    image: python:3.11-slim
    depends_on:
      redis:
        condition: service_healthy
    environment:
      REDIS_HOST: 127.0.0.1
    volumes:
      - ./app:/workspace
    working_dir: /workspace
    command: python -m main
    x-runtime:
      prepare:
        commands:
          - pip install --no-cache-dir redis fastapi uvicorn
```

Key changes:

- `build: .` → `image: python:3.11-slim` + `x-runtime.prepare.commands` for dependencies.
- `ports` → removed. Services communicate via `127.0.0.1` because they run on the same node.
- `REDIS_HOST: redis` → `REDIS_HOST: 127.0.0.1`. No DNS service names; use localhost.
- `healthcheck` → `readiness` with `type: tcp`.
- Added `x-slurm` block for Slurm resource allocation (time, memory, CPUs).
- Added `x-slurm.cache_dir` for shared image storage.
Docker Compose creates isolated networks where services find each other by name. In hpc-compose, helper services on the same node share the host network directly, and multi-node distributed steps must use explicit rendezvous addresses. Replace service hostnames with 127.0.0.1 only when both sides intentionally stay on one node. For multi-node runs, derive the rendezvous host from /hpc-compose/job/allocation/primary_node or HPC_COMPOSE_PRIMARY_NODE.
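A service command can pick its rendezvous address with plain parameter expansion; a minimal sketch, assuming `HPC_COMPOSE_PRIMARY_NODE` is exported as described above (the port is illustrative):

```shell
# Choose the rendezvous host: primary node when exported, localhost otherwise.
RDZV_HOST="${HPC_COMPOSE_PRIMARY_NODE:-127.0.0.1}"
RDZV_PORT="${RDZV_PORT:-29500}"   # illustrative default port
echo "rendezvous at ${RDZV_HOST}:${RDZV_PORT}"
```

Single-node runs fall back to `127.0.0.1` automatically, so the same command line works in both layouts.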
Docker Compose uses `build:` to build an image from a Dockerfile. hpc-compose uses `x-runtime.prepare.commands` instead:
```yaml
# Docker Compose
app:
  build:
    context: .
    dockerfile: Dockerfile
```

```yaml
# hpc-compose
app:
  image: python:3.11-slim
  x-runtime:
    prepare:
      commands:
        - pip install --no-cache-dir -r /tmp/requirements.txt
      mounts:
        - ./requirements.txt:/tmp/requirements.txt
```

Prefer `volumes` for fast-changing source code and `x-runtime.prepare.commands` for slower-changing dependencies. `x-enroot.prepare` remains accepted as a Pyxis/Enroot compatibility spelling, but new specs should use `x-runtime.prepare`.
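Putting that advice together, one pattern is to bind-mount the source tree while installing dependencies in `prepare` (a sketch; the paths are illustrative):

```yaml
app:
  image: python:3.11-slim
  volumes:
    - ./app:/workspace                 # fast-changing source: bind mount
  x-runtime:
    prepare:
      commands:
        - pip install --no-cache-dir -r /tmp/requirements.txt   # slower-changing deps
      mounts:
        - ./requirements.txt:/tmp/requirements.txt
```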
Docker Compose uses healthcheck with a test command, interval, timeout, and retries. hpc-compose now accepts a constrained healthcheck subset and normalizes it into readiness:
```yaml
# TCP: wait for a port to accept connections
readiness:
  type: tcp
  host: 127.0.0.1
  port: 6379
  timeout_seconds: 30
```

```yaml
# Log: wait for a pattern in service output
readiness:
  type: log
  pattern: "Server started"
  timeout_seconds: 60
```

```yaml
# Sleep: fixed delay
readiness:
  type: sleep
  seconds: 5
```

Supported `healthcheck` migration patterns:

- `["CMD", "nc", "-z", HOST, PORT]` and `["CMD-SHELL", "nc -z HOST PORT"]` are recognized
- `curl` probes against `http://` or `https://` URLs are recognized
- `wget --spider` probes against `http://` or `https://` URLs are recognized

Still unsupported in v1:

- arbitrary custom command probes
- `interval`, `retries`, `start_period`
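As an illustration of the normalization, a recognized `curl` healthcheck might map to a TCP wait on the probed port. The normalized form shown here is an assumption, since only the `tcp`, `log`, and `sleep` readiness types are documented above:

```yaml
# Docker Compose (recognized curl probe)
healthcheck:
  test: ["CMD", "curl", "-f", "http://127.0.0.1:8000/health"]

# Assumed normalized readiness: wait on the probed endpoint's port
readiness:
  type: tcp
  host: 127.0.0.1
  port: 8000
  timeout_seconds: 30
```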
Docker Compose uses deploy.resources or top-level cpus/mem_limit. hpc-compose uses Slurm-native resource settings:
```yaml
x-slurm:
  time: "02:00:00"
  mem: 32G
  cpus_per_task: 8
  gpus: 1
services:
  app:
    x-slurm:
      cpus_per_task: 4
      gpus: 1
```

Docker Compose supports `restart: always`, `restart: on-failure`, etc. hpc-compose does not accept the Compose `restart:` key, but it does support per-service restart behavior through `services.<name>.x-slurm.failure_policy`.
```yaml
services:
  app:
    image: python:3.11-slim
    x-slurm:
      failure_policy:
        mode: restart_on_failure
        max_restarts: 3
        backoff_seconds: 5
        window_seconds: 60
        max_restarts_in_window: 3
```

`restart_on_failure` retries only on non-zero exits. It enforces both a lifetime restart cap and a rolling-window crash-loop cap during one live batch-script execution. If you omit the rolling-window fields, hpc-compose defaults to `window_seconds: 60` and `max_restarts_in_window: <resolved max_restarts>`. Use `mode: fail_job` (the default) for fail-fast behavior, or `mode: ignore` for non-critical sidecars.
Practical mapping:

- Compose `restart: "no"` -> omit `failure_policy` or use `mode: fail_job`
- Compose `restart: on-failure[:N]` -> use `mode: restart_on_failure` with `max_restarts: N` when you want a similar lifetime retry budget
- Compose `restart: always` / `unless-stopped` -> no direct equivalent; hpc-compose intentionally keeps restart handling bounded within one batch job
The rolling-window fields have no direct Docker Compose equivalent. They exist to stop fast crash loops inside one Slurm allocation without giving up a larger lifetime retry budget for transient failures.
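The lifetime-cap half of this behavior can be sketched as a small shell loop (illustrative only, not hpc-compose's implementation; `run_with_restarts` is a hypothetical helper, and a real runner would also apply `backoff_seconds` and the rolling-window cap):

```shell
# Retry a command on non-zero exit, up to a lifetime restart budget.
run_with_restarts() {
  max_restarts="$1"; shift
  attempts=0
  while true; do
    if "$@"; then
      return 0                                  # clean exit: no restart
    fi
    attempts=$((attempts + 1))
    if [ "$attempts" -gt "$max_restarts" ]; then
      return 1                                  # restart budget exhausted
    fi
    # a real runner would sleep backoff_seconds here
  done
}
```

For example, `run_with_restarts 3 my_service` launches the service up to four times in total: the initial run plus three restarts.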
| Feature | Alternative |
|---|---|
| `build` | Use `image` + `x-runtime.prepare.commands`. Mount build context files with `x-runtime.prepare.mounts` if needed. |
| `ports` | Not needed. Services share `127.0.0.1` on one node. |
| `networks` / `network_mode` | Not needed. All services are on the same host network. |
| `restart` | Use `services.<name>.x-slurm.failure_policy` (`fail_job`, `ignore`, `restart_on_failure`). |
| `deploy` | Use `x-slurm` for resources. |
| Service DNS names | Use `127.0.0.1` for same-node helpers, or explicit host metadata such as `HPC_COMPOSE_PRIMARY_NODE` for distributed runs. |
| Named volumes | Use host-path bind mounts in `volumes`. |
| `.env` file | Supported. `.env` in the compose file directory is loaded automatically. |
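For example, a named volume becomes an explicit host path (a sketch; the scratch path is illustrative):

```yaml
# Docker Compose named volume
services:
  db:
    volumes:
      - dbdata:/var/lib/postgresql/data
volumes:
  dbdata:

# hpc-compose host-path bind mount
services:
  db:
    volumes:
      - /cluster/scratch/dbdata:/var/lib/postgresql/data
```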
- Remove `build:` — Replace with `image:` pointing to a base image. Move dependency installation to `x-runtime.prepare.commands`.
- Remove `ports:` — Use host-network semantics instead of container port publishing.
- Remove `networks:` / `network_mode:` — There is no Docker-style overlay network or service-name DNS layer.
- Remove Compose `restart:` — Use `services.<name>.x-slurm.failure_policy` when you need per-service restart behavior.
- Remove `deploy:` — Use `x-slurm` for resource allocation.
- Replace service hostnames — Change any service-name references (e.g. `redis`, `postgres`) to `127.0.0.1` for same-node helpers, or to explicit allocation metadata for distributed runs.
- Replace `healthcheck:` — Convert to `readiness:` with `type: tcp`, `type: log`, or `type: sleep`.
- Add `x-slurm:` — Set `time`, `mem`, `cpus_per_task`, and optionally `gpus`, `partition`, `account`.
- Set `cache_dir` — Point `x-slurm.cache_dir` to shared storage visible from login and compute nodes.
- Validate — Run `hpc-compose validate -f compose.yaml` to check the converted spec.
- Inspect — Run `hpc-compose inspect --verbose -f compose.yaml` to confirm the planner understood your intent.