This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
# Install in development mode with all dependencies
pip install -e ".[dev]"
# Optional: install Kubernetes support
pip install -e ".[all]"
# Setup pre-commit hooks
pre-commit install
# Run all tests
pytest
# Run specific test file
pytest tests/unit/test_error_handling.py -v
# Run specific test class or function
pytest tests/unit/test_error_handling.py::TestErrorPatternMatching -v
# Run tests with coverage
pytest --cov=src/madengine --cov-report=html
# Skip slow tests
pytest -m "not slow"
# Format code
black src/ tests/
isort src/ tests/
# Lint
flake8 src/ tests/
# Type check
mypy src/madengine
# Run all pre-commit checks
pre-commit run --all-files
madengine is a CLI tool for running AI/ML models in local Docker, Kubernetes, and SLURM environments. The entry point is madengine.cli.app:cli_main (registered as the madengine console script).
CLI Layer (src/madengine/cli/)
- app.py: Typer app wiring; registers 5 commands: discover, build, run, report, database
- commands/: one file per command (build, run, discover, report, database)
- constants.py: ExitCode enum (SUCCESS=0, FAILURE=1, BUILD_FAILURE=2, RUN_FAILURE=3, INVALID_ARGS=4)
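
The exit codes listed above can be pictured as a small enum. This is a minimal sketch using only the names and values given here; the `IntEnum` base class and the `exit_code_for` helper are assumptions for illustration, not the contents of constants.py.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes with the values listed in this document."""

    SUCCESS = 0
    FAILURE = 1
    BUILD_FAILURE = 2
    RUN_FAILURE = 3
    INVALID_ARGS = 4


def exit_code_for(build_ok: bool, run_ok: bool) -> ExitCode:
    """Hypothetical helper: map two stage results to an exit code."""
    if not build_ok:
        return ExitCode.BUILD_FAILURE
    if not run_ok:
        return ExitCode.RUN_FAILURE
    return ExitCode.SUCCESS
```

Because `IntEnum` members compare equal to plain integers, a value such as `ExitCode.RUN_FAILURE` can be passed directly to `sys.exit()`.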
Orchestration Layer (src/madengine/orchestration/)
- build_orchestrator.py (BuildOrchestrator): discovers models, builds Docker images, writes build_manifest.json
- run_orchestrator.py (RunOrchestrator): reads or triggers builds, infers the deployment target, delegates to local or distributed execution
Core Layer (src/madengine/core/)
- context.py (Context): merges additional_context with system detection (GPU vendor, architecture, OS, ROCm path). Uses ast.literal_eval() to parse additional_context strings, not json.loads, so pass a Python dict repr rather than JSON
- console.py (Console): shell execution wrapper with live output support
- docker.py: Docker command wrapper
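
The difference between the two parsers is easy to miss, so here is a small standalone sketch of it. The keys in the sample strings are illustrative only and are not taken from madengine.

```python
import ast
import json

# Illustrative inputs; the keys are made up for this example.
python_style = "{'gpu_vendor': 'AMD', 'debug': True}"
json_style = '{"gpu_vendor": "AMD", "debug": true}'

# A Python dict repr parses cleanly with ast.literal_eval().
parsed = ast.literal_eval(python_style)

# JSON booleans (true/false) and null are not Python literals,
# so ast.literal_eval() rejects them.
try:
    ast.literal_eval(json_style)
    json_literal_ok = True
except (ValueError, SyntaxError):
    json_literal_ok = False

# Conversely, json.loads() rejects single-quoted Python reprs.
try:
    json.loads(python_style)
    python_json_ok = True
except json.JSONDecodeError:
    python_json_ok = False
```

A JSON-looking string that contains only double-quoted strings, numbers, and nested objects happens to be a valid Python literal too, which is why such inputs may appear to work until a `true`, `false`, or `null` is added.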
Execution Layer (src/madengine/execution/)
- container_runner.py (ContainerRunner): runs models from the manifest via docker run, writes results to perf.csv
- docker_builder.py (DockerBuilder): builds images from Dockerfiles
- container_runner_helpers.py: log error pattern scanning, timeout resolution
Deployment Layer (src/madengine/deployment/)
- factory.py (DeploymentFactory): factory pattern; registers SlurmDeployment and KubernetesDeployment
- base.py: BaseDeployment abstract class, DeploymentConfig dataclass
- kubernetes.py / slurm.py: concrete deployments; the target is inferred by convention over configuration: a "k8s" or "kubernetes" key → K8s; a "slurm" key → SLURM; neither → local
- presets/: JSON preset files for K8s/SLURM default configurations; auto-merged with minimal user configs
- config_loader.py: loads and merges preset JSON with user-supplied config
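
The key-based inference rule described above can be sketched in a few lines. The function name `infer_target`, the returned strings, and the choice to check the Kubernetes keys first when both kinds of key are present are all assumptions made for this example; the real logic lives in the orchestration and deployment code.

```python
from typing import Any, Dict


def infer_target(context: Dict[str, Any]) -> str:
    """Hypothetical helper: pick a deployment target from config keys."""
    # Either spelling of the Kubernetes key selects Kubernetes.
    if "k8s" in context or "kubernetes" in context:
        return "k8s"
    # A "slurm" key selects SLURM.
    if "slurm" in context:
        return "slurm"
    # No deployment key at all means local Docker execution.
    return "local"
```

Only the presence of the key matters in this sketch, so even an empty `{"slurm": {}}` section would select SLURM.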
Utils (src/madengine/utils/)
- discover_models.py (DiscoverModels): three discovery methods: root models.json, scripts/{dir}/models.json, or scripts/{dir}/get_models_json.py (dynamic)
- gpu_tool_factory.py / gpu_tool_manager.py: GPU vendor abstraction (AMD/NVIDIA)
- gpu_validator.py: ROCm installation detection, GPU vendor detection
- config_parser.py (ConfigParser): parses --additional-context and tools config
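
To make the three discovery locations concrete, the sketch below walks a project root and lists the files that would be candidates. The function name `find_model_sources` and the rule that a static models.json takes precedence over get_models_json.py within one directory are assumptions, not a description of how DiscoverModels behaves.

```python
from pathlib import Path
from typing import List


def find_model_sources(root: Path) -> List[Path]:
    """Hypothetical helper: list candidate model definition files."""
    sources: List[Path] = []

    # Method 1: a models.json at the project root.
    root_file = root / "models.json"
    if root_file.is_file():
        sources.append(root_file)

    scripts = root / "scripts"
    if scripts.is_dir():
        for sub in sorted(p for p in scripts.iterdir() if p.is_dir()):
            static = sub / "models.json"
            dynamic = sub / "get_models_json.py"
            if static.is_file():
                # Method 2: a static scripts/{dir}/models.json.
                sources.append(static)
            elif dynamic.is_file():
                # Method 3: a dynamic scripts/{dir}/get_models_json.py.
                sources.append(dynamic)
    return sources
```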
Reporting (src/madengine/reporting/)
- update_perf_csv.py: writes/appends to perf.csv and perf_entry.csv
- csv_to_html.py / csv_to_email.py: report generation
- Build flow: CLI → BuildOrchestrator → DiscoverModels (finds models by tags) → DockerBuilder (builds images) → writes build_manifest.json
- Run flow: CLI → RunOrchestrator → loads/generates build_manifest.json → infers target → ContainerRunner (local) or DeploymentFactory (K8s/SLURM) → writes perf.csv
- additional_context: user-supplied dict string merged into Context.ctx. It is parsed with ast.literal_eval(), so it must be a valid Python literal; JSON-style input works only when it avoids true, false, and null. Keys like k8s, slurm, distributed, tools, pre_scripts, post_scripts drive behavior.
- Model definition: models are defined in models.json with fields: name, tags, dockerfile, scripts, n_gpus, args, timeout, skip_gpu_arch, etc.
- Script isolation: during a run, scripts/common/ is populated from the madengine package (pre_scripts, post_scripts, tools) and cleaned up afterwards. The MAD project's own scripts/ and docker/ directories are preserved.
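
As a rough illustration of a model definition and of tag-based selection, the sketch below loads two made-up entries and filters them. Every field value, the value types, the `select_by_tags` helper, and the "match any requested tag" rule are assumptions for the example; only the field names come from the list above.

```python
import json
from typing import Any, Dict, List

# Illustrative models.json content; every value here is made up.
MODELS_JSON = """
[
  {"name": "dummy_a", "tags": ["dummies", "smoke"], "dockerfile": "docker/dummy",
   "scripts": "scripts/dummy/run.sh", "n_gpus": "1", "args": "", "timeout": 600},
  {"name": "dummy_b", "tags": ["nightly"], "dockerfile": "docker/dummy",
   "scripts": "scripts/dummy/run.sh", "n_gpus": "8", "args": "--full", "timeout": 7200}
]
"""


def select_by_tags(models: List[Dict[str, Any]], wanted: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep models sharing at least one requested tag."""
    wanted_set = set(wanted)
    return [m for m in models if wanted_set & set(m.get("tags", []))]


models = json.loads(MODELS_JSON)
selected = select_by_tags(models, ["smoke"])
```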
No explicit "deploy" field is needed. Target is inferred from config structure:
- "k8s" or "kubernetes" key present → Kubernetes deployment
- "slurm" key present → SLURM deployment
- Neither → local Docker execution
tests/
├── unit/ # Fast isolated tests with mocking
├── integration/ # End-to-end with real Docker/system calls
├── e2e/ # Full workflow tests
└── fixtures/ # Dummy models, scripts, and data for testing
Pytest config is in pyproject.toml under [tool.pytest.ini_options]. Test markers: slow, integration.
- Black formatting, 88-character line length
- isort with profile = "black"
- Google-style docstrings
- Type hints required for public functions
- Conventional commits: feat:, fix:, docs:, test:, refactor:, style:, perf:, chore: