23 changes: 14 additions & 9 deletions README.md
@@ -59,17 +59,21 @@ git clone https://github.com/ROCm/MAD.git && cd MAD
# Discover available models
madengine discover --tags dummy

# Run locally
# Run locally (full workflow: discover/build/run as configured by the model)
madengine run --tags dummy

# Or with explicit configuration
madengine run --tags dummy \
--additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'
```

> **Note**: For build operations, `gpu_vendor` defaults to `AMD` and `guest_os` defaults to `UBUNTU` if not specified. For production deployments or non-AMD/Ubuntu environments, explicitly specify these values.

If ROCm is not installed under `/opt/rocm` (e.g., a Rock or pip install), pass `--rocm-path` or set `ROCM_PATH`:

```bash
madengine run --tags dummy --rocm-path /path/to/rocm \
--additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'
# or: export ROCM_PATH=/path/to/rocm && madengine run --tags dummy ...
madengine run --tags dummy --rocm-path /path/to/rocm
# or: export ROCM_PATH=/path/to/rocm && madengine run --tags dummy
```
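As a sketch, the path-resolution precedence described above (flag first, then the `ROCM_PATH` environment variable, then the conventional default) could look like this; `resolve_rocm_path` is a hypothetical helper for illustration, not part of madengine's API:

```python
import os

def resolve_rocm_path(cli_rocm_path=None):
    """Resolve a ROCm installation path with the precedence shown above:
    an explicit --rocm-path value wins, then ROCM_PATH, then /opt/rocm."""
    if cli_rocm_path:
        return cli_rocm_path
    return os.environ.get("ROCM_PATH", "/opt/rocm")
```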

**Results:** Performance data is written to `perf.csv` (and optionally `perf_entry.csv`). The file is created automatically if missing. Failed runs (including pre-run setup failures) are recorded with status `FAILURE` so every attempted model appears in the table. See [Exit Codes](docs/cli-reference.md#exit-codes) for CI/script usage.
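Because every attempted model lands in the table, CI scripts can gate on the results file directly. This is a hedged sketch that assumes a `status` column in `perf.csv` — verify the column name against the actual header in your run:

```python
import csv

def any_failures(perf_csv="perf.csv"):
    """Return True if any row in the results table was recorded as FAILURE.

    The `status` column name is an assumption for illustration; check the
    real perf.csv header produced by your madengine version.
    """
    with open(perf_csv, newline="") as f:
        return any(row.get("status") == "FAILURE" for row in csv.DictReader(f))
```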
@@ -92,13 +96,14 @@ madengine provides five main commands for model automation and benchmarking:
# Discover models
madengine discover --tags dummy

# Build image
madengine build --tags dummy \
--additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'
# Build image (uses AMD/UBUNTU defaults)
madengine build --tags dummy

# Run model
madengine run --tags dummy \
--additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'
madengine run --tags dummy

# For non-AMD/Ubuntu environments, specify explicitly:
# madengine build --tags dummy --additional-context '{"gpu_vendor": "NVIDIA", "guest_os": "CENTOS"}'

# Generate report
madengine report to-html --csv-file perf_entry.csv
25 changes: 24 additions & 1 deletion docs/cli-reference.md
@@ -144,7 +144,30 @@ madengine build --tags model \
madengine build --tags model --live-output --verbose
```

**Required Context for Build:**
**Default Values:**

The build command applies the following defaults if not specified:

- **gpu_vendor**: `AMD`
- **guest_os**: `UBUNTU`

Example with defaults:
```bash
# Equivalent to providing {"gpu_vendor": "AMD", "guest_os": "UBUNTU"}
madengine build --tags dummy
```

You will see a message indicating which defaults were applied:

```
ℹ️ Using default values for build configuration:
• gpu_vendor: AMD (default)
• guest_os: UBUNTU (default)

💡 To customize, use --additional-context '{"gpu_vendor": "NVIDIA", "guest_os": "CENTOS"}'
```

**Supported Values:**

- `gpu_vendor`: `"AMD"` or `"NVIDIA"`
- `guest_os`: `"UBUNTU"` or `"CENTOS"`
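The defaults and supported values above can be sketched as a small validation helper. `finalize_build_context`, `SUPPORTED`, and `DEFAULTS` are illustrative names under stated assumptions, not madengine's actual API:

```python
# Allowed values for the build-time context fields, per the tables above.
SUPPORTED = {
    "gpu_vendor": {"AMD", "NVIDIA"},
    "guest_os": {"UBUNTU", "CENTOS"},
}
DEFAULTS = {"gpu_vendor": "AMD", "guest_os": "UBUNTU"}

def finalize_build_context(context):
    """Fill in build defaults, normalize case, and reject unsupported values."""
    merged = {**DEFAULTS, **context}
    for key, allowed in SUPPORTED.items():
        # gpu_vendor is documented as case-insensitive; normalize both fields.
        value = str(merged[key]).upper()
        if value not in allowed:
            raise ValueError(f"Unsupported {key}: {merged[key]!r}")
        merged[key] = value
    return merged
```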
63 changes: 56 additions & 7 deletions docs/configuration.md
@@ -26,17 +26,66 @@ madengine run --tags model --additional-context-file config.json
}
```

## Basic Configuration
## Default Configuration Values

### Required for Local Execution
madengine provides sensible defaults for common AMD/Ubuntu workflows:

```json
{
"gpu_vendor": "AMD",
"guest_os": "UBUNTU"
}
| Field | Default Value | Customization |
|-------|---------------|---------------|
| `gpu_vendor` | `AMD` | Set to `NVIDIA` for NVIDIA GPUs |
| `guest_os` | `UBUNTU` | Set to `CENTOS` for CentOS containers |

### When Defaults Apply

Defaults are applied during the **build** command when fields are not explicitly provided:

```bash
# Uses defaults: {"gpu_vendor": "AMD", "guest_os": "UBUNTU"}
madengine build --tags model

# Explicit override
madengine build --tags model \
--additional-context '{"gpu_vendor": "NVIDIA", "guest_os": "CENTOS"}'
```

When defaults are applied, you'll see an informative message:

```
ℹ️ Using default values for build configuration:
• gpu_vendor: AMD (default)
• guest_os: UBUNTU (default)

💡 To customize, use --additional-context '{"gpu_vendor": "NVIDIA", "guest_os": "CENTOS"}'
```

### Partial Configuration

You can provide one field and let the other default:

```bash
# Override only gpu_vendor (guest_os defaults to UBUNTU)
madengine build --tags model \
--additional-context '{"gpu_vendor": "NVIDIA"}'

# Override only guest_os (gpu_vendor defaults to AMD)
madengine build --tags model \
--additional-context '{"guest_os": "CENTOS"}'
```
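The partial-configuration behavior above amounts to a dictionary merge. This sketch (with a hypothetical `merged_build_context` helper) shows the assumed precedence of built-in defaults, then the context file, then the CLI JSON, matching the CLI-overrides-file behavior:

```python
import json

DEFAULTS = {"gpu_vendor": "AMD", "guest_os": "UBUNTU"}

def merged_build_context(file_context=None, cli_context_json="{}"):
    """Merge context sources, lowest precedence first: built-in defaults,
    then --additional-context-file contents, then --additional-context JSON."""
    merged = dict(DEFAULTS)
    merged.update(file_context or {})
    merged.update(json.loads(cli_context_json))
    return merged
```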

### Production Recommendations

For production deployments:
- ✅ **DO** explicitly specify all configuration values
- ✅ **DO** use configuration files for reproducibility
- ⚠️ **AVOID** relying on defaults in automated workflows

### Run Command Behavior

The **run** command does NOT require these values because it can detect the GPU vendor at runtime.
Defaults apply only to the **build** command, where Dockerfile selection requires them.

## Basic Configuration

**gpu_vendor** (case-insensitive):
- `"AMD"` - AMD ROCm GPUs
- `"NVIDIA"` - NVIDIA CUDA GPUs
18 changes: 12 additions & 6 deletions docs/usage.md
@@ -24,11 +24,16 @@ pip install git+https://github.com/ROCm/madengine.git
# Discover models
madengine discover --tags dummy

# Run locally
# Run locally (full workflow: discover/build/run as configured by the model)
madengine run --tags dummy

# Or with explicit configuration
madengine run --tags dummy \
--additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'
```

> **Note**: `gpu_vendor` defaults to `AMD` and `guest_os` defaults to `UBUNTU` for build operations. For production or non-AMD/Ubuntu environments, specify these values explicitly.

Results are saved to `perf_entry.csv`.

## Commands Overview
@@ -51,13 +56,14 @@ For complete command options and detailed examples, see **[CLI Command Reference
# Discover models
madengine discover --tags dummy

# Build image
madengine build --tags model \
--additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'
# Build image (uses AMD/UBUNTU defaults)
madengine build --tags model

# Run model
madengine run --tags model \
--additional-context '{"gpu_vendor": "AMD", "guest_os": "UBUNTU"}'
madengine run --tags model

# For NVIDIA or other configurations, specify explicitly:
# madengine build --tags model --additional-context '{"gpu_vendor": "NVIDIA", "guest_os": "CENTOS"}'

# Generate HTML report
madengine report to-html --csv-file perf_entry.csv
21 changes: 8 additions & 13 deletions src/madengine/cli/commands/build.py
@@ -173,16 +173,18 @@ def build(
)

try:
# Validate additional context
validate_additional_context(additional_context, additional_context_file)
# Validate additional context and merge file + CLI; defaults wired into orchestrator
validated_context = validate_additional_context(
additional_context, additional_context_file
)

# Create arguments object
args = create_args_namespace(
tags=effective_tags,
target_archs=target_archs,
registry=registry,
additional_context=additional_context,
additional_context_file=additional_context_file,
additional_context=repr(validated_context),
additional_context_file=None,
clean_docker_cache=clean_docker_cache,
manifest_output=manifest_output,
live_output=live_output,
@@ -221,15 +223,8 @@
# Handle batch manifest post-processing
if batch_data:
with console.status("Processing batch manifest..."):
additional_context_dict = getattr(args, "additional_context", None)
if isinstance(additional_context_dict, str):
additional_context_dict = json.loads(additional_context_dict)
guest_os = (
additional_context_dict.get("guest_os") if additional_context_dict else None
)
gpu_vendor = (
additional_context_dict.get("gpu_vendor") if additional_context_dict else None
)
guest_os = validated_context.get("guest_os")
gpu_vendor = validated_context.get("gpu_vendor")
process_batch_manifest_entries(
batch_data, manifest_output, registry, guest_os, gpu_vendor
)
68 changes: 14 additions & 54 deletions src/madengine/cli/commands/run.py
@@ -5,7 +5,6 @@
Copyright (c) Advanced Micro Devices, Inc. All rights reserved.
"""

import ast
import json
import os
from typing import List, Optional
@@ -43,6 +42,11 @@
display_results_table,
display_performance_table,
)
from ..validators import (
additional_context_needs_cli_validation,
finalize_additional_context_dict,
merge_additional_context_from_sources,
)


def run(
@@ -167,33 +171,16 @@ def run(
)
raise typer.Exit(ExitCode.INVALID_ARGS)

# When both --additional-context-file and --additional-context are provided,
# load file first then overlay CLI (CLI overrides file).
# Merge file + CLI (CLI wins), then validate (same rules as `build`) when non-empty.
effective_additional_context = additional_context
effective_additional_context_file = additional_context_file
if additional_context_file and additional_context != "{}":
merged = {}
if os.path.exists(additional_context_file):
try:
with open(additional_context_file, "r") as f:
merged = json.load(f)
except json.JSONDecodeError:
console.print(
f"❌ [red]Invalid JSON format in {additional_context_file}[/red]"
)
raise typer.Exit(ExitCode.INVALID_ARGS)
try:
try:
cli_context = json.loads(additional_context)
except json.JSONDecodeError:
cli_context = ast.literal_eval(additional_context)
if isinstance(cli_context, dict):
merged.update(cli_context)
except (json.JSONDecodeError, ValueError, SyntaxError):
console.print(
f"❌ [red]Invalid additional_context format: {additional_context}[/red]"
)
raise typer.Exit(ExitCode.INVALID_ARGS)
if additional_context_needs_cli_validation(
additional_context, additional_context_file
):
merged, _ = merge_additional_context_from_sources(
additional_context, additional_context_file
)
finalize_additional_context_dict(merged)
effective_additional_context = repr(merged)
effective_additional_context_file = None

@@ -296,34 +283,7 @@
raise typer.Exit(ExitCode.RUN_FAILURE)

else:
# Check if MAD_CONTAINER_IMAGE is provided - this enables local image mode
additional_context_dict = {}
try:
if additional_context and additional_context != "{}":
additional_context_dict = json.loads(additional_context)
except json.JSONDecodeError:
try:
# Try parsing as Python dict literal
additional_context_dict = ast.literal_eval(additional_context)
except (ValueError, SyntaxError):
console.print(
f"❌ [red]Invalid additional_context format: {additional_context}[/red]"
)
raise typer.Exit(ExitCode.INVALID_ARGS)

# Load additional context from file if provided
if additional_context_file and os.path.exists(additional_context_file):
try:
with open(additional_context_file, 'r') as f:
file_context = json.load(f)
additional_context_dict.update(file_context)
except json.JSONDecodeError:
console.print(
f"❌ [red]Invalid JSON format in {additional_context_file}[/red]"
)
raise typer.Exit(ExitCode.INVALID_ARGS)

# MAD_CONTAINER_IMAGE handling is now done in RunOrchestrator
# MAD_CONTAINER_IMAGE handling is done in RunOrchestrator
# Full workflow (may include MAD_CONTAINER_IMAGE mode)
if manifest_file:
console.print(