3 changes: 3 additions & 0 deletions .gitignore
@@ -7,3 +7,6 @@ build-*/
benchmark_results.csv
__pycache__/
*.pyc

# Tuner artifacts (run_tuner.py)
fusilli_tuning_*/
47 changes: 47 additions & 0 deletions README.md
@@ -278,6 +278,53 @@ python benchmarks/run_benchmark.py \
-f commands.txt -o results.csv
```

### Tuner

The Fusilli tuner (`benchmarks/run_tuner.py`) generates optimized IREE tuning
specs for Fusilli operations. It wraps the
[IREE tuner](https://github.com/nod-ai/amd-shark-ai/tree/main/amdsharktuner) to automatically generate,
compile, and benchmark tuning candidates.

**AMDGPU only.** The tuner targets ROCm dispatches and requires a build
configured with `-DFUSILLI_SYSTEMS_AMDGPU=ON`, an AMD GPU at runtime, and
amdsharktuner installed from source. The PyPI release lags behind and is not
compatible with the IREE release candidate pinned in `version.json`, so install
directly from GitHub:

```shell
pip install --pre \
  "amdsharktuner @ git+https://github.com/nod-ai/amd-shark-ai.git@main#subdirectory=amdsharktuner" \
  --find-links https://iree.dev/pip-release-links.html
```

**Single operation:**
```shell
python benchmarks/run_tuner.py \
  --devices hip://0 \
  --num-candidates 30 \
  --output-td-spec tuning_spec.mlir \
  --fusilli-args "matmul -M 1024 -N 1024 -K 1024 --a_type bf16 --b_type bf16 --out_type bf16"
```

**Multiple operations from file:**
```shell
python benchmarks/run_tuner.py \
  --devices hip://0 \
  --num-candidates 30 \
  --output-td-spec tuning_spec.mlir \
  --commands-file commands.txt
```

When tuning multiple commands, the best spec from each command is automatically
chained as the starting spec for the next command. To start from an existing
spec, use `--starter-td-spec <path>`.
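
The chaining behavior described above can be pictured with a minimal Python sketch. This is not the implementation in `run_tuner.py`; `chain_specs` and the `tune_one` callback are hypothetical names introduced for illustration, and keeping the previous spec when a command yields no result is an assumption.

```python
from typing import Callable, Iterable, Optional

# Hypothetical callback: tune one command from an optional starting spec and
# return the path of the best spec found, or None if nothing was produced.
TuneFn = Callable[[str, Optional[str]], Optional[str]]


def chain_specs(
    commands: Iterable[str],
    tune_one: TuneFn,
    starter_spec: Optional[str] = None,
) -> Optional[str]:
    """Tune commands in order, seeding each run with the best spec so far."""
    current = starter_spec
    for command in commands:
        best = tune_one(command, current)
        if best is not None:  # assumption: keep the previous spec otherwise
            current = best
    return current


if __name__ == "__main__":
    # Stand-in for a real tuning run; it only reports what it was given.
    def fake_tune(command: str, starting_spec: Optional[str]) -> Optional[str]:
        print(f"tuning {command!r} starting from {starting_spec}")
        return f"best_after_{command.split()[0]}.mlir"

    final = chain_specs(["matmul -M 1024 -N 1024 -K 1024"], fake_tune, "starter.mlir")
    print(final)
```

In this sketch the `starter_spec` argument plays the role of `--starter-td-spec`: it seeds only the first command, and every later command starts from whatever the previous one produced.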

The generated tuning spec can then be used with the benchmark driver:
```shell
FUSILLI_EXTRA_COMPILER_FLAGS="--iree-codegen-tuning-spec-path=tuning_spec.mlir" \
  build/bin/benchmarks/fusilli_benchmark_driver --iter 100 \
  matmul -M 1024 -N 1024 -K 1024 --a_type bf16 --b_type bf16 --out_type bf16
```
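
When the driver is launched from a script rather than a shell, the same invocation can be assembled in Python. The sketch below only builds the argument list and environment; `driver_invocation` is a hypothetical helper, and the default driver path and iteration count simply mirror the shell example above.

```python
import os
import shlex
from typing import Dict, List, Mapping, Optional, Tuple


def driver_invocation(
    spec_path: str,
    fusilli_args: str,
    driver: str = "build/bin/benchmarks/fusilli_benchmark_driver",
    iterations: int = 100,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for running the benchmark driver with a tuning spec."""
    # Copy the environment so the caller's mapping is left unchanged.
    env = dict(os.environ if base_env is None else base_env)
    # Same variable and compiler flag as in the shell example.
    env["FUSILLI_EXTRA_COMPILER_FLAGS"] = (
        f"--iree-codegen-tuning-spec-path={spec_path}"
    )
    argv = [driver, "--iter", str(iterations), *shlex.split(fusilli_args)]
    return argv, env


if __name__ == "__main__":
    argv, env = driver_invocation(
        "tuning_spec.mlir",
        "matmul -M 1024 -N 1024 -K 1024 --a_type bf16 --b_type bf16 --out_type bf16",
    )
    print(shlex.join(argv))
    print(env["FUSILLI_EXTRA_COMPILER_FLAGS"])
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)`. Note that this sketch overwrites any existing value of `FUSILLI_EXTRA_COMPILER_FLAGS`, as the shell one-liner does.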

### Sanitizers

Fusilli supports building with the following sanitizers:
27 changes: 27 additions & 0 deletions benchmarks/CMakeLists.txt
@@ -42,6 +42,33 @@ if(FUSILLI_SYSTEMS_AMDGPU)
      ENVIRONMENT "${FUSILLI_SANITIZER_TEST_ENV_VARS}"
    )
  endif()

  # Test tuner runner (GPU integration tests)
  add_test(
    NAME fusilli_tuner_runner_tests
    COMMAND ${CMAKE_CURRENT_SOURCE_DIR}/test_tuner_runner.sh
            ${CMAKE_CURRENT_SOURCE_DIR}/run_tuner.py
            $<TARGET_FILE:fusilli_benchmark_driver>
    WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
  )

  # Configure sanitizer options
  if(FUSILLI_SANITIZER_TEST_ENV_VARS)
    set_tests_properties(
      fusilli_tuner_runner_tests PROPERTIES
      ENVIRONMENT "${FUSILLI_SANITIZER_TEST_ENV_VARS}"
    )
  endif()
endif()

# Tuner cache extraction unit tests (CPU-only, no GPU or amdsharktuner needed)
if(FUSILLI_BUILD_TESTS)
  add_test(
    NAME fusilli_tuner_cache_tests
    COMMAND python3 -m unittest
            ${CMAKE_CURRENT_SOURCE_DIR}/test_tuner_cache.py -v
    WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
  )
endif()

# Add some benchmark configurations for CI coverage.