From dab683a909d9533328e6254ef39f6b6caf40efb7 Mon Sep 17 00:00:00 2001 From: Fabian Mora Date: Fri, 20 Feb 2026 09:22:44 -0500 Subject: [PATCH] [aster] Begin addings docs for reg-alloc Signed-off-by: Fabian Mora --- docs/register_allocation.md | 313 ++++++++++++++++++++++++++++++++++++ 1 file changed, 313 insertions(+) create mode 100644 docs/register_allocation.md diff --git a/docs/register_allocation.md b/docs/register_allocation.md new file mode 100644 index 00000000..4e7786fb --- /dev/null +++ b/docs/register_allocation.md @@ -0,0 +1,313 @@ +# AMDGCN Register Allocation + +This document describes the AMDGCN register allocation pipeline and its constituent passes: bufferization, to-register semantics, and register coloring. It lists pre-conditions and results for each step and includes sample IR from the tests. + +## Pipeline Overview + +The **`amdgcn-reg-alloc`** pipeline performs register allocation for AMDGCN kernels by running the following passes in sequence: + +1. **Bufferization** (`aster-amdgcn-bufferization`) – inserts copies to remove potentially clobbered values and eliminates phi-nodes with register value semantics. +2. **ToRegisterSemantics** (`amdgcn-to-register-semantics`) – converts value allocas to unallocated register semantics and propagates them through the IR. +3. **RegisterDCE** (`amdgcn-register-dce`) – dead code elimination for register copy/mov operations. +4. **RegisterColoring** (`amdgcn-register-coloring`) – assigns physical registers using liveness and interference. + +**Important:** Both bufferization and to-register semantics expect IR in **value semantics**. Register coloring expects IR in **register semantics** (after to-register semantics and register DCE). + +--- + +## 1. Bufferization (`aster-amdgcn-bufferization`) + +### Summary + +Inserts phi-breaking copies to prepare for register allocation. It can be viewed as a “bufferization” pass that moves the IR away from DPS/value semantics toward side-effects before register allocation. + +### Pre-conditions + +- **IR in value semantics**: Kernels use `alloca` with register types (`!amdgcn.vgpr`, `!amdgcn.sgpr`) and SSA value semantics. +- Block arguments may carry register-typed values (phi-nodes). +- Operations may produce and consume values that flow through phis or may clobber allocas. + +### Results + +- **Phi-nodes removed**: Block arguments with register value semantics are eliminated by inserting allocations at dominator points and copying values into them; branch operands for those phis are removed. +- **Clobber copies inserted**: When a value may be clobbered (e.g., same alloca written twice, or value used across blocks after another write), the pass inserts `lsir.copy` and fresh allocas so each “version” has its own storage. +- **Still value semantics**: Allocas and copies remain in value-semantic form; no physical register numbers yet. + +### Sample: Diamond CFG (phi-breaking) + +**Before** (value semantics, phi at merge): + +```mlir +amdgcn.module @bufferization_phi_copies_1 target = isa = { + func.func private @rand() -> i1 + kernel @bufferization_phi_copies_1 { + %0 = func.call @rand() : () -> i1 + %1 = alloca : !amdgcn.vgpr + %2 = alloca : !amdgcn.vgpr + cf.cond_br %0, ^bb1, ^bb2 + ^bb1: + cf.br ^bb3(%1 : !amdgcn.vgpr) + ^bb2: + cf.br ^bb3(%2 : !amdgcn.vgpr) + ^bb3(%3: !amdgcn.vgpr): // 2 preds: ^bb1, ^bb2 + %4 = test_inst outs %3 : (!amdgcn.vgpr) -> !amdgcn.vgpr + end_kernel + } +} +``` + +**After** (phi removed, copies and merge alloca inserted): + +```mlir +// CHECK: %[[VAL_0:.*]] = alloca : !amdgcn.vgpr +// CHECK: %[[VAL_1:.*]] = alloca : !amdgcn.vgpr +// CHECK: %[[VAL_2:.*]] = alloca : !amdgcn.vgpr +// CHECK: cf.cond_br %[[CALL_0]], ^bb1, ^bb2 +// CHECK: ^bb1: +// CHECK: %[[VAL_3:.*]] = alloca : !amdgcn.vgpr +// CHECK: %[[COPY_0:.*]] = lsir.copy %[[VAL_3]], %[[VAL_0]] : !amdgcn.vgpr, !amdgcn.vgpr +// CHECK: lsir.copy %[[VAL_2]], %[[COPY_0]] : !amdgcn.vgpr, !amdgcn.vgpr +// CHECK: cf.br ^bb3 +// CHECK: ^bb2: +// CHECK: %[[VAL_4:.*]] = alloca : !amdgcn.vgpr +// CHECK: %[[COPY_2:.*]] = lsir.copy %[[VAL_4]], %[[VAL_1]] : !amdgcn.vgpr, !amdgcn.vgpr +// CHECK: lsir.copy %[[VAL_2]], %[[COPY_2]] : !amdgcn.vgpr, !amdgcn.vgpr +// CHECK: cf.br ^bb3 +// CHECK: ^bb3: +// CHECK: %[[VAL_5:.*]] = dealloc_cast %[[VAL_2]] : !amdgcn.vgpr +// CHECK: %[[VAL_6:.*]] = test_inst outs %[[VAL_5]] : (!amdgcn.vgpr) -> !amdgcn.vgpr +``` + +### Sample: Cross-block clobber + +When the same alloca is written twice and the first result is used in a successor block, bufferization inserts a copy so that use sees the correct value: + +**Before:** + +```mlir +%v1 = test_inst outs %0 ins %1 : (!amdgcn.vgpr, !amdgcn.sgpr) -> !amdgcn.vgpr +%v2 = test_inst outs %0 ins %1 : (!amdgcn.vgpr, !amdgcn.sgpr) -> !amdgcn.vgpr +cf.cond_br %cond, ^bb1, ^bb2 +^bb1: + test_inst ins %v1 : (!amdgcn.vgpr) -> () +^bb2: + test_inst ins %v2 : (!amdgcn.vgpr) -> () +``` + +**After:** A copy is inserted so the first write’s value is preserved for `%v1`: + +```mlir +// CHECK: %[[COPY_A:.*]] = alloca : !amdgcn.vgpr +// CHECK: %[[COPY:.*]] = lsir.copy %[[COPY_A]], %[[A]] : !amdgcn.vgpr, !amdgcn.vgpr +// CHECK: %[[V1:.*]] = test_inst outs %[[A]] ins %[[S]] +// CHECK: %[[V2:.*]] = test_inst outs %[[COPY]] ins %[[S]] +``` + +--- + +## 2. To-Register Semantics (`amdgcn-to-register-semantics`) + +### Summary + +Propagates **non-value** (unallocated) register semantics through the IR. Converts value allocas to unallocated allocas and ensures instruction results are not used as SSA values by other operations (values flow through side-effects on allocas). + +### Pre-conditions + +- **IR in value semantics**: Same as bufferization output — value allocas, `lsir.copy`, possibly `dealloc_cast`, block arguments already eliminated for register types by bufferization. +- Runs **after** bufferization, **before** register coloring. + +### Results + +- **Value allocas → unallocated allocas**: Types change from e.g. `!amdgcn.vgpr` to `!amdgcn.vgpr` (unallocated). +- **No SSA use of instruction results**: After the pass, instruction results are not used as operands of other operations; all uses go through allocas (register semantics). +- **`amdgcn.alloca`**: Value `alloca` may be represented as `amdgcn.alloca` with `!amdgcn.vgpr` (or similar) in the tests; the key is unallocated semantics and use only through side-effects. + +### Sample: Value allocas to unallocated + +**Before** (value semantics, results used as SSA values): + +```mlir +func.func @range_amdgcn.allocations() { + %0 = amdgcn.alloca : !amdgcn.vgpr + %1 = amdgcn.alloca : !amdgcn.vgpr + // ... + %7 = lsir.copy %6, %0 : !amdgcn.vgpr, !amdgcn.vgpr + %8 = amdgcn.test_inst outs %0 : (!amdgcn.vgpr) -> !amdgcn.vgpr + // ... + amdgcn.test_inst ins %8, %11 : (!amdgcn.vgpr, !amdgcn.vgpr) -> () + func.return +} +``` + +**After** (unallocated semantics, `?` in type): + +```mlir +// CHECK-DAG: %[[ALLOCA_0:.*]] = amdgcn.alloca : !amdgcn.vgpr +// CHECK-DAG: %[[ALLOCA_1:.*]] = amdgcn.alloca : !amdgcn.vgpr +// ... +// CHECK: lsir.copy %[[ALLOCA_6]], %[[ALLOCA_0]] : !amdgcn.vgpr, !amdgcn.vgpr +// CHECK: amdgcn.test_inst outs %[[ALLOCA_0]] : (!amdgcn.vgpr) -> () +// ... +// CHECK: amdgcn.test_inst ins %[[ALLOCA_0]], %[[ALLOCA_1]] : (!amdgcn.vgpr, !amdgcn.vgpr) -> () +``` + +### Sample: Existing physical registers preserved + +When the input already has fixed register indices (e.g. `!amdgcn.vgpr<0>`, `!amdgcn.sgpr<2>`), to-register semantics preserves them and only converts unallocated slots to `?`: + +```mlir +// Input +%0 = amdgcn.alloca : !amdgcn.vgpr<0> +%2 = amdgcn.alloca : !amdgcn.sgpr<0> +// ... +// Output: same fixed indices, unallocated ones become +// CHECK-DAG: %[[ALLOCA_0:.*]] = amdgcn.alloca : !amdgcn.vgpr<0> +// CHECK-DAG: %[[ALLOCA_2:.*]] = amdgcn.alloca : !amdgcn.sgpr<0> +``` + +--- + +## 3. Register DCE (`amdgcn-register-dce`) + +### Summary + +Dead code elimination for register copy/mov operations: removes `lsir.copy` and AMDGCN mov instructions whose destination register is not live after the operation. + +### Pre-conditions + +- IR in **register semantics** (after to-register semantics): allocas with unallocated semantics, no SSA use of instruction results for register values. + +### Results + +- Redundant or dead copies and movs are removed, reducing register pressure and code size before coloring. + +--- + +## 4. Register Coloring (`amdgcn-register-coloring`) + +### Summary + +Performs register allocation by building liveness, building an interference graph, and assigning physical register indices to each alloca. Supports optional “full” graph build mode via pipeline option `--amdgcn-reg-alloc=mode=full`. + +### Pre-conditions + +- **IR in register semantics**: All register storage is in the form of allocas (or equivalent) with **unallocated** semantics (e.g. `!amdgcn.vgpr`, `!amdgcn.sgpr`). +- No block arguments carrying register types (already removed by bufferization). +- Runs **after** bufferization, to-register semantics, and register DCE. + +### Results + +- **Allocas get physical registers**: Types change from `!amdgcn.vgpr` to `!amdgcn.vgpr` (and similarly for SGPR/AGPR). +- **`lsir.copy` lowered**: Copies are lowered to machine movs (e.g. `amdgcn.vop1.vop1 `, `amdgcn.sop1 s_mov_b32`). +- **Register ranges**: `make_register_range` operands get concrete indices; the IR is in allocated form suitable for later codegen/legalization. + +### Sample: Allocas colored to fixed indices + +**Before** (register semantics, unallocated): + +```mlir +amdgcn.kernel @range_allocations { + %0 = alloca : !amdgcn.vgpr + %1 = alloca : !amdgcn.vgpr + // ... + test_inst outs %0 : (!amdgcn.vgpr) -> () + test_inst ins %0, %1 : (!amdgcn.vgpr, !amdgcn.vgpr) -> () + end_kernel +} +``` + +**After** (physical registers): + +```mlir +// CHECK-DAG: %[[VAL_0:.*]] = alloca : !amdgcn.vgpr<0> +// CHECK-DAG: %[[VAL_1:.*]] = alloca : !amdgcn.vgpr<1> +// ... +// CHECK: test_inst outs %[[VAL_0]] : (!amdgcn.vgpr<0>) -> () +// CHECK: test_inst ins %[[VAL_0]], %[[VAL_1]] : (!amdgcn.vgpr<0>, !amdgcn.vgpr<1>) -> () +``` + +### Sample: Copies lowered to movs + +**Before:** + +```mlir +lsir.copy %0, %1 : !amdgcn.vgpr, !amdgcn.vgpr +lsir.copy %2, %3 : !amdgcn.sgpr, !amdgcn.sgpr +``` + +**After:** + +```mlir +// CHECK: amdgcn.vop1.vop1 %[[ALLOCA_0]], %[[ALLOCA_1]] : (!amdgcn.vgpr<0>, !amdgcn.vgpr<1>) -> () +// CHECK: amdgcn.sop1 s_mov_b32 outs %[[ALLOCA_2]] ins %[[ALLOCA_3]] : !amdgcn.sgpr<0>, !amdgcn.sgpr<1> +``` + +### Sample: Full pipeline (reg-alloc) + +**Input** (value semantics, phi): + +```mlir +amdgcn.module @reg_alloc target = isa = { + func.func private @rand() -> i1 + kernel @reg_alloc { + %0 = func.call @rand() : () -> i1 + %1 = alloca : !amdgcn.vgpr + %2 = alloca : !amdgcn.vgpr + cf.cond_br %0, ^bb1, ^bb2 + ^bb1: + cf.br ^bb3(%1 : !amdgcn.vgpr) + ^bb2: + cf.br ^bb3(%2 : !amdgcn.vgpr) + ^bb3(%3: !amdgcn.vgpr): + %4 = test_inst outs %3 : (!amdgcn.vgpr) -> !amdgcn.vgpr + end_kernel + } +} +``` + +**After `--amdgcn-reg-alloc`**: Phi and block args gone; allocas colored (e.g. `!amdgcn.vgpr<0>`); no unallocated `!amdgcn.vgpr<1>` in minimal mode: + +```mlir +// CHECK-NOT: alloca : !amdgcn.vgpr<1> +// CHECK: alloca : !amdgcn.vgpr<0> +``` + +--- + +## Summary Table + +| Step | Pre-condition | Result | +|----------------------|----------------------|--------| +| **Bufferization** | Value semantics IR | No phis for registers; clobber copies inserted; still value semantics | +| **ToRegisterSemantics** | Value semantics IR (after bufferization) | Unallocated allocas (`?`); register semantics; no SSA use of inst results | +| **RegisterDCE** | Register semantics | Dead copies/movs removed | +| **RegisterColoring** | Register semantics (unallocated) | Physical registers assigned; copies lowered to movs | + +--- + +## Running the pipeline + +```bash +aster-opt input.mlir --amdgcn-reg-alloc -o output.mlir +``` + +With full interference graph build mode: + +```bash +aster-opt input.mlir --amdgcn-reg-alloc=mode=full -o output.mlir +``` + +Individual passes (for debugging or custom pipelines): + +```bash +aster-opt input.mlir --aster-amdgcn-bufferization --split-input-file | FileCheck %s +aster-opt input.mlir --amdgcn-to-register-semantics --split-input-file | FileCheck %s +aster-opt input.mlir --amdgcn-register-coloring --cse --split-input-file | FileCheck %s +``` + +Test locations: + +- Bufferization: `aster/test/Dialect/AMDGCN/Transforms/bufferization.mlir` +- To-register semantics: `aster/test/Dialect/AMDGCN/Transforms/to-register-semantics.mlir` +- Register coloring: `aster/test/Dialect/AMDGCN/Transforms/register-coloring.mlir` +- Full pipeline: `aster/test/Dialect/AMDGCN/Transforms/reg-alloc.mlir`, `chained-select-dps-violation.mlir`