Summary
When a kernel has many load_register operations from an unrolled Python loop, neuronx-cc's register allocator fails with NCC_IGCA094 even though each register has a very short live range (defined, used once, then dead).
Error Message
[INTERNAL_ERROR] [NCC_IGCA094] Too many concurrent live registers - spill/reload not supported yet
Details
- Kernel has ~16,384 load_register operations (from unrolled loop over positions)
- Each register is used exactly once (for dma_copy with indirect addressing via nb.ds())
- Graph coloring should be able to reuse registers since live ranges don't overlap
- Hardware has 62 registers, but allocator sees 16,384 logical registers
Kernel Pattern
for p in range(actual_batch_size):  # 64 iterations, unrolled at trace time
    target_col = nisa.load_register(batch_targets[p : p + 1, :])  # def
    nisa.dma_copy(
        dst=batch_target_logits[p : p + 1, :],
        src=logits_hbm[..., nb.ds(target_col) : nb.ds(target_col) + 1],  # use, then dead
    )
BIR Structure
Each register allocation shows:
{
  "Skind": "register",
  "allocated": false,
  "engine": "ALL",
  "name": "reg_0",
  "num_physical_regs": 1,
  "reg_id": -1
}
Instructions have "dependencies": [] - no explicit def-use chains visible for registers.
Expected Behavior
Register allocator should recognize that:
- Each register has a live range of exactly 2 instructions (load_register → dma_copy)
- Live ranges are sequential and non-overlapping
- Same 1-2 hardware registers can be reused for all logical registers
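To make the expectation concrete, here is a minimal sketch of a linear-scan allocator run over the live ranges described above. It is not neuronx-cc's actual algorithm. The function name `allocate` and the instruction numbering (register k is defined at instruction 2k and last used at 2k + 1) are assumptions made for illustration only.

```python
import heapq


def allocate(live_ranges, num_physical):
    """Hypothetical linear-scan allocator (illustration only).

    live_ranges: list of (def_index, last_use_index), both inclusive.
    Returns the physical register index chosen for each logical register.
    """
    free = list(range(num_physical))
    heapq.heapify(free)
    active = []  # min-heap of (last_use_index, physical_register)
    assignment = [None] * len(live_ranges)
    order = sorted(range(len(live_ranges)), key=lambda i: live_ranges[i][0])
    for i in order:
        start, end = live_ranges[i]
        # Release every register whose last use is strictly before this def.
        while active and active[0][0] < start:
            _, phys = heapq.heappop(active)
            heapq.heappush(free, phys)
        if not free:
            raise RuntimeError("too many concurrent live registers")
        phys = heapq.heappop(free)
        assignment[i] = phys
        heapq.heappush(active, (end, phys))
    return assignment


NUM_LOGICAL = 16_384   # logical registers reported in this issue
NUM_PHYSICAL = 62      # hardware registers reported in this issue

# Assumed numbering: load_register at 2k (def), dma_copy at 2k + 1 (last use).
live_ranges = [(2 * k, 2 * k + 1) for k in range(NUM_LOGICAL)]
assignment = allocate(live_ranges, NUM_PHYSICAL)
used = len(set(assignment))
print(f"{NUM_LOGICAL} logical registers mapped onto {used} physical register(s)")
```

Under these assumptions, at most one register is live at any point, so the allocation never comes close to the 62-register limit. The failure would only be expected if the allocator treats all logical registers as simultaneously live, for example because no def-use information is attached to them.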
Environment
- neuronx-cc version: 2.0.229985.0a0+1ed7a98d
- Target: trn2
- Framework: BIR (from NKI kernel builder)
Workaround
Currently none. The kernel requires indirect addressing for gathering target logits.