Workaround custom_all_reduce issue (By Chengzhe Fan @zx3xyy) by zjing14 · Pull Request #3290 · ROCm/aiter

zjing14 · 2026-05-20T21:25:50Z

Note: the issue was identified and the fix was implemented by Chengzhe Fan @zx3xyy

Issue: Crash with custom all-reduce + CUDA graph enabled (ROCm / Aiter)

Root Cause Hypothesis: In Aiter's ROCm registered-input CUDA graph custom all-reduce path, a GPU can signal "ready", and its peers can begin reading the output buffer, before the producer kernel's memory writes are globally visible across the PCIe/xGMI fabric.

Workaround: Insert a system-scope memory fence (e.g., __threadfence_system() or equivalent ROCm barrier) before the readiness signal, ensuring all prior stores from producer kernels within the CUDA graph are visible to peer GPUs before any consumer on another GPU reads the buffer.
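The ordering that the workaround relies on can be illustrated with a host-side analogue. The sketch below is not the Aiter kernel code and does not use HIP. It models the "write, fence, then signal" pattern with two CPU threads, where `std::atomic_thread_fence` stands in for the system-scope fence and an atomic flag stands in for the readiness signal. The names `SharedSlot`, `produce`, `consume`, and `run_handoff` are hypothetical and exist only for this illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one rank's output buffer plus its readiness flag.
struct SharedSlot {
    std::vector<int> data;      // plain (non-atomic) payload, like the output buffer
    std::atomic<int> ready{0};  // readiness signal polled by the peer
};

// Producer: write the payload, fence, and only then raise the signal.
inline void produce(SharedSlot& slot, int value) {
    for (auto& x : slot.data) x = value;                  // ordinary stores
    std::atomic_thread_fence(std::memory_order_release);  // analogue of the system-scope fence
    slot.ready.store(1, std::memory_order_relaxed);       // signal after the fence
}

// Consumer: wait for the signal, fence, and only then read the payload.
inline long long consume(const SharedSlot& slot) {
    while (slot.ready.load(std::memory_order_relaxed) == 0) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // pairs with the release fence
    long long sum = 0;
    for (int x : slot.data) sum += x;
    return sum;
}

// Runs one producer/consumer handoff over n elements and returns the sum
// the consumer observed.
inline long long run_handoff(std::size_t n, int value) {
    SharedSlot slot;
    slot.data.assign(n, 0);
    long long sum = 0;
    std::thread consumer([&] { sum = consume(slot); });
    std::thread producer([&] { produce(slot, value); });
    producer.join();
    consumer.join();
    return sum;
}
```

With the fences in place, `run_handoff(1024, 3)` returns 3072, because the consumer cannot observe `ready == 1` without also observing every prior store to `data`. Without the release fence, the C++ memory model would not guarantee that ordering, which mirrors the hypothesized failure where a peer GPU sees the readiness signal before the producer's writes are visible.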

github-actions · 2026-05-20T21:26:37Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when `aiter/ops/triton/**` or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| `ci:triton-300x` | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| `ci:sglang` | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| `ci:atom` | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| `ci:atom_full` | ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json` |
| `ci:vllm` | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| `ci:all` | All standard extended tests (excludes `ci:atom_full`) |

Only add `ci:atom_full` for FlyDSL or Triton upgrades.
Add labels via the sidebar or `gh pr edit 3290 --add-label <label>`

zjing14 · 2026-05-23T15:31:14Z

@HaiShaw @carlushuang

fixed custom_all_reduce

e7c4b8f

zjing14 requested a review from a team May 20, 2026 21:25

zjing14 changed the title ~~fixed custom_all_reduce~~ workaround custom_all_reduce issue May 20, 2026

zjing14 marked this pull request as draft May 20, 2026 21:27

zjing14 marked this pull request as ready for review May 23, 2026 15:30

valarLip requested a review from TennyWang1223 May 23, 2026 15:38

zjing14 changed the title ~~workaround custom_all_reduce issue~~ Workaround custom_all_reduce issue (By Chengzhe Fan @zx3xyy) May 23, 2026

zjing14 wants to merge 1 commit into ROCm:main from zjing14:jizhan/aiter_custom_all_reduce_fix
