add error info to indicate no sliding window for triton backward pass #3158

Open
scxiao wants to merge 1 commit into main from scxiao/add_error_info
Conversation

@scxiao
Contributor

@scxiao scxiao commented May 12, 2026

Motivation

We tried to run a flash attention sliding-window case. The Triton implementation does not support it, but no error is reported: the sliding-window input is silently ignored, which produces incorrect results.
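To illustrate why silently ignoring the window is a correctness problem, here is a minimal pure-Python sketch of one attention row computed with and without a sliding window. The helper names (`attention_row`, `window_indices`) and the toy numbers are hypothetical and are not taken from the aiter or Triton code. The sketch assumes equal query and key lengths and the convention that a window of `(left, right)` lets query `i` see keys in `[i - left, i + right]`, with `-1` meaning unbounded on that side.

```python
import math


def attention_row(scores, values, allowed):
    """Softmax-weighted average of `values`, restricted to key indices in `allowed`."""
    allowed = list(allowed)
    m = max(scores[j] for j in allowed)  # subtract the max for numerical stability
    weights = {j: math.exp(scores[j] - m) for j in allowed}
    total = sum(weights.values())
    return sum(weights[j] * values[j] for j in allowed) / total


def window_indices(i, seqlen, left, right):
    """Key positions visible to query i; a negative bound means unbounded (assumed convention)."""
    lo = 0 if left < 0 else max(0, i - left)
    hi = seqlen - 1 if right < 0 else min(seqlen - 1, i + right)
    return range(lo, hi + 1)


scores = [0.0, 1.0, 2.0, 3.0]      # toy attention logits for a single query row
values = [10.0, 20.0, 30.0, 40.0]  # toy value vector (one scalar per key)
i = 3                              # last query position

# Window ignored: the query attends to every key.
full = attention_row(scores, values, window_indices(i, 4, -1, -1))
# Window honored: the query attends only to keys 2 and 3.
windowed = attention_row(scores, values, window_indices(i, 4, 1, 0))

print(full, windowed)
```

The two results differ, so a backend that drops the window argument returns the `full` value where the caller expected the `windowed` one, without any indication that something went wrong.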

Technical Details

Added an error message that tells users the sliding window is not supported by the Triton backward pass.
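A guard of this kind might look roughly like the sketch below. This is not the code from the PR: the function name `check_sliding_window_supported` and the message text are hypothetical. It assumes a `window_size` tuple of `(left, right)` where `(-1, -1)` means no sliding window, and that causal masking is passed through a separate flag rather than encoded in the window.

```python
def check_sliding_window_supported(window_size):
    """Raise instead of silently ignoring an unsupported sliding window.

    Hypothetical helper: assumes (-1, -1) means "no sliding window".
    """
    left, right = window_size
    if left >= 0 or right >= 0:
        raise NotImplementedError(
            f"Sliding window attention (window_size={window_size}) is not "
            "supported by this backend; pass window_size=(-1, -1)."
        )


# No window requested: the check passes silently.
check_sliding_window_supported((-1, -1))

# A window is requested: the caller gets an explicit error.
try:
    check_sliding_window_supported((128, 0))
except NotImplementedError as exc:
    print(f"rejected: {exc}")
```

Failing early this way turns a silent wrong result into a visible error at the call site.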

Test Plan

Covered by existing tests.

Test Result

passes

Submission Checklist

@scxiao scxiao requested review from a team and micmelesse May 12, 2026 21:51
@github-actions
Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-300x | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| ci:sglang | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| ci:atom | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| ci:atom_full | ATOM accuracy suite for PR and main models from ATOM models_accuracy.json |
| ci:vllm | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| ci:all | All standard extended tests (excludes ci:atom_full) |

Only add ci:atom_full for FlyDSL or Triton upgrades.
Add labels via the sidebar or gh pr edit 3158 --add-label <label>

Contributor

@brunomazzottiamd brunomazzottiamd left a comment

LGTM!

The failures in the tests/test_flash_attn_triton_amd.py::test_flash_attn_varlen_output test, which is part of the Flash Attention - Triton / MI35X (1 GPU) test job, were addressed in #2695. IMHO they aren't a blocker for merging.

@micmelesse
Contributor

Let us wait for #2695 to get merged, and then we can merge this.

@micmelesse
Contributor

@scxiao Can you rebase this pr?

@scxiao scxiao force-pushed the scxiao/add_error_info branch from 19c9a86 to a3face4 Compare May 20, 2026 15:23
@scxiao
Contributor Author

scxiao commented May 20, 2026

@scxiao Can you rebase this pr?

Thanks. Done.

@brunomazzottiamd
Contributor

@micmelesse, we have a Flash Attention Integration UT failure in this PR (on MI350):

=========================== short test summary info ============================
FAILED tests/test_flash_attn_triton_amd.py::test_flash_attn_varlen_output[0.0-0.17-1024-1024-192-True-False-True-False-mha-dtype0-False]
= 1 failed, 263 passed, 2640 deselected, 6 warnings, 2 rerun in 721.78s (0:12:01) =

It's an assertion error on dk comparison. I find this strange... Wasn't this fixed in #2695? Do you have any thoughts on this? Thanks!

@micmelesse
Contributor

@brunomazzottiamd I saw that. I am retrying the job. I am also working on a PR, #3289, that should address these flaky dropout tests.

3 participants