[FLYDSL] Add mixed slicek-splitk policies for flydsl hgemm by xytpai · Pull Request #3279 · ROCm/aiter

xytpai · 2026-05-20T03:48:20Z

GPT-OSS-120B Throughput Summary

| Config | Ref Avg (tok/s) | Opt Avg (tok/s) | Speedup |
|--------|----------------:|----------------:|---------|
| TP1-MCC4 | 8,262.16 | 8,482.45 | 1.027x (+2.67%) |
| TP1-MCC8 | 13,980.48 | 14,446.98 | 1.033x (+3.34%) |
| TP1-MCC16 | 21,636.85 | 22,067.23 | 1.020x (+1.99%) |
| TP1-MCC32 | 31,782.37 | 31,884.19 | 1.003x (+0.32%) |
| TP2-MCC4 | 9,286.73 | 9,963.61 | 1.073x (+7.29%) |
| TP2-MCC8 | 16,117.64 | 16,415.76 | 1.018x (+1.85%) |
| TP2-MCC16 | 27,030.60 | 27,897.37 | 1.032x (+3.21%) |
| TP2-MCC32 | 35,261.69 | 36,825.58 | 1.044x (+4.44%) |
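
For reviewers who want to re-derive the Speedup column, here is a minimal sketch that recomputes it from the Ref/Opt averages above. The `RESULTS` dictionary and the `speedup` helper are illustrative names and are not part of this PR. The numbers are copied verbatim from the table.

```python
# Ref / Opt average throughput (tok/s), copied from the summary table.
# RESULTS and speedup() are hypothetical helper names used only for this check.
RESULTS = {
    "TP1-MCC4": (8262.16, 8482.45),
    "TP1-MCC8": (13980.48, 14446.98),
    "TP1-MCC16": (21636.85, 22067.23),
    "TP1-MCC32": (31782.37, 31884.19),
    "TP2-MCC4": (9286.73, 9963.61),
    "TP2-MCC8": (16117.64, 16415.76),
    "TP2-MCC16": (27030.60, 27897.37),
    "TP2-MCC32": (35261.69, 36825.58),
}


def speedup(ref: float, opt: float) -> tuple[float, float]:
    """Return (ratio, percent gain) of the optimized run over the reference."""
    ratio = opt / ref
    return ratio, (ratio - 1.0) * 100.0


def best_and_worst(results: dict[str, tuple[float, float]]) -> tuple[str, str]:
    """Return the configs with the largest and smallest speedup ratio."""
    by_ratio = sorted(results, key=lambda cfg: speedup(*results[cfg])[0])
    return by_ratio[-1], by_ratio[0]


if __name__ == "__main__":
    for config, (ref, opt) in RESULTS.items():
        ratio, pct = speedup(ref, opt)
        print(f"{config:<10} {ratio:.3f}x ({pct:+.2f}%)")
    best, worst = best_and_worst(RESULTS)
    print(f"largest gain: {best}, smallest gain: {worst}")
```

The gain is largest at TP2-MCC4 and smallest at TP1-MCC32.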

GPTOSS TP2 Results

MODEL=/shared/data/amd_int/models/gpt-oss-120b
vllm bench serve \
    --backend vllm \
    --base-url http://127.0.0.1:8900 \
    --endpoint /v1/completions \
    --model ${MODEL} \
    --dataset-name random \
    --random-input-len 1000 \
    --random-output-len 100 \
    --max-concurrency 4 \
    --num-prompts 40 \
    --trust_remote_code \
    --num-warmups 8 \
    --request-rate inf \
    --ignore-eos \
    --disable-tqdm \
    --save-result \
    --percentile-metrics ttft,tpot,itl,e2el

before

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.8984|±  |0.0083|
|     |       |strict-match    |     3|exact_match|↑  |0.5042|±  |0.0138|
============ Serving Benchmark Result ============
Successful requests:                     40        
Failed requests:                         0         
Maximum request concurrency:             4         
Benchmark duration (s):                  4.74      
Total input tokens:                      40000     
Total generated tokens:                  4000      
Request throughput (req/s):              8.44      
Output token throughput (tok/s):         844.30    
Peak output token throughput (tok/s):    860.00    
Peak concurrent requests:                12.00     
Total token throughput (tok/s):          9287.28   
---------------Time to First Token----------------
Mean TTFT (ms):                          94.98     
Median TTFT (ms):                        104.38    
P99 TTFT (ms):                           120.87    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          3.82      
Median TPOT (ms):                        3.71      
P99 TPOT (ms):                           4.21      
---------------Inter-token Latency----------------
Mean ITL (ms):                           3.82      
Median ITL (ms):                         3.73      
P99 ITL (ms):                            4.43      
----------------End-to-end Latency----------------
Mean E2EL (ms):                          473.21    
Median E2EL (ms):                        472.28    
P99 E2EL (ms):                           485.53    
==================================================

after

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     3|exact_match|↑  |0.8893|±  |0.0086|
|     |       |strict-match    |     3|exact_match|↑  |0.5231|±  |0.0138|
============ Serving Benchmark Result ============
Successful requests:                     40        
Failed requests:                         0         
Maximum request concurrency:             4         
Benchmark duration (s):                  4.38      
Total input tokens:                      40000     
Total generated tokens:                  4000      
Request throughput (req/s):              9.14      
Output token throughput (tok/s):         913.61    
Peak output token throughput (tok/s):    948.00    
Peak concurrent requests:                16.00     
Total token throughput (tok/s):          10049.67  
---------------Time to First Token----------------
Mean TTFT (ms):                          79.05     
Median TTFT (ms):                        86.55     
P99 TTFT (ms):                           104.74    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          3.62      
Median TPOT (ms):                        3.52      
P99 TPOT (ms):                           3.97      
---------------Inter-token Latency----------------
Mean ITL (ms):                           3.62      
Median ITL (ms):                         3.54      
P99 ITL (ms):                            4.15      
----------------End-to-end Latency----------------
Mean E2EL (ms):                          437.30    
Median E2EL (ms):                        436.06    
P99 E2EL (ms):                           456.54    
==================================================
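
To put the before/after logs side by side, the following sketch computes the relative change in a few serving metrics and checks whether the gsm8k accuracy shift is within the reported standard errors. The helper names are hypothetical. The z-score treats the two evaluation runs as independent, which is only an approximation because both runs use the same gsm8k test set.

```python
import math

# (before, after) values copied from the two "Serving Benchmark Result" logs.
SERVING = {
    "Total token throughput (tok/s)": (9287.28, 10049.67),
    "Mean TTFT (ms)": (94.98, 79.05),
    "Mean TPOT (ms)": (3.82, 3.62),
    "Mean E2EL (ms)": (473.21, 437.30),
}

# ((value, stderr) before, (value, stderr) after) from the gsm8k tables.
GSM8K = {
    "flexible-extract": ((0.8984, 0.0083), (0.8893, 0.0086)),
    "strict-match": ((0.5042, 0.0138), (0.5231, 0.0138)),
}


def pct_change(before: float, after: float) -> float:
    """Relative change of `after` versus `before`, in percent."""
    return (after - before) / before * 100.0


def z_score(before: float, se_before: float, after: float, se_after: float) -> float:
    """Accuracy difference in units of the combined standard error.

    Assumption: the two runs are treated as independent samples.
    """
    return (after - before) / math.sqrt(se_before**2 + se_after**2)


if __name__ == "__main__":
    for name, (before, after) in SERVING.items():
        print(f"{name:<32} {pct_change(before, after):+.2f}%")
    for name, ((b, sb), (a, sa)) in GSM8K.items():
        print(f"gsm8k {name:<17} delta={a - b:+.4f} z={z_score(b, sb, a, sa):+.2f}")
```

Under that independence assumption, both gsm8k deltas are smaller than one combined standard error, so the logs do not indicate an accuracy regression from the new kernel policies.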

github-actions · 2026-05-20T03:48:32Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
|-------|-------|
| `ci:triton-300x` | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| `ci:sglang` | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| `ci:atom` | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| `ci:atom_full` | ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json` |
| `ci:vllm` | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| `ci:all` | All standard extended tests (excludes `ci:atom_full`) |

Only add `ci:atom_full` for FlyDSL or Triton upgrades.
Add labels via the sidebar or `gh pr edit 3279 --add-label <label>`.

xytpai added 2 commits May 20, 2026 03:42

add mixed slicek

8bf075d

update code

01883e0

xytpai requested a review from a team May 20, 2026 03:48

xytpai added 4 commits May 20, 2026 06:00

update spk limit

cf2b63c

refine tune

8e4a983

add tuned config for gptoss

a52feed

add more cfgs

e5d867a

XiaobingSuper requested a review from coderfeli May 20, 2026 12:09

xytpai wants to merge 6 commits into main from xyt/hgemm_spk2
