feat(benchmark): per-model/GPU batch sizes and vocab projection for GEMM bench by Z-Y00 · Pull Request #265 · AMD-AGI/Primus-Turbo

Z-Y00 · 2026-03-31T23:28:11Z

- Replace the hardcoded BATCH_SIZE_LIST with a dict aligned with the Primus configs
- Add GPU_NAME_MAP to normalize GPU variants (MI355X -> MI35*, MI300X -> MI30*)
- Populate batch sizes from the Primus training configs for MI35* and MI30* across dense models (Llama-2, Llama-3.1, Qwen2.5) and MoE models (DeepSeek-V2/V3, Grok-2, Mixtral, Qwen3)
- Add vocab_size to DenseModelConfigs and an lm_head projection case to gen_gemm_test_cases
- Update bench_gemm_turbo/torch/te to use get_batch_sizes(model, dtype, gpu)
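
A minimal sketch of how the GPU-name normalization and the get_batch_sizes(model, dtype, gpu) lookup described above could fit together. Only the MI355X -> MI35* and MI300X -> MI30* mappings and the get_batch_sizes signature come from this PR. The (model, dtype, gpu-family) key layout, the substring matching, the fallback default, and every batch-size value are hypothetical placeholders, not the contents of the Primus configs.

```python
from typing import Dict, List, Tuple

# Device-name -> family-key map. Only these two pairs come from the PR text;
# the real GPU_NAME_MAP in Primus-Turbo may contain more entries.
GPU_NAME_MAP: Dict[str, str] = {
    "MI355X": "MI35*",
    "MI300X": "MI30*",
}

# Hypothetical table keyed by (model, dtype, gpu_family).
# The batch sizes are placeholders, NOT values from the Primus training configs.
BATCH_SIZES: Dict[Tuple[str, str, str], List[int]] = {
    ("Llama-3.1-8B", "bf16", "MI35*"): [1, 2],
    ("Llama-3.1-8B", "bf16", "MI30*"): [1],
}

# Assumed fallback when no entry matches; the PR does not say what happens here.
DEFAULT_BATCH_SIZES: List[int] = [1]


def normalize_gpu_name(gpu: str) -> str:
    """Return the family key for a device name, or the name unchanged if unmapped."""
    for variant, family in GPU_NAME_MAP.items():
        # Substring match so "AMD Instinct MI300X" also resolves to "MI30*".
        if variant in gpu:
            return family
    return gpu


def get_batch_sizes(model: str, dtype: str, gpu: str) -> List[int]:
    """Look up batch sizes for one (model, dtype, gpu) combination."""
    key = (model, dtype, normalize_gpu_name(gpu))
    # Return a copy so callers cannot mutate the shared table.
    return list(BATCH_SIZES.get(key, DEFAULT_BATCH_SIZES))


if __name__ == "__main__":
    print(get_batch_sizes("Llama-3.1-8B", "bf16", "AMD Instinct MI355X"))  # [1, 2]
```

Normalizing the device name before the lookup means a single table entry per GPU family serves every variant that maps to it, so the table does not need one row per exact device string.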
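
For the lm_head change, here is a sketch of the extra GEMM shape that vocab_size makes possible. It assumes the bench describes GEMMs as (M, N, K) with M = batch_size * seq_len tokens, K = hidden_size, and N = vocab_size. The DenseModelConfig dataclass and the lm_head_gemm_case and gemm_flops helpers are hypothetical. The Llama-2-7B dimensions (hidden 4096, vocab 32000) are the public model values, not entries read from DenseModelConfigs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DenseModelConfig:
    """Hypothetical stand-in for one DenseModelConfigs entry (only fields used here)."""
    name: str
    hidden_size: int
    vocab_size: int  # the field this PR adds


def lm_head_gemm_case(cfg: DenseModelConfig, batch_size: int, seq_len: int) -> Tuple[int, int, int]:
    """(M, N, K) for [tokens, hidden] @ [hidden, vocab] -> [tokens, vocab] logits."""
    m = batch_size * seq_len  # one logits row per token position
    n = cfg.vocab_size        # one output column per vocabulary entry
    k = cfg.hidden_size       # reduction dimension
    return (m, n, k)


def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs of an (M x K) @ (K x N) GEMM, counting each multiply-add as 2 FLOPs."""
    return 2 * m * n * k


llama2_7b = DenseModelConfig("Llama-2-7B", hidden_size=4096, vocab_size=32000)
m, n, k = lm_head_gemm_case(llama2_7b, batch_size=1, seq_len=4096)
print((m, n, k), gemm_flops(m, n, k))  # (4096, 32000, 4096) 1073741824000
```

Because N equals the vocabulary size, this projection is often one of the widest GEMMs in a dense model. It scales with vocab_size rather than with the hidden or FFN width, so it is worth benchmarking as its own case.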

xiaobochen-amd · 2026-04-12T14:06:06Z

Please rebase onto the latest main, then resubmit the code to trigger CI/CD.

Z-Y00 requested review from wenxie-amd and xiaobochen-amd as code owners March 31, 2026 23:28

Z-Y00 force-pushed the main branch from da3bf1f to db92df1 on April 17, 2026 22:09

feat(benchmark): per-model/GPU batch sizes, vocab projection, and torchtitan configs

c15720a

Z-Y00 force-pushed the main branch from db92df1 to c15720a on May 14, 2026 00:17

Z-Y00 requested a review from RuibinCheung as a code owner May 14, 2026 00:17

Provide feedback