feat(benchmark): per-model/GPU batch sizes and vocab projection for GEMM bench #265

Open
Z-Y00 wants to merge 1 commit into AMD-AGI:main from Z-Y00:main
Conversation

Z-Y00 commented Mar 31, 2026

  • Replace the hardcoded BATCH_SIZE_LIST with a dict aligned with the Primus configs
  • Add GPU_NAME_MAP to normalize GPU variants (MI355X -> MI35*, MI300X -> MI30*)
  • Populate batch sizes from the Primus training configs for MI35*/MI30* across dense models (Llama-2, Llama-3.1, Qwen2.5) and MoE models (DeepSeek-V2/V3, Grok-2, Mixtral, Qwen3)
  • Add vocab_size to DenseModelConfigs and an lm_head projection to gen_gemm_test_cases
  • Update bench_gemm_turbo/torch/te to call get_batch_sizes(model, dtype, gpu)
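
For readers without the diff open, here is a rough sketch of how the pieces listed above might fit together. It is not the code in this PR: the dict layout, the placeholder model name, the batch-size values, the `DEFAULT_BATCH_SIZES` fallback, and the helpers `normalize_gpu` and `lm_head_gemm_shape` are all assumptions made for illustration. Only `GPU_NAME_MAP`, its two mappings, and the `get_batch_sizes(model, dtype, gpu)` call shape come from the description.

```python
# Illustrative sketch only; the real tables live in the PR diff.

# GPU variant -> family key, as listed in the description.
GPU_NAME_MAP = {
    "MI355X": "MI35*",
    "MI300X": "MI30*",
}

# Hypothetical layout: (model, dtype, gpu_family) -> batch sizes.
# The model name and values below are placeholders, not Primus config values.
BATCH_SIZES = {
    ("example-dense-model", "bf16", "MI30*"): [1, 2],
    ("example-dense-model", "bf16", "MI35*"): [2, 4],
}

# Assumed fallback for combinations that have no entry.
DEFAULT_BATCH_SIZES = [1]


def normalize_gpu(gpu: str) -> str:
    """Map a concrete GPU name to its family key; unknown names pass through."""
    return GPU_NAME_MAP.get(gpu, gpu)


def get_batch_sizes(model: str, dtype: str, gpu: str) -> list[int]:
    """Look up batch sizes for a (model, dtype, GPU) combination."""
    key = (model, dtype, normalize_gpu(gpu))
    return BATCH_SIZES.get(key, DEFAULT_BATCH_SIZES)


def lm_head_gemm_shape(batch_size: int, seq_len: int,
                       hidden_size: int, vocab_size: int) -> tuple[int, int, int]:
    """(M, N, K) of the vocab projection: logits[M, N] = hidden[M, K] @ W[N, K]^T."""
    return (batch_size * seq_len, vocab_size, hidden_size)


if __name__ == "__main__":
    for bs in get_batch_sizes("example-dense-model", "bf16", "MI355X"):
        print(bs, lm_head_gemm_shape(bs, seq_len=4096, hidden_size=4096, vocab_size=32000))
```

Keying the table on the GPU family rather than the exact SKU means a variant only needs a new `GPU_NAME_MAP` entry, not a second copy of every batch-size list.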

@xiaobochen-amd
Collaborator

Please rebase onto the latest main, then resubmit the code to trigger CI/CD.

