SemiAnalysisAI / InferenceX Public

Pull requests: SemiAnalysisAI/InferenceX

Labels 29 Milestones 6

15 Open 623 Closed

Pull requests list

Disable prefix caching for qwen3.5 & glm5 AMD benchmarks sweep-enabled

#970 opened Mar 28, 2026 by functionstackx

[AMD/ROCM] ATOM support for new models: Kimi-K2.5 FP4, GLM-5 FP8, and MiniMax-M2.5

#963 opened Mar 27, 2026 by seungrokj

[DRAFT] [AMD] Update Minimax M2.5 MI325 image and adjust search space AMD

#953 opened Mar 27, 2026 by benenzhu

[DNM - vllm 0.18 is bugged, need to wait till 0.19] Add MiniMax M2.5 NVFP4 benchmark for B200 vLLM (TP4, TP2) sweep-enabled

#952 opened Mar 27, 2026 by functionstackx • Draft

[Draft, no merge] MVP for vLLM Disagg

#948 opened Mar 26, 2026 by chunfangamd

[WIP] B200 Minimax FP8 vllm upgrade NVIDIA sweep-enabled

#947 opened Mar 26, 2026 by kedarpotdar-nv

[Don't Merge] Update cli args qwen

#946 opened Mar 25, 2026 by zhentaocc • Draft

Add Qwen3.5 h200 MTP NVIDIA sweep-enabled

#921 opened Mar 20, 2026 by hshrivastava-droid

fix: multi-turn benchmark hangs after all clients finish

#908 opened Mar 13, 2026 by lishicheng1996-nv

3 of 4 tasks

Add Kimi-K2.5 INT4 vLLM v0.16.0 benchmark for MI300X AMD sweep-enabled

#860 opened Mar 3, 2026 by functionstackx

chore: upgrade h200 gptoss to latest trtllm

#854 opened Mar 2, 2026 by cquil11

Add MiniMax M2.5 MXFP4 benchmark for MI355x vLLM v0.17.1 (TP=2,4) AMD sweep-enabled

#827 opened Mar 1, 2026 by functionstackx

[WIP][experimental] multi turn chat benchmark

#821 opened Feb 27, 2026 by cquil11 • Draft

[NV] Qwen3.5 B200 SGLang FP4 configs NVIDIA sweep-enabled

#820 opened Feb 27, 2026 by kedarpotdar-nv

Performance Improvements for MI300X with GEMM and FP8 Enhancements

#811 opened Feb 26, 2026 by chunfangamd
