
Enable per-token scaled FP4 grouped gemm on B200 #356

Open
jwfromm wants to merge 1 commit into meta-pytorch:main from jwfromm:export-D106103018

jwfromm commented May 22, 2026

Summary: Adds templating so that the per-token FP4 grouped GEMM kernel can run on GB200 as well as GB300. The selection is purely static (made through template parameters), so it has no impact on performance. The new GB200 path performs similarly to the standard global-scale grouped GEMM.

Differential Revision: D106103018
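
For readers unfamiliar with the pattern, here is a minimal C++ sketch of what "purely static" templating can look like. It is an illustration under stated assumptions, not the code in this PR: `Arch`, `KernelTraits`, `run_per_token_grouped_gemm`, `dispatch`, and the variant ids are all hypothetical names and placeholder values.

```cpp
#include <cassert>

// Hypothetical architecture tags; the PR's real template parameter is not shown.
enum class Arch { GB200, GB300 };

// Hypothetical per-target traits. The variant ids are placeholders,
// not real kernel settings.
template <Arch A> struct KernelTraits;
template <> struct KernelTraits<Arch::GB200> { static constexpr int kVariantId = 200; };
template <> struct KernelTraits<Arch::GB300> { static constexpr int kVariantId = 300; };

// Stand-in for a kernel launch. A is fixed at compile time, so the body
// contains no runtime branch on the architecture.
template <Arch A>
int run_per_token_grouped_gemm() {
    return KernelTraits<A>::kVariantId;
}

// The only runtime branch: a single host-side switch that picks the
// instantiation before any work starts.
int dispatch(Arch arch) {
    switch (arch) {
        case Arch::GB200: return run_per_token_grouped_gemm<Arch::GB200>();
        case Arch::GB300: return run_per_token_grouped_gemm<Arch::GB300>();
    }
    assert(false && "unknown architecture tag");
    return -1;
}
```

In this sketch each target gets its own instantiation, so target-specific choices are resolved by the compiler rather than checked inside the hot loop, which is the usual reason such a change leaves existing performance untouched.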

meta-cla Bot added the cla signed label May 22, 2026

meta-codesync Bot commented May 22, 2026

@jwfromm has exported this pull request. If you are a Meta employee, you can view the originating Diff in D106103018.
