Pull requests: meta-pytorch/MSLK
Pull requests list
Enable per-token scaled FP4 grouped gemm on B200
cla signed
fb-exported
meta-exported
#356
opened May 22, 2026 by
jwfromm
Contributor
Deprecate usage of hypothesis
cla signed
fb-exported
meta-exported
#353
opened May 21, 2026 by
cthi
Contributor
Apply fixup patch to fbsource
cla signed
fb-exported
meta-exported
#349
opened May 19, 2026 by
jwfromm
Contributor
Add pyre-strict to mslk/mslk/conv/_meta.py
cla signed
fb-exported
meta-exported
#346
opened May 18, 2026 by
jwfromm
Contributor
Switch to upstream Cutlass dependency
cla signed
fb-exported
meta-exported
#342
opened May 12, 2026 by
jwfromm
Contributor
Remove FBGEMM as a conda dependency
cla signed
fb-exported
meta-exported
#334
opened Apr 26, 2026 by
jwfromm
Contributor
Update GB200 RL Conda package with MSLK Fixes.
cla signed
fb-exported
meta-exported
#333
opened Apr 25, 2026 by
jwfromm
Contributor
[CUDA] [PERFORMANCE] Increase speed of bf16bf16bf16_grouped_wgrad via indicating that ElementC is void / nullptr
cla signed
#329
opened Apr 19, 2026 by
benediktjohannes
Add native MX8×MX4 mixed-precision GEMM kernel (f8f4bf16)
cla signed
fb-exported
meta-exported
#313
opened Apr 6, 2026 by
isratnisa
CUDA graph support — 5x speedup at small N
cla signed
fb-exported
meta-exported
#309
opened Apr 2, 2026 by
jduprat
Contributor
Block-sparse compressed attention (sub-quadratic compressed branch) (#308)
cla signed
fb-exported
meta-exported
#308
opened Apr 2, 2026 by
jduprat
Contributor
NSA backward — benchmarks and performance documentation (#307)
cla signed
fb-exported
meta-exported
#307
opened Apr 2, 2026 by
jduprat
Contributor
NSA backward — autograd function (fixed-length + varlen) (#306)
cla signed
fb-exported
meta-exported
#306
opened Apr 2, 2026 by
jduprat
Contributor
NSA backward — FA4 backward wrapper, block-sparse index transpose (#305)
cla signed
fb-exported
meta-exported
#305
opened Apr 2, 2026 by
jduprat
Contributor
NSA backward — compression and gating backward kernels (#304)
cla signed
fb-exported
meta-exported
#304
opened Apr 2, 2026 by
jduprat
Contributor
Fix int32 overflow in CuteDSL kernels for N >= 2M
cla signed
fb-exported
meta-exported
#303
opened Apr 2, 2026 by
jduprat
Contributor
Fused CuteDSL kernel for KV compression (#302)
cla signed
fb-exported
meta-exported
#302
opened Apr 2, 2026 by
jduprat
Contributor
Fused CuteDSL kernel for block selection scoring (#301)
cla signed
fb-exported
meta-exported
#301
opened Apr 2, 2026 by
jduprat
Contributor
Fused CuteDSL gating kernel (#300)
cla signed
fb-exported
meta-exported
#300
opened Apr 2, 2026 by
jduprat
Contributor
NSA forward — foundation, reference implementations, compact metadata
cla signed
fb-exported
meta-exported
#299
opened Apr 2, 2026 by
jduprat
Contributor
Update benchmarks to 1M tokens, add memory diagnostics
cla signed
fb-exported
meta-exported
#297
opened Mar 30, 2026 by
jduprat
Contributor
Update FINDINGS.md with optimization round results
cla signed
fb-exported
meta-exported
#296
opened Mar 30, 2026 by
jduprat
Contributor
Replace CuteDSL compress + gating kernels with pure PyTorch
cla signed
fb-exported
meta-exported
#295
opened Mar 30, 2026 by
jduprat
Contributor
Complete compress_factor: backward path + remove mask_mod from NSA
cla signed
fb-exported
meta-exported
#294
opened Mar 30, 2026 by
jduprat
Contributor