Scaling Beyond Masked Diffusion Language Models
This work introduces Flow Matching Mixture of Experts (FM-MoE), a framework that replaces conventional MLP experts with flow matching networks. Each expert learns a continuous transformation through an ordinary differential equation (ODE), enabling more expressive feature mappings while preserving the computational efficiency of sparse expert routing.
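To make the idea concrete, below is a minimal PyTorch sketch of one way a flow-matching expert could be wired into a sparsely routed MoE layer: each expert integrates a learned, time-conditioned velocity field with a fixed-step Euler solver instead of applying a single MLP pass. All names here (`FlowMatchingExpert`, `FlowMoE`, `num_steps`), the top-1 router, and the Euler integrator are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FlowMatchingExpert(nn.Module):
    """Hypothetical expert that maps features via a learned ODE.

    Rather than a single MLP forward pass, the expert integrates a
    time-conditioned velocity field v(x, t) from t=0 to t=1 with a
    fixed-step Euler solver (an assumed solver choice for this sketch).
    """

    def __init__(self, dim: int, hidden: int, num_steps: int = 4):
        super().__init__()
        self.num_steps = num_steps
        # Velocity network takes the current state concatenated with a
        # scalar time channel.
        self.velocity = nn.Sequential(
            nn.Linear(dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        dt = 1.0 / self.num_steps
        for step in range(self.num_steps):
            t = torch.full_like(x[..., :1], step * dt)
            x = x + dt * self.velocity(torch.cat([x, t], dim=-1))
        return x


class FlowMoE(nn.Module):
    """Sparse top-1 MoE layer over flow-matching experts (illustrative)."""

    def __init__(self, dim: int, hidden: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            FlowMatchingExpert(dim, hidden) for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Route each token to its highest-scoring expert,
        # so only one expert ODE is evaluated per token.
        gates = F.softmax(self.router(x), dim=-1)  # (tokens, num_experts)
        weight, idx = gates.max(dim=-1)            # top-1 routing
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out


tokens = torch.randn(8, 64)
layer = FlowMoE(dim=64, hidden=128)
print(layer(tokens).shape)  # torch.Size([8, 64])
```

Because routing is sparse, each token still pays for only one expert; the extra cost relative to an MLP expert is the number of solver steps, which is a tunable accuracy/compute trade-off in this sketch.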