- Explore Dream-Coder 7B - diffusion LLM for code generation
- Review GitHub repository: https://github.com/DreamLM/Dream-Coder
- Understand diffusion LLM architecture for code
- Test code generation capabilities (21.4% pass@1 on LiveCodeBench)
- Experiment with flexible generation patterns (sketch-first, left-to-right, iterative)
- Try variable-length code infilling with DreamOn-7B variant
- Install dependencies: transformers==4.46.2 and torch==2.5.1
- Test quickstart example with quicksort algorithm
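A minimal quickstart sketch for the quicksort test, assuming Dream-Coder loads through transformers with `trust_remote_code` and exposes a diffusion-style decoding method like the Dream 7B example; the repo id, the `diffusion_generate` call, and its arguments are assumptions to verify against the Dream-Coder README:

```python
# Quickstart sketch; repo id and diffusion_generate signature are assumed from the
# Dream 7B pattern and should be checked against the Dream-Coder README.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Dream-org/Dream-Coder-v0-Instruct-7B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda").eval()

messages = [{"role": "user", "content": "Write a quicksort function in Python."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

# Diffusion LLMs decode by iterative denoising rather than left-to-right generate();
# the method name and arguments below mirror the Dream 7B quickstart and may differ here.
output = model.diffusion_generate(
    input_ids,
    max_new_tokens=256,
    steps=256,            # number of denoising steps
    temperature=0.2,
    top_p=0.95,
    return_dict_in_generate=True,
)
print(tokenizer.decode(
    output.sequences[0][input_ids.shape[1]:], skip_special_tokens=True
))
```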
- Research NVIDIA's NVFP4 (4-bit floating point) training methodology
- Study the paper: https://arxiv.org/pdf/2509.25149
- Understand NVFP4 format vs MXFP4 differences
- Learn Random Hadamard Transforms (RHT) for outlier bounding
- Study two-dimensional quantization scheme
- Understand stochastic rounding for unbiased gradients (see the quantization sketch after this list)
- Research selective high-precision layers approach
- Analyze results from the 12B-parameter model trained on 10T tokens
- Compare MMLU-pro accuracy: 62.58% (NVFP4) vs 62.62% (FP8)
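To make the quantization and stochastic-rounding items above concrete, here is a small PyTorch simulation of NVFP4-style quantize/dequantize: FP4 (E2M1) values, 16-element blocks with FP8 (E4M3) block scales, a per-tensor FP32 scale, and stochastic rounding so that quantized gradients remain unbiased in expectation. The paper's two-dimensional scheme applies the same block scaling over 16x16 weight blocks so forward and backward GEMMs see consistent quantization; this sketch uses 1x16 blocks and is an illustration, not NVIDIA's kernels.

```python
import torch

# FP4 E2M1 representable magnitudes (max normal value 6.0).
E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def stochastic_round_to_grid(x, grid):
    """Round each element to one of its two neighbouring grid points, with probability
    proportional to proximity, so the rounding is unbiased: E[round(x)] = x."""
    hi_idx = torch.searchsorted(grid, x).clamp(1, len(grid) - 1)
    lo, hi = grid[hi_idx - 1], grid[hi_idx]
    p_hi = (x - lo) / (hi - lo)                      # probability of rounding up
    return torch.where(torch.rand_like(x) < p_hi, hi, lo)

def nvfp4_fake_quant(x, block=16):
    """Simulated NVFP4 quantize->dequantize with two-level scaling: a per-tensor FP32
    scale plus per-16-element-block scales stored in FP8 E4M3. The last dimension of
    x must be divisible by `block`."""
    xb = x.reshape(-1, block)

    # Level 1: per-tensor scale chosen so block scales fit E4M3's max value (448).
    tensor_scale = x.abs().amax() / (6.0 * 448.0)

    # Level 2: per-block scale, itself quantized to E4M3 as in the NVFP4 format.
    block_scale = (xb.abs().amax(dim=1, keepdim=True) / 6.0) / tensor_scale
    block_scale = block_scale.to(torch.float8_e4m3fn).float().clamp_min(1e-12)

    # Scale into FP4 range and stochastically round magnitudes onto the E2M1 grid.
    scaled = xb / (block_scale * tensor_scale)
    q = scaled.sign() * stochastic_round_to_grid(scaled.abs().clamp(max=6.0), E2M1_GRID)

    # Dequantize back to the original scale.
    return (q * block_scale * tensor_scale).reshape(x.shape)

x = torch.randn(64, 64)
print("mean abs error:", (x - nvfp4_fake_quant(x)).abs().mean().item())
```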
- Set up experimental environment for FP4 training
- Test NVFP4 implementation with smaller models first
- Study TransformerEngine NVFP4 implementation: NVIDIA/TransformerEngine#2177
- Explore NVFP4 recipe and PyTorch integration
- Test NVFP4 support with fusible operations
- Study Random Hadamard Transform (RHT) cast fusion (see the RHT sketch after this list)
- Understand NVFP4 quantization and dequantization kernels
- Test distributed training with NVFP4
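On the RHT items: the idea is to multiply each block of values by a sign-randomized orthonormal Hadamard matrix before the quantizing cast, which spreads any single outlier across the whole block (bounding the block amax) and can be undone, or absorbed into the GEMM, because the transform is orthogonal. Below is a standalone PyTorch sketch of the transform itself, not the fused TransformerEngine kernel, using an illustrative block size of 16:

```python
import torch

def hadamard(n):
    """n x n Hadamard matrix (n must be a power of two), Sylvester construction."""
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H

def random_hadamard_transform(x, block=16, seed=0):
    """Apply a sign-randomized orthonormal Hadamard transform to each contiguous block
    of `block` elements along the last dimension of x. Returns the transformed tensor
    and the (orthogonal) transform matrix."""
    g = torch.Generator().manual_seed(seed)
    signs = torch.randint(0, 2, (block,), generator=g).float() * 2 - 1  # random +/-1 diagonal
    T = torch.diag(signs) @ (hadamard(block) / block ** 0.5)            # T @ T.T == I
    return (x.reshape(-1, block) @ T).reshape(x.shape), T

# A single outlier in a 16-element block gets spread evenly, shrinking the block amax...
x = torch.zeros(1, 16)
x[0, 3] = 10.0
y, T = random_hadamard_transform(x)
print(x.abs().amax().item(), "->", y.abs().amax().item())   # 10.0 -> 2.5

# ...and the transform is exactly invertible, so it can be undone after dequantization.
print(torch.allclose(y @ T.T, x, atol=1e-5))                # True
```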
- Research Meta's MobileLLM-R1 sub-billion parameter reasoning models
- Study the paper: https://arxiv.org/pdf/2509.24945
- Understand data curation and resampling techniques (~2T high-quality tokens)
- Learn benchmark-free, self-evolving data optimization approach
- Study data-model co-evolution strategy for mid-training adaptation
- Analyze training recipe: 4.2T tokens from resampled ~2T tokens
- Compare AIME performance: MobileLLM-R1-950M scores 15.5 vs OLMo-2-1.48B (0.6) and SmolLM-2-1.7B (0.3)
- Study how training on 4.2T tokens, just 11.7% of Qwen3's 36T, achieves comparable performance
- Explore complete training recipe and data sources (fully open-sourced)
- Test MobileLLM-R1-950M model capabilities
- Set up experimental environment for sub-billion reasoning model training
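A quick capability check for the 950M model via standard transformers generation; the Hugging Face repo id below (facebook/MobileLLM-R1-950M) and the presence of a chat template are assumptions to confirm on the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/MobileLLM-R1-950M"   # assumed repo id; check the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Small math-reasoning probe; greedy decoding keeps the check deterministic.
messages = [{"role": "user",
             "content": "Solve step by step: how many positive divisors does 360 have?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```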
- Dream-Coder 7B outperforms other open-source diffusion LLMs
- Features emergent any-order generation that adapts to coding tasks
- Trained exclusively on open-source data
- Supports both base and instruct variants
- NVFP4 enables 2-3x arithmetic performance boost and 50% memory reduction vs FP8
- First successful 4-bit training of billion-parameter models on multi-trillion tokens
- MobileLLM-R1 challenges assumptions about reasoning requiring large models and massive datasets
- Achieves strong reasoning with only ~2T high-quality tokens vs 36T+ for larger models
- Complete open-source training recipe and model checkpoints available