support hadamard transform for mxfp4 with rtn or autoround method #1515
chensuyue merged 43 commits into intel:main
Conversation
This PR is a refactored version of the original PR: #1349
Pull request overview
Adds Hadamard-based rotation support for MXFP4/NVFP4 workflows by introducing an experimental transform pipeline (weight rotation + activation-side transform during inference) and a CUDA test exercising quantize→save→HF load→generate.
Changes:
- Introduce experimental transform modules/config and a Triton MXFP4 Hadamard+QDQ kernel wrapper.
- Plumb `transform_config` through the quantization scheme/config and apply activation transforms during HF model conversion.
- Add a CUDA integration test that applies the transform, quantizes/saves, then runs HF generation. (A minimal sketch of the two transform application modes follows this list.)
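Based on the description above, here is a minimal sketch of the two application modes — fusing the rotation into a layer's weight offline, and rotating activations at inference time via a forward pre-hook. The function names are illustrative, not the PR's actual API:

```python
import torch

def fuse_weight_transform(linear: torch.nn.Linear, H: torch.Tensor) -> None:
    """Fold the rotation into the stored weight before quantization: W' = W H."""
    linear.weight.data = linear.weight.data @ H

def register_activation_transform(linear: torch.nn.Linear, H: torch.Tensor):
    """Rotate the input right before the (quantized) matmul: x' = x H."""
    def pre_hook(module, args):
        (x,) = args
        return (x @ H,)
    return linear.register_forward_pre_hook(pre_hook)
```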
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| `auto_round/experimental/transform/apply.py` | Applies transforms to model modules; registers activation pre-hooks or fuses weight transforms. |
| `auto_round/experimental/transform/transforms.py` | Defines Hadamard/identity transforms and the transform factory. |
| `auto_round/experimental/transform/transform_config.py` | Pydantic config object for transform serialization. |
| `auto_round/experimental/transform/triton/mxfp4.py` | Triton kernel for Hadamard + FP4 QDQ on activations (see the conceptual sketch after this table). |
| `auto_round/inference/convert_model.py` | Threads `transform_config` into the layer config and registers the activation transform during conversion. |
| `auto_round/schemes.py` | Adds a `transform_config` field to `QuantizationScheme`. |
| `auto_round/compressors/base.py` | Adds `transform_config` to serialization keys and compressor init args. |
| `auto_round/autoround.py` | Adds a `transform_config` pass-through argument on `AutoRound` construction. |
| `test/test_cuda/transform/test_mxfp4_transform.py` | New CUDA test for transform + MXFP4 quantize/save and HF inference. |
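For intuition, here is a rough, non-Triton reference of what the Hadamard + FP4 QDQ kernel does conceptually: rotate activations by `H`, then fake-quantize in 32-element blocks with a shared power-of-two scale and E2M1 values. The block size, scale rule, and value table below are simplified assumptions, not the kernel's exact semantics:

```python
import torch

# Representable E2M1 (FP4) magnitudes
FP4_VALUES = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def hadamard_qdq(x: torch.Tensor, H: torch.Tensor, block: int = 32) -> torch.Tensor:
    x = x @ H                                    # Hadamard rotation
    xb = x.reshape(-1, block)
    amax = xb.abs().amax(dim=-1, keepdim=True).clamp_min(1e-12)
    scale = 2.0 ** torch.floor(torch.log2(amax / 6.0))  # power-of-two block scale
    q = xb / scale
    # round each element to the nearest representable FP4 magnitude
    idx = (q.abs().unsqueeze(-1) - FP4_VALUES).abs().argmin(dim=-1)
    dq = FP4_VALUES[idx] * q.sign() * scale      # dequantize back to float
    return dq.reshape_as(x)
```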
I left several comments in the previous review, and several of them have not been addressed.

@n1ck-guo please review carefully; otherwise, it will be your work to refine the API.
Additionally, it would be better to run some accuracy tests for Qwen and LLaMA to verify correctness and document the results in the docs folder.

Also, the same Hadamard transformation for fused modules such as QKV and MoE is not handled yet. If it is not supported in this version, we should warn users.
Code review: No issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code

Thanks~
I see. As a next step, we plan to enable fused modules such as QKV, MLP, and MoE. In the current version, the Hadamard transform for these fused patterns is not handled yet, so we'll add clear guidance in the documentation and warn users when it's not supported, to avoid confusion. (A sketch of such a warning follows.)
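A hedged sketch of what that user-facing warning could look like — the helper name and the fused-pattern list are hypothetical, not the PR's code:

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative name fragments for fused modules; not an exhaustive list
FUSED_PATTERNS = ("qkv", "gate_up", "experts")

def warn_if_fused(module_name: str) -> None:
    """Warn when a fused module would receive an unsupported Hadamard transform."""
    if any(p in module_name.lower() for p in FUSED_PATTERNS):
        logger.warning(
            "Hadamard transform for fused module '%s' is not handled in this "
            "version; results may be incorrect.",
            module_name,
        )
```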
@wenhuach21 @yiliu30 please help review |
@wenhuach21 The test accuracy results are here, evaluated with lm_eval using --model hf.
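For reference, a hedged sketch of how such an lm_eval run could look through its Python API — the checkpoint path and task list are assumptions, not taken from the PR:

```python
import lm_eval

# Equivalent in spirit to the CLI's --model hf; path and tasks are placeholders
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./qmodel",
    tasks=["hellaswag", "mmlu"],
)
print(results["results"])
```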
Better to document it and add Qwen3-8B.
Besides the Llama 3.1 8B, can we also post a 70B model? Please prepare the README to show the recipe and accuracy data in a separate PR.
will do |
please resolve the API issue and then merge |
Description
original linear:

$$y = x W^\top$$

transform matrix $$H$$ (a Hadamard matrix satisfies $$H^\top H = I$$ and $$H^{-1} = H^\top$$):

define:

$$W' = W H, \qquad x' = x H$$

then:

$$y = x' W'^\top = (x H)(W H)^\top = x H H^\top W^\top = x W^\top$$
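A quick numerical check of this identity (illustrative only, not project code):

```python
import torch
from scipy.linalg import hadamard

n = 64
H = torch.tensor(hadamard(n), dtype=torch.float64) / n**0.5  # orthonormal Hadamard
x = torch.randn(8, n, dtype=torch.float64)
W = torch.randn(n, n, dtype=torch.float64)

y = x @ W.T                   # original linear
y_rot = (x @ H) @ (W @ H).T   # x' W'^T with x' = x H, W' = W H
assert torch.allclose(y, y_rot)  # H H^T = I, so outputs match
```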
with huggingface/transformers
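The original snippet here was not preserved; below is a hedged sketch of loading the saved checkpoint with transformers for generation (the model path and prompt are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./qmodel", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("./qmodel")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```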
with vllm
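Likewise, a hedged vLLM sketch (the path is again a placeholder; the original example was not preserved in this page):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="./qmodel")
params = SamplingParams(temperature=0.0, max_tokens=32)
for out in llm.generate(["Hello, my name is"], params):
    print(out.outputs[0].text)
```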