[Bug]: rotation parameter not saved and not used in inference?
In other words: how can I use rotation for correct inference?
Describe the bug
The `rotation` parameter (for SpinQuant/QuaRot-style preprocessing) is used during quantization, but:
- it is NOT saved to `quantize_config.json` when the model is saved
- it is NOT used in the quantized layers' `forward` method during inference

This causes incorrect inference results (extremely high perplexity, e.g. PPL: 48564640.0).
GPU Info
- NVIDIA GeForce RTX 5090
- CUDA Version: 13.0
- Driver Version: 580.76.05
Software Info
- OS: Linux 5.15.0-78-generic
- Python: 3.12.3
- gptqmodel: 5.6.12
- torch: 2.9.0
- transformers: 4.57.3
- accelerate: 1.12.0
- triton: 3.5.0
quantize_config.json
```json
{
  "bits": 2,
  "group_size": 128,
  "desc_act": false,
  "sym": true,
  "quant_method": "gptq",
  "checkpoint_format": "gptq",
  "meta": {
    "gptaq": true,
    "gptaq_alpha": 0.25,
    "act_group_aware": true
  }
}
```
Note: the `rotation` field is missing, even though `rotation='hadamard'` was used during quantization.
To Reproduce
- Quantize a model with rotation:
```python
quant_config = QuantizeConfig(..., rotation='hadamard')
model = GPTQModel.from_pretrained(model_path, quantize_config=quant_config)
model.quantize(calibration_data)
model.save(quant_path)
```
- Load for inference:
```python
model = GPTQModel.from_quantized(quant_path)
# Even manually setting rotation doesn't help:
# model.quantize_config.rotation = 'hadamard'
# because the quantized layers' forward() doesn't check this parameter
```
- Result: incorrect inference (PPL: 48564640.0 instead of normal values)
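To illustrate why this mismatch destroys outputs: for an orthogonal rotation H fused into the weights at quantization time, the quantized layer computes (W·H)·x, which only reproduces W·x if the activation is counter-rotated to Hᵀ·x somewhere. A minimal numpy sketch of the linear algebra (not gptqmodel code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Normalized Sylvester-construction Hadamard matrix: orthogonal, H @ H.T = I
H = np.array([[1.0]])
while H.shape[0] < n:
    H = np.block([[H, H], [H, -H]])
H /= np.sqrt(n)

W = rng.standard_normal((4, n))   # stand-in for a layer weight
x = rng.standard_normal(n)        # stand-in for an activation

W_rot = W @ H                     # rotation fused into the weight at quantization time

y_ok = W_rot @ (H.T @ x)          # consistent: activation counter-rotated
y_bad = W_rot @ x                 # inconsistent: unrotated activation, as in this bug

print(np.allclose(y_ok, W @ x))   # True
print(np.allclose(y_bad, W @ x))  # False: outputs are garbage
```

The second case is exactly what an astronomically high perplexity looks like at the layer level.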
Expected behavior
- The `rotation` parameter should be saved to `quantize_config.json`.
- The quantized layers' `forward` method should check and use the `rotation` parameter.
- If `rotation` is set, applying the inverse rotation during inference to restore correct outputs would be reasonable.
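As a rough sketch of the expected behavior: a layer that knows its weight was rotated could counter-rotate the activation in `forward()`. The class, attribute, and helper names below are hypothetical illustrations, not gptqmodel's actual API:

```python
import numpy as np

def hadamard(n):
    """Normalized Sylvester-construction Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

class QuantLinearWithRotation:
    """Hypothetical layer that undoes the fused rotation at inference time.
    Names are illustrative only; they do not match gptqmodel's real classes."""

    def __init__(self, weight, rotation=None):
        self.rotation = rotation
        if rotation == 'hadamard':
            self.H = hadamard(weight.shape[1])
            # The stored (quantized) weight already has the rotation fused in
            self.weight = weight @ self.H
        else:
            self.H = None
            self.weight = weight

    def forward(self, x):
        if self.rotation == 'hadamard':
            # Counter-rotate so (W @ H) @ (H.T @ x) == W @ x
            x = self.H.T @ x
        return self.weight @ x

# Usage: the rotated layer reproduces the un-rotated reference output
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
x = rng.standard_normal(8)
layer = QuantLinearWithRotation(W, rotation='hadamard')
print(np.allclose(layer.forward(x), W @ x))  # True
```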
Additional context
- Quantization works correctly (rotation is applied via `rotate_model()` in base.py:586-610).
- The issue is in inference: quantized layers (e.g. `TritonV2QuantLinear.forward`) don't apply the inverse rotation.
- Manually setting `rotation` after loading doesn't help, because `forward()` doesn't check it.
- This appears to be an incomplete implementation of the rotation feature.
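For context on the design space: if the rotation were fully fused, the R inserted into one layer and the Rᵀ folded into the next layer would cancel, and no runtime flag would be needed at all; this is the usual QuaRot-style trick. A sketch of that cancellation, assuming an orthogonal R (this is general linear algebra, not a claim about what gptqmodel currently does):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8

# Random orthogonal matrix via QR (stands in for a Hadamard rotation R)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

W1 = rng.standard_normal((n, n))  # producer layer weight
W2 = rng.standard_normal((4, n))  # consumer layer weight
x = rng.standard_normal(n)

W1_rot = Q @ W1       # fold R into the producer layer's output
W2_rot = W2 @ Q.T     # fold R^-1 (= R.T) into the consumer layer's input

y_ref = W2 @ (W1 @ x)
y_rot = W2_rot @ (W1_rot @ x)
print(np.allclose(y_rot, y_ref))  # True: rotations cancel between layers
```

If gptqmodel's `rotate_model()` fuses rotations only partially, that would be consistent with the checkpoint being self-inconsistent at inference, which is what this report observes.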