
Support block-wise fp8 quant #1487

Open
mengniwang95 wants to merge 24 commits into main from mengni/block_wise

Conversation

@mengniwang95
Contributor

@mengniwang95 mengniwang95 commented Mar 3, 2026

Description

Support block-wise FP8 quantization.

#959
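Block-wise FP8 quantization splits a weight matrix into fixed-size tiles and stores one scale per tile, so an outlier only inflates the scale of its own tile rather than that of the whole tensor or channel. The following is a minimal pure-Python sketch of that idea, not this PR's implementation. The 128x128 default tile size, the FP8 E4M3 format (largest finite value 448), the `scale = tile amax / 448` rule, and the helper names `round_to_e4m3`, `quantize_blockwise` and `dequantize_blockwise` are all assumptions made for illustration.

```python
import math

# Assumptions for this sketch (not taken from the PR): FP8 E4M3 ("fn" variant),
# one scale per tile, scale = tile amax / 448, default tile size 128x128.
FP8_E4M3_MAX = 448.0   # largest finite E4M3 value
MANTISSA_BITS = 3      # E4M3 has 3 explicit mantissa bits
MIN_NORMAL_EXP = -6    # smallest normal exponent (exponent bias 7)


def round_to_e4m3(x: float) -> float:
    """Simulate rounding x to the nearest finite E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    mag = min(abs(x), FP8_E4M3_MAX)
    # Subnormals use the same spacing as the smallest normal binade.
    exp = max(math.floor(math.log2(mag)), MIN_NORMAL_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)   # gap between neighbouring E4M3 values here
    q = round(mag / step) * step          # round() is round-half-to-even
    return math.copysign(min(q, FP8_E4M3_MAX), x)


def quantize_blockwise(weight, block=(128, 128)):
    """Hypothetical helper: return (q, scales) with one scale per tile of a 2-D list."""
    rows, cols = len(weight), len(weight[0])
    br, bc = block
    scales = [[1.0] * math.ceil(cols / bc) for _ in range(math.ceil(rows / br))]
    q = [[0.0] * cols for _ in range(rows)]
    for bi in range(len(scales)):
        for bj in range(len(scales[0])):
            tile_rows = range(bi * br, min((bi + 1) * br, rows))
            tile_cols = range(bj * bc, min((bj + 1) * bc, cols))
            amax = max(abs(weight[r][c]) for r in tile_rows for c in tile_cols)
            scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
            scales[bi][bj] = scale
            for r in tile_rows:
                for c in tile_cols:
                    q[r][c] = round_to_e4m3(weight[r][c] / scale)
    return q, scales


def dequantize_blockwise(q, scales, block=(128, 128)):
    """Hypothetical helper: multiply each value back by the scale of its tile."""
    br, bc = block
    return [[q[r][c] * scales[r // br][c // bc] for c in range(len(q[0]))]
            for r in range(len(q))]


# Usage on a tiny 2x4 matrix with 2x2 tiles: the 250.0 outlier only
# affects the scale of the right-hand tile.
w = [[0.5, -1.0, 100.0, 0.001],
     [2.0, 3.0, -250.0, 0.25]]
q, s = quantize_blockwise(w, block=(2, 2))
w_hat = dequantize_blockwise(q, s, block=(2, 2))
```

With per-tensor scaling, the single 250.0 entry would set the scale for every element. Here the left tile keeps its own much smaller scale (3.0 / 448), so its small values are represented more finely.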

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

mengniwang95 and others added 6 commits March 3, 2026 11:14
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Contributor

@yiliu30 yiliu30 left a comment

Overall LGTM, left a few comments.

@mengniwang95
Contributor Author

mengniwang95 commented Mar 4, 2026

RTN:

|Tasks|Version|Filter|n-shot|Metric|Value| |Stderr|
|---|---:|---|---:|---|---:|---|---:|
|gsm8k_llama|3|flexible_extract|8|exact_match|0.7968|±|0.0111|
| | |strict_match|8|exact_match|0.7392|±|0.0121|
|hellaswag|1|none|0|acc|0.5770|±|0.0049|
| | |none|0|acc_norm|0.7538|±|0.0043|
|piqa|1|none|0|acc|0.7829|±|0.0096|
| | |none|0|acc_norm|0.7927|±|0.0095|
|mmlu_llama|1|strict_match| |exact_match|0.6672|±|0.0038|
|- humanities|1|strict_match| |exact_match|0.6138|±|0.0067|
|- other|1|strict_match| |exact_match|0.7425|±|0.0075|
|- social sciences|1|strict_match| |exact_match|0.7667|±|0.0075|
|- stem|0|strict_match| |exact_match|0.5756|±|0.0085|
|mmlu_pro_llama|1|strict_match| |exact_match|0.3752|±|0.0043|

Tuning:

|Tasks|Version|Filter|n-shot|Metric|Value| |Stderr|
|---|---:|---|---:|---|---:|---|---:|
|gsm8k_llama|3|flexible_extract|8|exact_match|0.7870|±|0.0113|
| | |strict_match|8|exact_match|0.7589|±|0.0118|
|hellaswag|1|none|0|acc|0.5830|±|0.0049|
| | |none|0|acc_norm|0.7604|±|0.0043|
|piqa|1|none|0|acc|0.7797|±|0.0097|
| | |none|0|acc_norm|0.7813|±|0.0096|
|mmlu_llama|1|strict_match| |exact_match|0.6737|±|0.0037|
|- humanities|1|strict_match| |exact_match|0.6232|±|0.0067|
|- other|1|strict_match| |exact_match|0.7496|±|0.0074|
|- social sciences|1|strict_match| |exact_match|0.7712|±|0.0074|
|mmlu_pro_llama|1|strict_match| |exact_match|0.3821|±|0.0043|
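The RTN and Tuning runs can be compared metric by metric by placing each difference next to the reported standard errors. The following is a rough sketch with values copied from the RTN and Tuning tables in this comment. `compare` is a hypothetical helper. Combining the two stderrs as if the runs were independent is an assumption: both runs score the same benchmark items, so this likely overstates the noise in the difference, and the z-values are only indicative.

```python
import math

# (metric, RTN value, RTN stderr, Tuning value, Tuning stderr), copied from the tables above
results = [
    ("gsm8k_llama flexible_extract", 0.7968, 0.0111, 0.7870, 0.0113),
    ("gsm8k_llama strict_match", 0.7392, 0.0121, 0.7589, 0.0118),
    ("hellaswag acc", 0.5770, 0.0049, 0.5830, 0.0049),
    ("hellaswag acc_norm", 0.7538, 0.0043, 0.7604, 0.0043),
    ("piqa acc", 0.7829, 0.0096, 0.7797, 0.0097),
    ("piqa acc_norm", 0.7927, 0.0095, 0.7813, 0.0096),
    ("mmlu_llama", 0.6672, 0.0038, 0.6737, 0.0037),
    ("mmlu_pro_llama", 0.3752, 0.0043, 0.3821, 0.0043),
]


def compare(rtn, rtn_se, tuned, tuned_se):
    """Hypothetical helper: return (Tuning - RTN, difference / combined stderr)."""
    delta = tuned - rtn
    # Treats the two runs as independent samples, which is only an approximation.
    se = math.sqrt(rtn_se ** 2 + tuned_se ** 2)
    return delta, delta / se


for name, r, rse, t, tse in results:
    d, z = compare(r, rse, t, tse)
    print(f"{name:30s} delta={d:+.4f} z={z:+.2f}")
```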

@wenhuach21
Contributor

RTN:

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|---:|---|---:|---|---|---:|---|---:|
|gsm8k_llama|3|flexible_extract|8|exact_match|↑|0.7809|±|0.0114|
| | |strict_match|8|exact_match|↑|0.7665|±|0.0117|
|hellaswag|1|none|0|acc|↑|0.5820|±|0.0049|
| | |none|0|acc_norm|↑|0.7663|±|0.0042|
|piqa|1|none|0|acc|↑|0.7650|±|0.0099|
| | |none|0|acc_norm|↑|0.7661|±|0.0099|

Tuning:

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|---|---:|---|---:|---|---|---:|---|---:|
|gsm8k_llama|3|flexible_extract|8|exact_match|↑|0.7779|±|0.0114|
| | |strict_match|8|exact_match|↑|0.7430|±|0.0120|
|hellaswag|1|none|0|acc|↑|0.5821|±|0.0049|
| | |none|0|acc_norm|↑|0.7604|±|0.0043|
|piqa|1|none|0|acc|↑|0.7753|±|0.0097|
| | |none|0|acc_norm|↑|0.7742|±|0.0098|

Thanks for the data. Please add MMLU and MMLU-Pro results as well.

@mengniwang95
Contributor Author

UT depends on #1525.

@mengniwang95
Contributor Author

@wenhuach21 I will create another PR to optimize the quant function for rtn/opt_rtn/tuning.

mengniwang95 and others added 5 commits March 13, 2026 06:30
mengniwang95 and others added 6 commits March 16, 2026 01:24
@chensuyue chensuyue added this to the 0.12.0 milestone Mar 16, 2026