adapt w2 quant layer format for Minimax2.5 #111
Open
shadowxz109 wants to merge 1 commit into Ascend:release/PoC_20260310 from
Conversation
CLA Signature Guide
@shadowxz109, thanks for your pull request. The following commit(s) are not associated with a signed Contributor License Agreement (CLA).
To sign the CLA, click here. To check if your email is configured correctly, refer to the FAQs. Once you've signed the CLA or updated your email, please comment
Author
/check-cla
CLA Signature Pass
shadowxz109, thanks for your pull request. All authors of the commits have signed the CLA. 👍
Comment on lines +217 to +218
    prefix_in_quant_config_down = prefix + ".0.down_proj.weight"
    prefix_in_quant_config_w2 = prefix + ".0.w2.weight"
Suggested change
    - prefix_in_quant_config_down = prefix + ".0.down_proj.weight"
    - prefix_in_quant_config_w2 = prefix + ".0.w2.weight"
    + prefix_in_quant_config_search_list = [prefix + ".0.down_proj.weight", prefix + ".0.w2.weight"]
    + quant_scheme_entry = None
    + for prefix_in_quant_config in prefix_in_quant_config_search_list:
    +     if (quant_scheme_entry := self.quant_description.get(prefix_in_quant_config, None)):
    +         break
Motivation
Adapt the w2 quantized weight format for Minimax2.5.
Modifications
Add the '.0.w2.weight' key format to the lookup in get_moe_scheme.
Change the layer prefix from mlp to block_sparse_moe.
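The two modifications can be illustrated with a minimal, standalone sketch. It is not the code from this PR: the helper names `find_moe_scheme` and `to_block_sparse_moe`, the example key layout, and the scheme value `"W8A8_DYNAMIC"` are all assumptions made for illustration. Only the two suffixes and the `mlp` to `block_sparse_moe` rename come from the PR itself.

```python
from typing import Mapping, Optional

# The two key suffixes named in this PR: the existing down_proj form
# and the w2 form used by the Minimax2.5 quantized checkpoint.
MOE_SUFFIXES = (".0.down_proj.weight", ".0.w2.weight")


def find_moe_scheme(quant_description: Mapping[str, str], prefix: str) -> Optional[str]:
    """Return the first scheme found under prefix + suffix, or None.

    Hypothetical helper; the real get_moe_scheme may differ.
    """
    for suffix in MOE_SUFFIXES:
        scheme = quant_description.get(prefix + suffix)
        if scheme is not None:
            return scheme
    return None


def to_block_sparse_moe(prefix: str) -> str:
    """Rename an '.mlp.' path component to '.block_sparse_moe.' (assumed layout)."""
    return prefix.replace(".mlp.", ".block_sparse_moe.")


# Example with a made-up quant description containing only the w2 form.
quant_description = {
    "model.layers.0.block_sparse_moe.experts.0.w2.weight": "W8A8_DYNAMIC",
}
prefix = to_block_sparse_moe("model.layers.0.mlp.experts")
scheme = find_moe_scheme(quant_description, prefix)
```

Trying the suffixes in order keeps checkpoints that use the down_proj naming working, while checkpoints that only carry w2 keys fall through to the second suffix.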
Accuracy Tests
Accuracy: 0.955
Invalid: 0.000
Latency: 38.473 s
Output throughput: 520.775 token/s