Add DeepseekV3HybridMoeModuleArchitecture #661
zhoutong-hai wants to merge 2 commits into arcee-ai:main
Conversation
All contributors have signed the CLA ✍️ ✅

I have read the CLA Document and I hereby sign the CLA
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
return [
    WeightInfo(name="model.norm.weight"),
    WeightInfo(name="lm_head.weight", is_embed=True),
]
```
Missing optional and tied_names for lm_head.weight
High Severity
The lm_head.weight in post_weights is missing optional=True and tied_names=("model.embed_tokens.weight",). Many transformer models (including DeepSeek V3 variants) tie the input embeddings with the output LM head, storing only one copy as model.embed_tokens.weight. Without optional and tied_names, the weight loading will fail when lm_head.weight doesn't exist separately in the checkpoint, as the code has no fallback to look for the weight under its tied name. Other architecture definitions (LLaMA, Mistral) correctly handle this pattern.
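The suggested fix can be sketched as below. This is a minimal illustration using a stand-in dataclass in place of mergekit's actual `WeightInfo` class (reduced to only the fields discussed in this review), not the PR's real code:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Stand-in for mergekit's WeightInfo, reduced to the fields discussed here.
@dataclass(frozen=True)
class WeightInfo:
    name: str
    is_embed: bool = False
    optional: bool = False
    tied_names: Optional[Tuple[str, ...]] = None

def post_weights():
    # Suggested fix: mark lm_head.weight as optional and list its tied
    # fallback name, so checkpoints that only store
    # model.embed_tokens.weight (tied embeddings) still load.
    return [
        WeightInfo(name="model.norm.weight"),
        WeightInfo(
            name="lm_head.weight",
            is_embed=True,
            optional=True,
            tied_names=("model.embed_tokens.weight",),
        ),
    ]
```

With `tied_names` set, a loader that fails to find `lm_head.weight` in the checkpoint can fall back to the tied embedding weight instead of erroring out, matching how the LLaMA and Mistral architecture definitions handle tied heads.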
Note
Introduces support for DeepSeek V3’s hybrid dense+MoE layout and wires it into architecture selection.
- `DeepseekV3HybridMoeModuleArchitecture` with config-driven layer mapping: dense `mlp.{gate_proj,up_proj,down_proj}` for the initial layers, then MoE (`mlp.gate[.e_score_correction_bias?]`, `mlp.shared_experts.*`, `mlp.experts.{i}.*`) by `moe_layer_freq`; includes attention and pre/post weights
- Wired into `architecture/__init__.py` (model_type=`deepseek_v3`)
- `arbitrary_types_allowed=True` on `Task` to accept non-pydantic types

Written by Cursor Bugbot for commit efce27e.
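The dense-then-MoE layer mapping described in the note can be sketched as a simple predicate. This assumes the DeepSeek V3 config fields `first_k_dense_replace` and `moe_layer_freq`; the exact predicate used by the PR may differ:

```python
def is_moe_layer(idx: int, first_k_dense_replace: int, moe_layer_freq: int) -> bool:
    """Sketch of DeepSeek V3's hybrid dense+MoE layout.

    The first `first_k_dense_replace` layers use a dense MLP
    (mlp.gate_proj / up_proj / down_proj); after that, a layer is MoE
    (mlp.gate, mlp.shared_experts.*, mlp.experts.{i}.*) whenever its
    index is a multiple of `moe_layer_freq`. With moe_layer_freq=1,
    every later layer is MoE.
    """
    return idx >= first_k_dense_replace and idx % moe_layer_freq == 0
```

An architecture definition would use such a predicate to decide which set of per-layer weight names (dense MLP vs. gate/shared/routed experts) to emit for each layer index.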