feat: add Qwen3.5 Dense Model EAGLE3 training support #516

Open
36330 wants to merge 2 commits into sgl-project:main from 36330:feat/qwen3-5-eagle3-support
36330 commented Mar 28, 2026

  • Add qwen3_5_eagle_patch.py: monkey patch for SGLang's Qwen3.5 models
    • Captures aux_hidden_states from 3 layers (layer 2, mid, layer-3)
    • Supports both Qwen3_5ForCausalLM and Qwen3_5ForConditionalGeneration
    • Set the environment variable QWEN35_EAGLE3_ENABLE=1 to enable the patch
  • Add qwen3.5-4b-eagle3.json: draft model config for Qwen3.5-4B
  • Modify train_eagle3.py: auto-apply the patch on initialization
  • Modify eagle3_target_model.py: auto-detect and patch Qwen3.5 models
  • Modify llama3_eagle.py: handle the 'default' RoPE scaling type for Qwen3.5
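
For readers unfamiliar with the capture scheme, here is a minimal sketch of how the three auxiliary layers and the environment toggle might fit together. It is not the code in qwen3_5_eagle_patch.py: the helper names are hypothetical, "layer-3" is assumed to mean `num_layers - 3`, "mid" is assumed to mean `num_layers // 2`, and the sketch records each selected layer's output, which may differ from where the real patch taps the hidden states.

```python
import os

# Only the variable name comes from the PR description; the helpers are hypothetical.
ENV_FLAG = "QWEN35_EAGLE3_ENABLE"


def patch_enabled(env=None):
    """Return True when the toggle is set to "1" (assumed semantics)."""
    env = os.environ if env is None else env
    return env.get(ENV_FLAG, "0") == "1"


def aux_layer_ids(num_layers):
    """Pick three capture layers: 2, the middle, and num_layers - 3 (assumed)."""
    ids = [2, num_layers // 2, num_layers - 3]
    if len(set(ids)) != 3 or ids != sorted(ids):
        raise ValueError(f"model with {num_layers} layers is too shallow")
    return ids


def forward_with_capture(layers, hidden, capture_ids):
    """Run the layers in order and keep the outputs of the selected ones."""
    captured = []
    for idx, layer in enumerate(layers):
        hidden = layer(hidden)
        if idx in capture_ids:
            captured.append(hidden)
    return hidden, captured


if __name__ == "__main__":
    # Toy "layers" that just add 1, so the captured values show the tap points.
    toy_layers = [lambda h: h + 1 for _ in range(8)]
    ids = aux_layer_ids(len(toy_layers))
    final, aux = forward_with_capture(toy_layers, 0, ids)
    print(ids, aux, final, patch_enabled())
```

With a 32-layer model this selection would yield layers 2, 16, and 29; the actual indices for Qwen3.5-4B depend on its layer count and on how the patch defines "mid".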

Tested:

  • Hidden states generation: ✓
  • Training with TTT: ✓
  • Qwen3.5-4B Dense model: ✓
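
The 'default' RoPE scaling case listed among the changes can be illustrated with a small sketch. This is an assumption about the shape of the fix, not the code in llama3_eagle.py: the function names are hypothetical, and the sketch only assumes that the config may carry the type under either a `rope_type` or a legacy `type` key, and that 'default' should be treated the same as having no scaling at all.

```python
def resolve_rope_type(rope_scaling):
    """Return the RoPE type named by a config dict, or 'default' if none is given."""
    if not rope_scaling:
        return "default"
    # Accept both the newer "rope_type" key and the legacy "type" key.
    return rope_scaling.get("rope_type") or rope_scaling.get("type") or "default"


def needs_scaled_rope(rope_scaling):
    """True when a non-default scaling variant must be built (hypothetical helper)."""
    return resolve_rope_type(rope_scaling) != "default"


if __name__ == "__main__":
    for cfg in (None, {"rope_type": "default"}, {"type": "linear", "factor": 2.0}):
        print(cfg, "->", resolve_rope_type(cfg), needs_scaled_rope(cfg))
```

Treating 'default' as "no scaling" lets the draft model fall back to the plain rotary embedding instead of raising on an unrecognized type.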

Example:

# generate hidden states
torchrun --nproc_per_node=1 scripts/prepare_hidden_states.py \
    --target-model-path Qwen/Qwen3.5-4B \
    --enable-aux-hidden-states

# train
torchrun --nproc_per_node=4 scripts/train_eagle3.py \
    --target-model-path Qwen/Qwen3.5-4B \
    --draft-model-config configs/qwen3.5-4b-eagle3.json

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@jiapingW
Collaborator

We have already implemented Qwen3.5 EAGLE3 training in PR.
