
[WIP] feat: qwen3.5 support #7655

Draft
furionw wants to merge 2 commits into main from qiwa/qwen3.5


@furionw furionw commented Mar 26, 2026

WIP

Summary by CodeRabbit

  • Documentation
    • Added comprehensive test results documentation for Qwen3.5 models across various configurations and GPU topology setups.
    • Documents identified limitations and known blockers for specific deployment scenarios.

@github-actions github-actions Bot added the `documentation` label (Improvements or additions to documentation) Mar 26, 2026

coderabbitai Bot commented Mar 26, 2026

Walkthrough

Adds a new markdown document recording dynamo vLLM test results for Qwen3.5 models across multiple sizes and GPU topology configurations, including pass/fail outcomes, identified blockers, and support limitations.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| **Documentation**: `test-qwen35-results.md` | New markdown file documenting dynamo vLLM test results for Qwen3.5 models (2B, 27B, 35B-A3B-FP8) across various GPU topologies, including identified blockers with transformers KeyError and vLLM KV cache incompatibility, and a consolidated summary restricting support to AGG and MM Routing topologies. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Description check | ⚠️ Warning | The pull request description is critically incomplete. It contains only 'WIP' with no details about changes, overview, or related issues. The required template sections (Overview, Details, Where to start, Related Issues) are entirely missing. | Complete the description by filling in all required template sections: overview of what's being added, details about the test results document, which files to review, and any related GitHub issues. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title '[WIP] feat: qwen3.5 support' is partially related to the changeset. It indicates qwen3.5 support work, which aligns with the test results document added, but uses '[WIP]' and is overly broad without specifying that this documents test results rather than implementing the feature. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
test-qwen35-results.md (1)

18-18: Local log paths are not accessible to other users.

The referenced log path dynamo/logs/test-qwen35/ appears to be local to your environment. Consider either:

  • Including relevant logs in the repository
  • Uploading logs to a shared location (e.g., CI artifacts, cloud storage)
  • Adding a note that logs are available upon request
  • Or acknowledging that logs are local-only for reference

This also applies to lines 31 and 45.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-qwen35-results.md` at line 18, The README/reference to the local log
path `dynamo/logs/test-qwen35/` is not accessible to others; update the file to
either (a) include the relevant log excerpts inline, (b) upload the full logs to
a shared location (CI artifacts or cloud storage) and replace the local path
with the public link, or (c) add a clear note next to `dynamo/logs/test-qwen35/`
(and the similar entries at the other occurrences) stating that the logs are
local-only and available on request, so reviewers know how to access them.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e4cb2474-45fe-4d31-81bd-ac33c16be6a3

📥 Commits

Reviewing files that changed from the base of the PR and between a58bcc3 and acd64e4.

📒 Files selected for processing (1)
  • test-qwen35-results.md

Comment thread test-qwen35-results.md
Comment on lines +1 to +84
# dynamo.vllm Qwen3.5 Test Results

**Date:** 2026-03-25/26
**Image:** `dynamo:latest-vllm-local-dev-03-25`
**Host GPUs:** GPU 0 (A400 4GB, unused), GPU 1 (RTX 6000 Ada 49GB), GPU 2 (RTX PRO 6000 Blackwell 98GB)

## Key Findings

1. **Qwen3.5 is multimodal** — all variants (2B, 27B, 35B-A3B-FP8) handle vision inputs natively
2. **AGG and MM Routing work across all sizes** — confirmed for 2B, 27B, and 35B-A3B-FP8
3. **P/D Disagg blocked by vLLM hybrid KV cache** — Qwen3.5's hybrid architecture (attention + Mamba/GDN layers) is incompatible with `--kv-transfer-config` which disables the hybrid KV cache manager
4. **E_PD/E_P_D blocked by outdated `transformers`** — standalone encode worker's `AutoModel.from_pretrained()` doesn't recognize `qwen3_5`

---

## Qwen/Qwen3.5-2B

Logs: `dynamo/logs/test-qwen35/`

| Topology | Status | Notes |
|----------|--------|-------|
| **AGG** | **PASS** | Text + multimodal both work |
| **MM Routing** | **PASS** | Text + multimodal through KV-aware router |
| **E_PD** | **FAIL** | `transformers` KeyError: `qwen3_5` |
| **E_P_D** | **FAIL** | Same |

---

## Qwen/Qwen3.5-27B

Logs: `dynamo/logs/test-qwen35-27b/`

| Topology | Status | Notes |
|----------|--------|-------|
| **AGG** | **PASS** | Text + multimodal. 51.1 GiB on 98GB Blackwell |
| **MM Routing** | **PASS** | Text + multimodal. First request ~92s cold start |
| **P/D Disagg** | **FAIL (OOM)** | 27B bf16 (54GB) doesn't fit on 49GB Ada |
| **E_PD** | **FAIL** | `transformers` KeyError: `qwen3_5` |
| **E_P_D** | **FAIL** | Same (confirmed from 2B, not re-run) |

---

## Qwen/Qwen3.5-35B-A3B-FP8 (MoE)

Logs: `dynamo/logs/test-qwen35-35b/`

| Topology | Status | Notes |
|----------|--------|-------|
| **AGG** | **PASS** | Text + multimodal. 34.23 GiB FP8 on 98GB Blackwell |
| **MM Routing** | **PASS** | Text + multimodal. Model served as `__internal` name but requests succeeded |
| **P/D Disagg** | **FAIL** | vLLM error: "Hybrid KV cache manager is disabled but failed to convert KV cache specs to one unified type." Qwen3.5 MoE's hybrid arch (attention + Mamba) incompatible with `--kv-transfer-config` |
| **E_PD** | **FAIL** | `transformers` KeyError: `qwen3_5` (expected) |
| **E_P_D** | **FAIL** | Same (expected) |

---

## Root Causes

### 1. E_PD / E_P_D: `transformers` doesn't support `qwen3_5`
- **Where:** `components/src/dynamo/vllm/multimodal_utils/model.py` → `AutoModel.from_pretrained()`
- **Fix:** Upgrade transformers, or use vLLM's native encoder path with Qwen3.5 added to `SupportedModels`

### 2. P/D Disagg: Hybrid KV cache incompatibility
- **Where:** vLLM `kv_cache_utils.py:1172` — hybrid KV cache manager disabled by `--kv-transfer-config`
- **Root cause:** Qwen3.5 has hybrid attention layers (standard attention + GatedDeltaNet/Mamba), requiring the hybrid KV cache manager. But `--kv-transfer-config` (required for P/D disagg with NixlConnector) forces it off.
- **Fix:** vLLM needs to support hybrid KV cache + KV transfer together, or the NixlConnector needs to handle heterogeneous KV cache specs

### 3. P/D Disagg OOM (27B only): Hardware limitation
- **Where:** 27B bf16 needs ~54GB, GPU 1 only has 49GB
- **Fix:** Use FP8 quantized variant or larger GPUs

---

## Summary Matrix

| Topology | 2B | 27B | 35B-A3B-FP8 |
|----------|-----|------|-------------|
| AGG (text + multimodal) | ✅ | ✅ | ✅ |
| MM Routing (text + multimodal) | ✅ | ✅ | ✅ |
| P/D Disagg | not tested | ❌ OOM | ❌ hybrid KV cache |
| E_PD | ❌ transformers | ❌ transformers | ❌ transformers |
| E_P_D | ❌ transformers | ❌ transformers | ❌ transformers |

**Bottom line:** `dynamo.vllm` supports Qwen3.5 for **AGG** and **MM Routing** topologies. Disaggregated topologies (P/D, E_PD, E_P_D) have blockers that need upstream fixes in vLLM and/or transformers.
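
Two of the failures reported above can be checked before a worker is launched. The following is a minimal sketch, not part of the PR: the function names are hypothetical, the memory estimate covers weights only (no KV cache, activations, or vision encoder overhead), and it assumes the `qwen3_5` KeyError comes from the `transformers` auto-config registry lookup, which the results do not state explicitly.

```python
"""Illustrative pre-flight checks; all function names here are hypothetical."""
from typing import Iterable, Optional, Set


def weight_footprint_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in decimal GB (weights only)."""
    return num_params * bytes_per_param / 1e9


def fits_on_gpu(footprint_gb: float, gpu_memory_gb: float) -> bool:
    """True if the weights alone fit; real deployments need extra headroom."""
    return footprint_gb < gpu_memory_gb


def installed_model_types() -> Optional[Set[str]]:
    """Model types registered in the installed transformers, or None if absent."""
    try:
        from transformers.models.auto.configuration_auto import CONFIG_MAPPING
    except ImportError:
        return None
    return set(CONFIG_MAPPING.keys())


def is_model_type_known(model_type: str, known_types: Iterable[str]) -> bool:
    """Check a model type (e.g. "qwen3_5") against a set of registered types."""
    return model_type in set(known_types)


# 27B parameters at 2 bytes each (bf16) versus the 49 GB GPU from the results.
bf16_gb = weight_footprint_gb(27e9, 2)
print(f"27B bf16 weights: ~{bf16_gb:.0f} GB, fits on 49 GB: {fits_on_gpu(bf16_gb, 49)}")

known = installed_model_types()
if known is not None:
    print("qwen3_5 registered:", is_model_type_known("qwen3_5", known))
```

Under these assumptions, the weights-only estimate for the 27B bf16 model matches the ~54 GB figure in Root Cause 3, and a missing `qwen3_5` entry would be consistent with Root Cause 1.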

🛠️ Refactor suggestion | 🟠 Major

Consider the appropriate location and format for this documentation.

This file appears to be test results documentation with temporal/environment-specific details (specific test dates, container images, local log paths, hardware configurations). Consider whether:

  1. Should this be permanent documentation? If yes:

    • Move to docs/qwen35-support.md or similar
    • Generalize the content (remove specific dates/images, focus on support status and known limitations)
    • Update regularly as support evolves
  2. Or is this temporary test notes? If yes:

    • Consider adding as a comment to this PR or related issue instead of checking in
    • Or move to docs/testing/ or similar directory for historical test records
    • Or create a GitHub issue to track Qwen3.5 support status with this content
  3. Root-level placement: Test results typically belong in docs/, tests/, or similar subdirectories rather than at the repository root.

The pipeline failure (file not covered by CI filter) also suggests the CI system doesn't expect markdown files at this location.

Would you like me to help restructure this into permanent documentation or suggest where it should be placed?

🧰 Tools
🪛 GitHub Actions: Pre Merge

[error] 1-1: CI coverage filter check failed: The following files are not covered by any CI filter (UNCOVERED was non-empty, but the specific file list was not shown in the provided logs).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-qwen35-results.md` around lines 1 - 84, This test-results markdown
belongs in docs or testing artifacts rather than the repo root; either convert
it into a permanent doc at docs/qwen35-support.md (generalize/remove dates,
container/image tags, local log paths and machine-specific GPU details, keep
topology matrix and root cause bullets referencing
components/src/dynamo/vllm/multimodal_utils/model.py and the vLLM
kv_cache_utils.py hybrid-KV note) or move it to
docs/testing/qwen35-2026-03-25.md as an archived test record (keep full
environment details there) and delete the root-level file; also update CI file
filters to allow the chosen docs path so the pipeline no longer flags it as
unexpected.

Comment thread test-qwen35-results.md
@@ -0,0 +1,84 @@
# dynamo.vllm Qwen3.5 Test Results

⚠️ Potential issue | 🟡 Minor

Address the CI coverage filter failure.

The pipeline failed because test-qwen35-results.md is not covered by any CI filter. This is likely due to the file being at the repository root rather than in an expected location like docs/ or tests/.

Resolving the file placement issue (per previous comment) should also resolve this CI failure.
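
To illustrate what "not covered by any CI filter" means, here is a minimal sketch of such a coverage check. The filter groups and patterns are hypothetical, since the repository's actual filter configuration is not shown in this thread, and Python's `fnmatch` lets `*` match `/`, which may differ from the glob dialect the real CI uses.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

# Hypothetical filter groups; the real CI configuration is not shown in this PR.
EXAMPLE_FILTERS: Dict[str, List[str]] = {
    "docs": ["docs/*", "*/README.md"],
    "vllm": ["components/src/dynamo/vllm/*"],
}


def uncovered_files(changed: Iterable[str], filters: Dict[str, List[str]]) -> List[str]:
    """Return the changed paths that match no pattern in any filter group."""
    patterns = [pat for group in filters.values() for pat in group]
    return [
        path
        for path in changed
        if not any(fnmatchcase(path, pat) for pat in patterns)
    ]


# A root-level file matches nothing, while the suggested docs/ path is covered.
print(uncovered_files(["test-qwen35-results.md"], EXAMPLE_FILTERS))
print(uncovered_files(["docs/testing/qwen35-2026-03-25.md"], EXAMPLE_FILTERS))
```

With these example patterns, the root-level file is reported as uncovered, while the `docs/testing/` path suggested in the previous comment is not.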

🧰 Tools
🪛 GitHub Actions: Pre Merge

[error] 1-1: CI coverage filter check failed: The following files are not covered by any CI filter (UNCOVERED was non-empty, but the specific file list was not shown in the provided logs).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-qwen35-results.md` at line 1, The CI failed because
test-qwen35-results.md sits outside the repo locations covered by the CI
path/coverage filters; move test-qwen35-results.md into an expected location
(e.g., docs/ or tests/) and/or update the CI's path filter configuration to
include this filename or its new directory (adjust the CI workflow's path
include patterns or coverage filter settings) so the file is picked up by the
pipeline.

@pull-request-size pull-request-size Bot added size/L and removed size/M labels Mar 26, 2026
@furionw furionw marked this pull request as draft April 6, 2026 16:57
