[WIP] feat: qwen3.5 support by furionw · Pull Request #7655 · ai-dynamo/dynamo

furionw · 2026-03-26T04:43:16Z

WIP

Summary by CodeRabbit

Documentation
- Added comprehensive test results documentation for Qwen3.5 models across various configurations and GPU topology setups.
- Documents identified limitations and known blockers for specific deployment scenarios.

github-actions · 2026-03-26T04:44:55Z

🌿 Fern Docs Preview: https://nvidia-preview-e2b52c3f-003c-4019-a452-8b76dcedaff7.docs.buildwithfern.com/dynamo/dev

coderabbitai · 2026-03-26T04:47:13Z

Walkthrough

Adds a new markdown document recording dynamo vLLM test results for Qwen3.5 models across multiple sizes and GPU topology configurations, including pass/fail outcomes, identified blockers, and support limitations.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| Documentation `test-qwen35-results.md` | New markdown file documenting dynamo vLLM test results for Qwen3.5 models (2B, 27B, 35B-A3B-FP8) across various GPU topologies, including identified blockers with transformers KeyError and vLLM KV cache incompatibility, and a consolidated summary restricting support to AGG and MM Routing topologies. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Description check | ⚠️ Warning | The pull request description is critically incomplete. It contains only 'WIP' with no details about changes, overview, or related issues. The required template sections (Overview, Details, Where to start, Related Issues) are entirely missing. | Complete the description by filling in all required template sections: overview of what's being added, details about the test results document, which files to review, and any related GitHub issues. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title '[WIP] feat: qwen3.5 support' is partially related to the changeset. It indicates qwen3.5 support work, which aligns with the test results document added, but uses '[WIP]' and is overly broad without specifying that this documents test results rather than implementing the feature. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

test-qwen35-results.md (1)
18-18: Local log paths are not accessible to other users.

The referenced log path `dynamo/logs/test-qwen35/` appears to be local to your environment. Consider one of the following:

- Including relevant logs in the repository
- Uploading logs to a shared location (e.g., CI artifacts, cloud storage)
- Adding a note that logs are available upon request
- Acknowledging that logs are local-only for reference

This also applies to lines 31 and 45.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test-qwen35-results.md` at line 18, The README/reference to the local log
path `dynamo/logs/test-qwen35/` is not accessible to others; update the file to
either (a) include the relevant log excerpts inline, (b) upload the full logs to
a shared location (CI artifacts or cloud storage) and replace the local path
with the public link, or (c) add a clear note next to `dynamo/logs/test-qwen35/`
(and the similar entries at the other occurrences) stating that the logs are
local-only and available on request, so reviewers know how to access them.
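
The affected lines can be located mechanically rather than by eye. The following is a minimal sketch, assuming the log references always take the form ``Logs: `<path>` `` on a line of their own, as they do in the results file; the helper name `find_local_log_refs` is hypothetical and not part of the repository.

```python
import re

# Matches lines such as "Logs: `dynamo/logs/test-qwen35/`".
# Assumption: each log reference sits alone on its line in this exact form.
LOG_REF = re.compile(r"^Logs:\s*`(?P<path>[^`]+)`\s*$")


def find_local_log_refs(markdown_text):
    """Return (line_number, path) pairs for every 'Logs: `...`' line."""
    refs = []
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        match = LOG_REF.match(line.strip())
        if match:
            refs.append((lineno, match.group("path")))
    return refs


# Small excerpt shaped like the results file (not the full document).
sample = "\n".join([
    "## Qwen/Qwen3.5-2B",
    "",
    "Logs: `dynamo/logs/test-qwen35/`",
    "",
    "## Qwen/Qwen3.5-27B",
    "",
    "Logs: `dynamo/logs/test-qwen35-27b/`",
])

for lineno, path in find_local_log_refs(sample):
    print(f"line {lineno}: {path}")
```

Each reported line is a place to substitute a shared link or add a "local-only" note, whichever option is chosen.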

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test-qwen35-results.md`:
- Around line 1-84: This test-results markdown belongs in docs or testing
artifacts rather than the repo root; either convert it into a permanent doc at
docs/qwen35-support.md (generalize/remove dates, container/image tags, local log
paths and machine-specific GPU details, keep topology matrix and root cause
bullets referencing components/src/dynamo/vllm/multimodal_utils/model.py and the
vLLM kv_cache_utils.py hybrid-KV note) or move it to
docs/testing/qwen35-2026-03-25.md as an archived test record (keep full
environment details there) and delete the root-level file; also update CI file
filters to allow the chosen docs path so the pipeline no longer flags it as
unexpected.
- Line 1: The CI failed because test-qwen35-results.md sits outside the repo
locations covered by the CI path/coverage filters; move test-qwen35-results.md
into an expected location (e.g., docs/ or tests/) and/or update the CI's path
filter configuration to include this filename or its new directory (adjust the
CI workflow's path include patterns or coverage filter settings) so the file is
picked up by the pipeline.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e4cb2474-45fe-4d31-81bd-ac33c16be6a3

📥 Commits

Reviewing files that changed from the base of the PR and between a58bcc3 and acd64e4.

📒 Files selected for processing (1)

test-qwen35-results.md

coderabbitai · 2026-03-26T04:47:16Z

+# dynamo.vllm Qwen3.5 Test Results
+
+**Date:** 2026-03-25/26
+**Image:** `dynamo:latest-vllm-local-dev-03-25`
+**Host GPUs:** GPU 0 (A400 4GB, unused), GPU 1 (RTX 6000 Ada 49GB), GPU 2 (RTX PRO 6000 Blackwell 98GB)
+
+## Key Findings
+
+1. **Qwen3.5 is multimodal** — all variants (2B, 27B, 35B-A3B-FP8) handle vision inputs natively
+2. **AGG and MM Routing work across all sizes** — confirmed for 2B, 27B, and 35B-A3B-FP8
+3. **P/D Disagg blocked by vLLM hybrid KV cache** — Qwen3.5's hybrid architecture (attention + Mamba/GDN layers) is incompatible with `--kv-transfer-config` which disables the hybrid KV cache manager
+4. **E_PD/E_P_D blocked by outdated `transformers`** — standalone encode worker's `AutoModel.from_pretrained()` doesn't recognize `qwen3_5`
+
+---
+
+## Qwen/Qwen3.5-2B
+
+Logs: `dynamo/logs/test-qwen35/`
+
+| Topology | Status | Notes |
+|----------|--------|-------|
+| **AGG** | **PASS** | Text + multimodal both work |
+| **MM Routing** | **PASS** | Text + multimodal through KV-aware router |
+| **E_PD** | **FAIL** | `transformers` KeyError: `qwen3_5` |
+| **E_P_D** | **FAIL** | Same |
+
+---
+
+## Qwen/Qwen3.5-27B
+
+Logs: `dynamo/logs/test-qwen35-27b/`
+
+| Topology | Status | Notes |
+|----------|--------|-------|
+| **AGG** | **PASS** | Text + multimodal. 51.1 GiB on 98GB Blackwell |
+| **MM Routing** | **PASS** | Text + multimodal. First request ~92s cold start |
+| **P/D Disagg** | **FAIL (OOM)** | 27B bf16 (54GB) doesn't fit on 49GB Ada |
+| **E_PD** | **FAIL** | `transformers` KeyError: `qwen3_5` |
+| **E_P_D** | **FAIL** | Same (confirmed from 2B, not re-run) |
+
+---
+
+## Qwen/Qwen3.5-35B-A3B-FP8 (MoE)
+
+Logs: `dynamo/logs/test-qwen35-35b/`
+
+| Topology | Status | Notes |
+|----------|--------|-------|
+| **AGG** | **PASS** | Text + multimodal. 34.23 GiB FP8 on 98GB Blackwell |
+| **MM Routing** | **PASS** | Text + multimodal. Model served as `__internal` name but requests succeeded |
+| **P/D Disagg** | **FAIL** | vLLM error: "Hybrid KV cache manager is disabled but failed to convert KV cache specs to one unified type." Qwen3.5 MoE's hybrid arch (attention + Mamba) incompatible with `--kv-transfer-config` |
+| **E_PD** | **FAIL** | `transformers` KeyError: `qwen3_5` (expected) |
+| **E_P_D** | **FAIL** | Same (expected) |
+
+---
+
+## Root Causes
+
+### 1. E_PD / E_P_D: `transformers` doesn't support `qwen3_5`
+- **Where:** `components/src/dynamo/vllm/multimodal_utils/model.py` → `AutoModel.from_pretrained()`
+- **Fix:** Upgrade transformers, or use vLLM's native encoder path with Qwen3.5 added to `SupportedModels`
+
+### 2. P/D Disagg: Hybrid KV cache incompatibility
+- **Where:** vLLM `kv_cache_utils.py:1172` — hybrid KV cache manager disabled by `--kv-transfer-config`
+- **Root cause:** Qwen3.5 has hybrid attention layers (standard attention + GatedDeltaNet/Mamba), requiring the hybrid KV cache manager. But `--kv-transfer-config` (required for P/D disagg with NixlConnector) forces it off.
+- **Fix:** vLLM needs to support hybrid KV cache + KV transfer together, or the NixlConnector needs to handle heterogeneous KV cache specs
+
+### 3. P/D Disagg OOM (27B only): Hardware limitation
+- **Where:** 27B bf16 needs ~54GB, GPU 1 only has 49GB
+- **Fix:** Use FP8 quantized variant or larger GPUs
+
+---
+
+## Summary Matrix
+
+| Topology | 2B | 27B | 35B-A3B-FP8 |
+|----------|-----|------|-------------|
+| AGG (text + multimodal) | ✅ | ✅ | ✅ |
+| MM Routing (text + multimodal) | ✅ | ✅ | ✅ |
+| P/D Disagg | not tested | ❌ OOM | ❌ hybrid KV cache |
+| E_PD | ❌ transformers | ❌ transformers | ❌ transformers |
+| E_P_D | ❌ transformers | ❌ transformers | ❌ transformers |
+
+**Bottom line:** `dynamo.vllm` supports Qwen3.5 for **AGG** and **MM Routing** topologies. Disaggregated topologies (P/D, E_PD, E_P_D) have blockers that need upstream fixes in vLLM and/or transformers.
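
Root cause 1 in the results above (the `qwen3_5` KeyError) could be detected before an encode worker is launched. The sketch below is one possible preflight check, not something the PR implements: the function name `supports_model_type` is hypothetical, and when no registry is passed in it falls back to `CONFIG_MAPPING` from the installed `transformers` package, which maps model-type strings to config classes.

```python
def supports_model_type(model_type, known_types=None):
    """Return True if `model_type` is a registered config type.

    `known_types` may be any container of model-type strings. When it is
    omitted, the registry of the installed `transformers` package is used.
    """
    if known_types is None:
        # Imported lazily so the explicit-registry path needs no dependency.
        from transformers.models.auto.configuration_auto import CONFIG_MAPPING
        known_types = CONFIG_MAPPING
    return model_type in known_types


# Illustration with explicit registries (these sets are made up for the
# example and do not describe any particular transformers release).
older_registry = {"qwen2", "qwen2_vl", "qwen3"}
newer_registry = older_registry | {"qwen3_5"}

print(supports_model_type("qwen3_5", older_registry))
print(supports_model_type("qwen3_5", newer_registry))
```

Calling `supports_model_type("qwen3_5")` with no second argument inside the test container would report whether the bundled `transformers` version recognizes the model type, which is the condition the E_PD and E_P_D rows depend on.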


🛠️ Refactor suggestion | 🟠 Major

Consider the appropriate location and format for this documentation.

This file appears to be test results documentation with temporal/environment-specific details (specific test dates, container images, local log paths, hardware configurations). Consider whether:

- Should this be permanent documentation? If yes:
  - Move to `docs/qwen35-support.md` or similar
  - Generalize the content (remove specific dates/images, focus on support status and known limitations)
  - Update regularly as support evolves
- Or are these temporary test notes? If yes:
  - Consider adding them as a comment on this PR or a related issue instead of checking them in
  - Or move to `docs/testing/` or a similar directory for historical test records
  - Or create a GitHub issue to track Qwen3.5 support status with this content
- Root-level placement: Test results typically belong in `docs/`, `tests/`, or similar subdirectories rather than at the repository root.

The pipeline failure (file not covered by CI filter) also suggests the CI system doesn't expect markdown files at this location.

Would you like me to help restructure this into permanent documentation or suggest where it should be placed?

🧰 Tools

🪛 GitHub Actions: Pre Merge

[error] 1-1: CI coverage filter check failed: The following files are not covered by any CI filter (UNCOVERED was non-empty, but the specific file list was not shown in the provided logs).


coderabbitai · 2026-03-26T04:47:16Z

@@ -0,0 +1,84 @@
+# dynamo.vllm Qwen3.5 Test Results


⚠️ Potential issue | 🟡 Minor

Address the CI coverage filter failure.

The pipeline failed because test-qwen35-results.md is not covered by any CI filter. This is likely due to the file being at the repository root rather than in an expected location like docs/ or tests/.

Resolving the file placement issue (per previous comment) should also resolve this CI failure.
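
To illustrate why a root-level file trips this kind of check, here is a minimal sketch of a path-coverage test. The actual filter configuration of this repository was not shown in the logs, so the patterns below are hypothetical, as is the helper name `uncovered_files`; only the general idea (every changed file must match at least one pattern) is taken from the error message.

```python
from fnmatch import fnmatch


def uncovered_files(changed_files, filter_patterns):
    """Return the changed files that match none of the filter patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in filter_patterns)
    ]


# Hypothetical patterns; the real CI filter list is an assumption here.
# Note that fnmatch's "*" also matches "/" so "docs/*" covers nested paths.
patterns = ["docs/*", "tests/*", "components/*"]

changed = [
    "test-qwen35-results.md",              # root-level file from this PR
    "docs/testing/qwen35-2026-03-25.md",   # one of the suggested locations
]

print(uncovered_files(changed, patterns))
```

Under these assumed patterns, only the root-level file is reported as uncovered, while the same document placed under `docs/testing/` passes.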


feat: qwen3.5 support

acd64e4

pull-request-size Bot added the size/M label Mar 26, 2026

github-actions Bot added the documentation Improvements or additions to documentation label Mar 26, 2026

coderabbitai Bot reviewed Mar 26, 2026


update

789ae80

pull-request-size Bot added size/L and removed size/M labels Mar 26, 2026

copy-pr-bot Bot temporarily deployed to GITLAB March 26, 2026 06:00 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 26, 2026 06:01 Inactive

furionw marked this pull request as draft April 6, 2026 16:57

furionw wants to merge 2 commits into `main` from `qiwa/qwen3.5`
