Walkthrough

Adds a new markdown document recording dynamo vLLM test results for Qwen3.5 models across multiple sizes and GPU topology configurations, including pass/fail outcomes, identified blockers, and support limitations.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (1)
test-qwen35-results.md (1)
18-18: Local log paths are not accessible to other users.
The referenced log path `dynamo/logs/test-qwen35/` appears to be local to your environment. Consider either:
- Including relevant logs in the repository
- Uploading logs to a shared location (e.g., CI artifacts, cloud storage)
- Adding a note that logs are available upon request
- Or acknowledging that logs are local-only for reference
This also applies to lines 31 and 45.
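A minimal sketch of how such references could be located before deciding how to handle them. It assumes the log references always take the form `Logs: <backticked path>` on their own line, as in the file under review; the function name `find_local_log_refs` is hypothetical and not part of the repository.

```python
import re

# Assumed pattern: a line starting with "Logs:" followed by a backticked path.
LOG_REF = re.compile(r"^Logs:\s*`([^`]+)`")


def find_local_log_refs(markdown_text):
    """Return (line_number, path) pairs for every 'Logs: `path`' line."""
    refs = []
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        match = LOG_REF.match(line.strip())
        if match:
            refs.append((lineno, match.group(1)))
    return refs


if __name__ == "__main__":
    sample = "# Title\n\nLogs: `dynamo/logs/test-qwen35/`\n"
    for lineno, path in find_local_log_refs(sample):
        print(f"line {lineno}: {path}")
```

Each reported line can then be replaced with a shared link or annotated as local-only.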
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test-qwen35-results.md` at line 18, The README/reference to the local log path `dynamo/logs/test-qwen35/` is not accessible to others; update the file to either (a) include the relevant log excerpts inline, (b) upload the full logs to a shared location (CI artifacts or cloud storage) and replace the local path with the public link, or (c) add a clear note next to `dynamo/logs/test-qwen35/` (and the similar entries at the other occurrences) stating that the logs are local-only and available on request, so reviewers know how to access them.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test-qwen35-results.md`:
- Around line 1-84: This test-results markdown belongs in docs or testing
artifacts rather than the repo root; either convert it into a permanent doc at
docs/qwen35-support.md (generalize/remove dates, container/image tags, local log
paths and machine-specific GPU details, keep topology matrix and root cause
bullets referencing components/src/dynamo/vllm/multimodal_utils/model.py and the
vLLM kv_cache_utils.py hybrid-KV note) or move it to
docs/testing/qwen35-2026-03-25.md as an archived test record (keep full
environment details there) and delete the root-level file; also update CI file
filters to allow the chosen docs path so the pipeline no longer flags it as
unexpected.
- Line 1: The CI failed because test-qwen35-results.md sits outside the repo
locations covered by the CI path/coverage filters; move test-qwen35-results.md
into an expected location (e.g., docs/ or tests/) and/or update the CI's path
filter configuration to include this filename or its new directory (adjust the
CI workflow's path include patterns or coverage filter settings) so the file is
picked up by the pipeline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e4cb2474-45fe-4d31-81bd-ac33c16be6a3
📒 Files selected for processing (1)
test-qwen35-results.md
# dynamo.vllm Qwen3.5 Test Results

**Date:** 2026-03-25/26
**Image:** `dynamo:latest-vllm-local-dev-03-25`
**Host GPUs:** GPU 0 (A400 4GB, unused), GPU 1 (RTX 6000 Ada 49GB), GPU 2 (RTX PRO 6000 Blackwell 98GB)

## Key Findings

1. **Qwen3.5 is multimodal**: all variants (2B, 27B, 35B-A3B-FP8) handle vision inputs natively
2. **AGG and MM Routing work across all sizes**: confirmed for 2B, 27B, and 35B-A3B-FP8
3. **P/D Disagg blocked by vLLM hybrid KV cache**: Qwen3.5's hybrid architecture (attention + Mamba/GDN layers) is incompatible with `--kv-transfer-config`, which disables the hybrid KV cache manager
4. **E_PD/E_P_D blocked by outdated `transformers`**: the standalone encode worker's `AutoModel.from_pretrained()` doesn't recognize `qwen3_5`

---

## Qwen/Qwen3.5-2B

Logs: `dynamo/logs/test-qwen35/`

| Topology | Status | Notes |
|----------|--------|-------|
| **AGG** | **PASS** | Text + multimodal both work |
| **MM Routing** | **PASS** | Text + multimodal through KV-aware router |
| **E_PD** | **FAIL** | `transformers` KeyError: `qwen3_5` |
| **E_P_D** | **FAIL** | Same |

---

## Qwen/Qwen3.5-27B

Logs: `dynamo/logs/test-qwen35-27b/`

| Topology | Status | Notes |
|----------|--------|-------|
| **AGG** | **PASS** | Text + multimodal. 51.1 GiB on 98GB Blackwell |
| **MM Routing** | **PASS** | Text + multimodal. First request ~92s cold start |
| **P/D Disagg** | **FAIL (OOM)** | 27B bf16 (54GB) doesn't fit on 49GB Ada |
| **E_PD** | **FAIL** | `transformers` KeyError: `qwen3_5` |
| **E_P_D** | **FAIL** | Same (confirmed from 2B, not re-run) |

---

## Qwen/Qwen3.5-35B-A3B-FP8 (MoE)

Logs: `dynamo/logs/test-qwen35-35b/`

| Topology | Status | Notes |
|----------|--------|-------|
| **AGG** | **PASS** | Text + multimodal. 34.23 GiB FP8 on 98GB Blackwell |
| **MM Routing** | **PASS** | Text + multimodal. Model served as `__internal` name but requests succeeded |
| **P/D Disagg** | **FAIL** | vLLM error: "Hybrid KV cache manager is disabled but failed to convert KV cache specs to one unified type." Qwen3.5 MoE's hybrid arch (attention + Mamba) incompatible with `--kv-transfer-config` |
| **E_PD** | **FAIL** | `transformers` KeyError: `qwen3_5` (expected) |
| **E_P_D** | **FAIL** | Same (expected) |

---

## Root Causes

### 1. E_PD / E_P_D: `transformers` doesn't support `qwen3_5`
- **Where:** `components/src/dynamo/vllm/multimodal_utils/model.py` → `AutoModel.from_pretrained()`
- **Fix:** Upgrade transformers, or use vLLM's native encoder path with Qwen3.5 added to `SupportedModels`

### 2. P/D Disagg: Hybrid KV cache incompatibility
- **Where:** vLLM `kv_cache_utils.py:1172`, where the hybrid KV cache manager is disabled by `--kv-transfer-config`
- **Root cause:** Qwen3.5 has hybrid attention layers (standard attention + GatedDeltaNet/Mamba), requiring the hybrid KV cache manager. But `--kv-transfer-config` (required for P/D disagg with NixlConnector) forces it off.
- **Fix:** vLLM needs to support hybrid KV cache + KV transfer together, or the NixlConnector needs to handle heterogeneous KV cache specs

### 3. P/D Disagg OOM (27B only): Hardware limitation
- **Where:** 27B bf16 needs ~54GB, GPU 1 only has 49GB
- **Fix:** Use FP8 quantized variant or larger GPUs
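
The ~54GB figure is consistent with a back-of-the-envelope weight estimate, sketched below. This assumes a nominal 27 billion parameters at 2 bytes each (bf16) and counts weights only, ignoring KV cache and activations; the helper names are illustrative, and the real parameter count may differ slightly.

```python
GB = 10**9   # decimal gigabyte
GIB = 2**30  # binary gibibyte


def weights_gb(num_params, bytes_per_param):
    """Approximate weight memory in decimal GB (weights only)."""
    return num_params * bytes_per_param / GB


def fits(num_params, bytes_per_param, gpu_bytes):
    """True if the weights alone fit in the given GPU memory."""
    return num_params * bytes_per_param <= gpu_bytes


if __name__ == "__main__":
    params = 27 * 10**9  # assumed nominal parameter count
    print(f"bf16 weights: {weights_gb(params, 2):.1f} GB")
    print(f"fits on 49 GB:  {fits(params, 2, 49 * GB)}")
    print(f"fits on 49 GiB: {fits(params, 2, 49 * GIB)}")
    print(f"fits on 98 GB:  {fits(params, 2, 98 * GB)}")
```

Under these assumptions the weights alone exceed the Ada card whether its 49GB is read as decimal or binary, while the measured 51.1 GiB (about 54.9 GB) fits comfortably on the 98GB Blackwell.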

---

## Summary Matrix

| Topology | 2B | 27B | 35B-A3B-FP8 |
|----------|-----|------|-------------|
| AGG (text + multimodal) | ✅ | ✅ | ✅ |
| MM Routing (text + multimodal) | ✅ | ✅ | ✅ |
| P/D Disagg | not tested | ❌ OOM | ❌ hybrid KV cache |
| E_PD | ❌ transformers | ❌ transformers | ❌ transformers |
| E_P_D | ❌ transformers | ❌ transformers | ❌ transformers |

**Bottom line:** `dynamo.vllm` supports Qwen3.5 for **AGG** and **MM Routing** topologies. Disaggregated topologies (P/D, E_PD, E_P_D) have blockers that need upstream fixes in vLLM and/or transformers.
🛠️ Refactor suggestion | 🟠 Major
Consider the appropriate location and format for this documentation.
This file appears to be test results documentation with temporal/environment-specific details (specific test dates, container images, local log paths, hardware configurations). Consider whether:
- Should this be permanent documentation? If yes:
  - Move to `docs/qwen35-support.md` or similar
  - Generalize the content (remove specific dates/images, focus on support status and known limitations)
  - Update regularly as support evolves
- Or is this temporary test notes? If yes:
  - Consider adding as a comment to this PR or related issue instead of checking in
  - Or move to `docs/testing/` or similar directory for historical test records
  - Or create a GitHub issue to track Qwen3.5 support status with this content
- Root-level placement: Test results typically belong in `docs/`, `tests/`, or similar subdirectories rather than at the repository root.
The pipeline failure (file not covered by CI filter) also suggests the CI system doesn't expect markdown files at this location.
Would you like me to help restructure this into permanent documentation or suggest where it should be placed?
🧰 Tools
🪛 GitHub Actions: Pre Merge
[error] 1-1: CI coverage filter check failed: The following files are not covered by any CI filter (UNCOVERED was non-empty, but the specific file list was not shown in the provided logs).
@@ -0,0 +1,84 @@
+# dynamo.vllm Qwen3.5 Test Results
Address the CI coverage filter failure.
The pipeline failed because test-qwen35-results.md is not covered by any CI filter. This is likely due to the file being at the repository root rather than in an expected location like docs/ or tests/.
Resolving the file placement issue (per previous comment) should also resolve this CI failure.
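As a rough illustration of what a coverage filter check of this kind does, the sketch below flags changed files that match no pattern. The filter groups and glob patterns here are hypothetical; the repository's actual filter configuration and file format are not shown in this review.

```python
from fnmatch import fnmatchcase

# Hypothetical filter groups; the real CI filter configuration is not shown here.
CI_FILTERS = {
    "docs": ["docs/*"],
    "vllm": ["components/src/dynamo/vllm/*"],
}


def uncovered_files(changed_files, filters):
    """Return the changed files that match no pattern in any filter group."""
    patterns = [p for group in filters.values() for p in group]
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    changed = ["test-qwen35-results.md", "docs/testing/qwen35-2026-03-25.md"]
    print(uncovered_files(changed, CI_FILTERS))
```

With these assumed patterns, the root-level file is reported as uncovered while the same file under `docs/` is not. Note that `fnmatchcase` lets `*` match across `/`, so `docs/*` also covers nested paths.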
🧰 Tools
🪛 GitHub Actions: Pre Merge
[error] 1-1: CI coverage filter check failed: The following files are not covered by any CI filter (UNCOVERED was non-empty, but the specific file list was not shown in the provided logs).
WIP
Summary by CodeRabbit