
refactor(vLLM): Move video support from example to backend #7663

Merged
rmccorm4 merged 15 commits into main from rmccormick/vllm-video
Apr 2, 2026
Conversation


@rmccorm4 rmccorm4 commented Mar 27, 2026

Overview:

  • replace model-name allowlists with capability-driven vision loading and multimodal handling
  • add native video_url loading in the standard TokensPrompt multi_modal_data flow
  • move the video agg/disagg launch scripts under examples/backends/vllm and update docs/tests
  • remove old LLaVA video model support for simplicity, until explicitly requested

Details:

Quick Benchmark: Dynamo vs vllm serve for Video Inference

I ran a quick apples-to-apples comparison between Dynamo aggregated mode (examples/backends/vllm/launch/video_agg.sh) and plain vllm serve, both serving Qwen/Qwen3-VL-2B-Instruct on the same machine and GPU configuration.

Dynamo command:

bash examples/backends/vllm/launch/video_agg.sh \
    --model Qwen/Qwen3-VL-2B-Instruct

vllm serve command:

vllm serve \
    --model Qwen/Qwen3-VL-2B-Instruct \
    --served-model-name Qwen/Qwen3-VL-2B-Instruct \
    --host 0.0.0.0 \
    --port 8000 \
    --max-model-len 8192 \
    --allowed-local-media-path / \
    --limit-mm-per-prompt '{"video":1}' \
    --media-io-kwargs '{"video":{"num_frames":32}}'
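Either deployment accepts the same OpenAI-compatible chat request with a video_url content part. A minimal sketch of such a payload is below; the file URL is a placeholder (local file:// URLs work here because of the --allowed-local-media-path flag above), and only one video per prompt is allowed per the --limit-mm-per-prompt setting.

```python
# Example /v1/chat/completions payload with a `video_url` content part.
# The video URL below is a placeholder, not a real file from this PR.
payload = {
    "model": "Qwen/Qwen3-VL-2B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what happens in this video."},
                {
                    "type": "video_url",
                    "video_url": {"url": "file:///data/sample.mp4"},  # placeholder
                },
            ],
        }
    ],
    "max_tokens": 256,
}

# Content part types in order of appearance:
print([part["type"] for part in payload["messages"][0]["content"]])  # ['text', 'video_url']
```

Sending it is then an ordinary POST to the server, e.g. `requests.post("http://localhost:8000/v1/chat/completions", json=payload)`.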

Benchmark command:

aiperf profile \
    --model Qwen/Qwen3-VL-2B-Instruct \
    --endpoint-type chat \
    --endpoint /v1/chat/completions \
    --url localhost:8000 \
    --video-width 640 \
    --video-height 480 \
    --video-fps 4 \
    --video-duration 5.0 \
    --video-format mp4 \
    --video-codec libx264 \
    --request-count 20 \
    --concurrency 1 \
    --osl 1200 \
    --osl-stddev 0 \
    --extra-inputs '{"ignore_eos": true, "min_tokens": 1200}' \
    --use-server-token-count \
    --ui none \
    --no-server-metrics \
    --no-gpu-telemetry

Both runs completed successfully with identical prompt/completion lengths:

  • Average ISL: 962
  • Average OSL: 1200
  • Success rate: 20/20
Deployment      Concurrency   Avg latency   Req/s    Output tok/s   Benchmark duration
vLLM serve      1             6459.49 ms    0.1548   185.72         129.23 s
Dynamo + vLLM   1             6483.57 ms    0.1542   185.02         129.71 s
vLLM serve      2             7295.91 ms    0.2740   328.85         72.98 s
Dynamo + vLLM   2             7341.10 ms    0.2724   326.82         73.43 s
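The table is internally consistent: with a fixed OSL of 1200 tokens per request, output token throughput should equal requests/sec times 1200, and it does to within rounding for every row.

```python
# Sanity check on the benchmark table above: output tok/s ~= req/s * OSL (1200).
rows = [
    ("vLLM serve",    1, 0.1548, 185.72),
    ("Dynamo + vLLM", 1, 0.1542, 185.02),
    ("vLLM serve",    2, 0.2740, 328.85),
    ("Dynamo + vLLM", 2, 0.2724, 326.82),
]

OSL = 1200  # fixed output sequence length per request

for name, concurrency, req_s, tok_s in rows:
    predicted = req_s * OSL
    # Reported throughput agrees with the predicted value to within rounding.
    assert abs(predicted - tok_s) < 0.5, (name, concurrency)
    print(f"{name} (c={concurrency}): {predicted:.2f} predicted vs {tok_s} reported tok/s")
```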

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Future

  • Can probably condense the image and video bash scripts into a single vision script, as the Qwen3-VL model should work for both cases

Summary by CodeRabbit

Release Notes

  • New Features

    • Video input support now integrated into vLLM multimodal backend with configurable frame sampling
    • Unified video serving available for both aggregated and disaggregated inference modes
  • Documentation

    • Updated multimodal documentation to reflect video support in vLLM backend
    • Added launch examples for video-enabled deployments
  • Chores

    • Removed legacy video encoding components
    • Updated example configurations to use standardized video infrastructure

@github-actions github-actions Bot added refactor documentation Improvements or additions to documentation backend::vllm Relates to the vllm backend multimodal labels Mar 27, 2026


rmccorm4 commented Apr 1, 2026

/ok to test 3057755

@rmccorm4 rmccorm4 marked this pull request as ready for review April 2, 2026 01:15
@rmccorm4 rmccorm4 requested a review from a team as a code owner April 2, 2026 01:15

@krishung5 krishung5 left a comment


LGTM! Left two minor comments which are not blocking. Great to see the quick benchmark result for the video pipeline.

Comment thread components/src/dynamo/common/tests/multimodal/test_video_loader.py
Comment thread examples/multimodal/utils/model.py
@github-actions github-actions Bot added the backend::sglang Relates to the sglang backend label Apr 2, 2026
Comment thread components/src/dynamo/common/multimodal/video_loader.py
@rmccorm4 rmccorm4 enabled auto-merge (squash) April 2, 2026 23:22
@rmccorm4 rmccorm4 merged commit 4791aaa into main Apr 2, 2026
91 of 92 checks passed
@rmccorm4 rmccorm4 deleted the rmccormick/vllm-video branch April 2, 2026 23:22
nealvaidya added a commit that referenced this pull request Apr 7, 2026
Refactor AudioLoader to delegate to vLLM's MediaConnector + AudioMediaIO,
matching the VideoLoader pattern from PR #7663. Returns (waveform, sample_rate)
tuples at native sample rate so vLLM's model-specific MultiModalDataParser
handles resampling and normalization downstream.

Integrate AudioLoader into BaseWorkerHandler._extract_multimodal_data() so
audio_url content parts flow through to vLLM's engine for omni models
(Qwen3-Omni, Nemotron Omni).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nealvaidya added a commit that referenced this pull request Apr 7, 2026
Delete examples/multimodal/utils/audio_loader.py — the backend
AudioLoader in components/ now handles all audio loading. Update the
example encode worker import to use the components package.

Matches the pattern from PR #7663 which removed the example video loader
when video support moved into the backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
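The (waveform, sample_rate) contract described in the commit above can be sketched as follows. This is a stub for illustration only — the real AudioLoader delegates decoding to vLLM's MediaConnector + AudioMediaIO; the stub simply synthesizes a tone to show the tuple shape that flows downstream to vLLM's model-specific MultiModalDataParser, which handles resampling and normalization.

```python
# Illustrative stub of the AudioLoader output contract: (waveform, sample_rate)
# at the native sample rate, with resampling left to the downstream parser.
import numpy as np


def load_audio_stub(duration_s: float = 1.0, sample_rate: int = 44_100):
    """Stand-in for the real loader: returns a (waveform, sample_rate) tuple."""
    t = np.linspace(0.0, duration_s, int(duration_s * sample_rate), endpoint=False)
    waveform = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)  # 440 Hz tone
    return waveform, sample_rate


waveform, sr = load_audio_stub()
# Downstream, the tuple is passed through unchanged in multi_modal_data:
multi_modal_data = {"audio": [(waveform, sr)]}
print(waveform.shape, sr)
```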