refactor(vLLM): Move video support from example to backend#7663
Merged
Conversation
- Replace model-name allowlists with capability-driven vision loading and multimodal handling
- Add native `video_url` loading in the standard `TokensPrompt` `multi_modal_data` flow
- Move the video agg/disagg launch scripts under `examples/backends/vllm` and update docs/tests
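The first bullet's shift from allowlists to capability checks can be sketched as follows. Everything here (`ModelCapabilities`, `supports_modality`, the allowlist contents) is a hypothetical illustration of the idea, not the actual backend API; the real backend reads supported modalities from the vLLM engine/model config.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the modality info vLLM exposes via the model
# config; the real backend queries the engine rather than this class.
@dataclass
class ModelCapabilities:
    modalities: set = field(default_factory=set)  # e.g. {"image", "video"}

def supports_modality(caps: ModelCapabilities, modality: str) -> bool:
    """Capability-driven check: no model-name allowlist involved."""
    return modality in caps.modalities

# Old approach (what the PR removes): a brittle allowlist keyed on names,
# which silently rejects new video-capable models until someone updates it.
VIDEO_ALLOWLIST = {"Qwen/Qwen2-VL-2B-Instruct"}

caps = ModelCapabilities(modalities={"image", "video"})
print(supports_modality(caps, "video"))  # True for any model declaring video
print(supports_modality(caps, "audio"))  # False
```

The capability check generalizes automatically: a model that declares video support works without code changes, whereas the allowlist needed a patch per model name.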
krishung5
approved these changes
Apr 2, 2026
Contributor
krishung5
left a comment
LGTM! Left two minor comments, which are not blocking. Great to see the quick benchmark result for the video pipeline.
furionw
approved these changes
Apr 2, 2026
furionw
reviewed
Apr 2, 2026
nealvaidya
added a commit
that referenced
this pull request
Apr 7, 2026
Refactor AudioLoader to delegate to vLLM's MediaConnector + AudioMediaIO, matching the VideoLoader pattern from PR #7663. Returns (waveform, sample_rate) tuples at native sample rate so vLLM's model-specific MultiModalDataParser handles resampling and normalization downstream. Integrate AudioLoader into BaseWorkerHandler._extract_multimodal_data() so audio_url content parts flow through to vLLM's engine for omni models (Qwen3-Omni, Nemotron Omni). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
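The commit above describes an AudioLoader that returns `(waveform, sample_rate)` tuples at the file's native rate, leaving resampling and normalization to downstream parsers. The sketch below shows only that contract using the stdlib `wave` module; the real AudioLoader delegates decoding to vLLM's media utilities, and this `load_audio` helper is a hypothetical illustration.

```python
import io
import wave
import array

def load_audio(data: bytes):
    """Return (waveform, sample_rate) at the file's native rate.

    Sketch only: decodes 16-bit mono WAV with the stdlib and performs no
    resampling or normalization, mirroring the contract where the
    model-specific downstream parser handles both.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    waveform = array.array("h", frames)  # 16-bit PCM samples
    return waveform, rate

# Build a tiny in-memory WAV (8 kHz, mono, 4 samples) to exercise the loader.
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
    wav.setframerate(8000)
    wav.writeframes(array.array("h", [0, 100, -100, 0]).tobytes())

waveform, rate = load_audio(buf.getvalue())
print(rate)            # 8000 -- native sample rate, untouched by the loader
print(list(waveform))  # [0, 100, -100, 0]
```

Returning the native rate keeps the loader model-agnostic: a 16 kHz-expecting model and a 24 kHz-expecting model can share it, each resampling in its own parser.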
nealvaidya
added a commit
that referenced
this pull request
Apr 7, 2026
Delete examples/multimodal/utils/audio_loader.py — the backend AudioLoader in components/ now handles all audio loading. Update the example encode worker import to use the components package. Matches the pattern from PR #7663 which removed the example video loader when video support moved into the backend. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Overview:
Details:
Quick Benchmark: Dynamo vs `vllm serve` for Video Inference

I ran a quick apples-to-apples comparison between Dynamo aggregated mode (`examples/backends/vllm/launch/video_agg.sh`) and plain `vllm serve`, both serving `Qwen/Qwen2-VL-2B-Instruct` on the same machine and GPU configuration.

Dynamo command:

vllm serve command:

Benchmark command:

```shell
aiperf profile \
  --model Qwen/Qwen3-VL-2B-Instruct \
  --endpoint-type chat \
  --endpoint /v1/chat/completions \
  --url localhost:8000 \
  --video-width 640 \
  --video-height 480 \
  --video-fps 4 \
  --video-duration 5.0 \
  --video-format mp4 \
  --video-codec libx264 \
  --request-count 20 \
  --concurrency 1 \
  --osl 1200 \
  --osl-stddev 0 \
  --extra-inputs '{"ignore_eos": true, "min_tokens": 1200}' \
  --use-server-token-count \
  --ui none \
  --no-server-metrics \
  --no-gpu-telemetry
```
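Because `ignore_eos` plus `min_tokens: 1200` pins the output length at exactly the `--osl` value, output-token throughput for a run can be derived directly from request count and wall time. The helper and the timing value below are illustrative placeholders, not measured results from the runs above.

```python
def output_token_throughput(osl: int, requests: int, wall_seconds: float) -> float:
    """Tokens/s for a fixed-output-length run (ignore_eos + min_tokens pin OSL)."""
    return osl * requests / wall_seconds

# Placeholder wall time (120 s), not a number reported by the benchmark:
print(output_token_throughput(1200, 20, 120.0))  # 200.0 tokens/s
```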
Both runs completed successfully with identical prompt/completion lengths: 962-token prompts, 1200-token completions, 20/20 requests completed.

Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Future
Summary by CodeRabbit
Release Notes
New Features
Documentation
Chores