ci: switch vLLM containers to upstream base images #7648
alec-flowers wants to merge 17 commits into main from
Conversation
Walkthrough

This pull request transitions vLLM container builds from custom-built images to upstream vllm/vllm-openai images with arch-specific runtime selection, replaces msgpack with msgspec for serialization, removes complex build scripts, and adds Ray node discovery patching while simplifying Docker image composition and dependency management.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 3
🧹 Nitpick comments (5)
examples/backends/vllm/launch/disagg_multimodal_epd.sh (1)

124-134: Hardcoded NIXL side-channel and ZMQ ports reduce portability.

The `VLLM_NIXL_SIDE_CHANNEL_PORT` values (20097-20099) and ZMQ endpoint ports (20080-20082) are static. If multiple test runs or instances execute concurrently (e.g., pytest-xdist), these will conflict. Consider making these configurable via environment variables similar to the `DYN_SYSTEM_PORT*` pattern, or using `alloc_port` from `launch_utils.sh` if available.

♻️ Example pattern for configurable NIXL/ZMQ ports
```diff
+# NIXL side-channel ports (override via environment variables)
+DYN_NIXL_PORT_ENCODE=${DYN_NIXL_PORT_ENCODE:-20097}
+DYN_NIXL_PORT_PREFILL=${DYN_NIXL_PORT_PREFILL:-20098}
+DYN_NIXL_PORT_DECODE=${DYN_NIXL_PORT_DECODE:-20099}
+
+# ZMQ KV event ports
+DYN_ZMQ_PORT_ENCODE=${DYN_ZMQ_PORT_ENCODE:-20080}
+DYN_ZMQ_PORT_PREFILL=${DYN_ZMQ_PORT_PREFILL:-20081}
+DYN_ZMQ_PORT_DECODE=${DYN_ZMQ_PORT_DECODE:-20082}
+
 echo "System ports: encode=${DYN_SYSTEM_PORT_ENCODE}, prefill=${DYN_SYSTEM_PORT_PREFILL}, decode=${DYN_SYSTEM_PORT_DECODE}"
+echo "NIXL ports: encode=${DYN_NIXL_PORT_ENCODE}, prefill=${DYN_NIXL_PORT_PREFILL}, decode=${DYN_NIXL_PORT_DECODE}"
-VLLM_NIXL_SIDE_CHANNEL_PORT=20097 DYN_SYSTEM_PORT=$DYN_SYSTEM_PORT_ENCODE ...
+VLLM_NIXL_SIDE_CHANNEL_PORT=$DYN_NIXL_PORT_ENCODE DYN_SYSTEM_PORT=$DYN_SYSTEM_PORT_ENCODE ...
```

Based on learnings: "Flag hard-coded portability-reducing constants in shell/scripts across the repository (e.g., fixed memory sizes, fixed memory fractions, static ports...). Prefer portable alternatives available in the repo: use alloc_port for dynamic ports..."
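Where a shell `alloc_port` helper is unavailable, the underlying mechanism can be sketched in Python. This is a hypothetical helper (not part of `launch_utils.sh`): binding to port 0 asks the OS for a free ephemeral port, which a launcher can then export into the worker environment.

```python
import socket

def alloc_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a free TCP port by binding to port 0.

    Note: the port is released when the socket closes, so a small race
    remains; launchers should still tolerate bind failures and retry.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((host, 0))
        return s.getsockname()[1]

if __name__ == "__main__":
    # e.g. export VLLM_NIXL_SIDE_CHANNEL_PORT=$(python3 alloc_port.py)
    print(alloc_free_port())
```

With this shape, concurrent runs each get a distinct port instead of colliding on a fixed 20097-20099 range.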
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@examples/backends/vllm/launch/disagg_multimodal_epd.sh` around lines 124-134: the script hardcodes NIXL side-channel ports (VLLM_NIXL_SIDE_CHANNEL_PORT=20097/20098/20099) and ZMQ endpoint ports (tcp://*:20080/20081/20082), which causes conflicts. Change these to be allocated/configurable by using alloc_port from launch_utils.sh or environment variables (e.g., VLLM_NIXL_SIDE_CHANNEL_PORT_ENCODE/PREFILL/DECODE and VLLM_ZMQ_PORT_ENCODE/PREFILL/DECODE) before launching each worker, then reference those variables in the VLLM_NIXL_SIDE_CHANNEL_PORT and --kv-events-config endpoint strings; ensure ports are exported/initialized early (call alloc_port where available) so concurrent runs pick different ports.

components/src/dynamo/vllm/handlers.py (1)
585-594: Duplicate `_NodeInfo` class and patch—consider removing.

This inline `_NodeInfo` class and `list_nodes` patch duplicate the module-level implementation at lines 77-93. Since the module-level patch runs at import time (before `scale_elastic_ep` is ever called), this method-level patch is redundant.

♻️ Suggested fix: Remove the duplicate inline patch
```diff
         try:
-            # TODO(upstream-vllm): remove this patch once vLLM fixes
-            # add_dp_placement_groups in vllm/v1/engine/utils.py to use ray.nodes()
-            # instead of ray.util.state.list_nodes().
-            #
-            # Patch ray.util.state.list_nodes to use the GCS API instead of the
-            # dashboard HTTP API (127.0.0.1:8265/api/v0/nodes). The dynamo image
-            # installs ray core only (not ray[default]), so the dashboard HTTP server
-            # starts in --minimal mode with the HTTP server disabled. vLLM's
-            # add_dp_placement_groups calls list_nodes() which requires that HTTP
-            # endpoint, causing scale_elastic_ep to fail with "Failed to connect to
-            # API server".
-            #
-            # ray.nodes() uses the GCS gRPC channel directly (no dashboard process
-            # needed) and returns the same information. Imported lazily so ray is not
-            # required at module load time (absent in non-elastic-EP deployments).
-            #
-            # Format mapping:
-            #   list_nodes() → objects with .node_ip and .node_id
-            #   ray.nodes()  → dicts with "NodeManagerAddress" and "NodeID"
-            import ray
-            import ray.util.state as _ray_util_state
-
-            class _NodeInfo:
-                __slots__ = ("node_id", "node_ip")
-
-                def __init__(self, d: dict) -> None:
-                    self.node_ip: str = d["NodeManagerAddress"]
-                    self.node_id: str = d["NodeID"]
-
-            _ray_util_state.list_nodes = lambda **kw: [
-                _NodeInfo(n) for n in ray.nodes() if n.get("Alive", False)
-            ]
             await self.engine_client.scale_elastic_ep(new_dp_size)
```

The TODO comment and documentation are valuable—keeping them in the module-level location (lines 59-75) preserves context while eliminating duplication.
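The shape of the module-level shim the comment defers to can be sketched standalone. Here `fake_ray_nodes` is a stand-in for `ray.nodes()` (the real patch assigns the lambda onto `ray.util.state.list_nodes`); only the dict-to-object mapping and the `Alive` filter are shown.

```python
class _NodeInfo:
    """Adapter: ray.nodes() dicts -> objects with .node_ip / .node_id,
    matching what list_nodes() callers expect."""

    __slots__ = ("node_id", "node_ip")

    def __init__(self, d: dict) -> None:
        self.node_ip: str = d["NodeManagerAddress"]
        self.node_id: str = d["NodeID"]


def fake_ray_nodes() -> list[dict]:
    # Stand-in for ray.nodes(); dead nodes must be filtered out.
    return [
        {"NodeManagerAddress": "10.0.0.1", "NodeID": "a" * 8, "Alive": True},
        {"NodeManagerAddress": "10.0.0.2", "NodeID": "b" * 8, "Alive": False},
    ]


# Same signature shape as the patch: accept and ignore list_nodes() kwargs.
list_nodes = lambda **kw: [
    _NodeInfo(n) for n in fake_ray_nodes() if n.get("Alive", False)
]
```

Because the adapter exposes the same attributes `add_dp_placement_groups` reads, the caller cannot tell the dashboard HTTP API was bypassed.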
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/vllm/handlers.py` around lines 585-594: remove the duplicate inline _NodeInfo class and the local patch of _ray_util_state.list_nodes inside the scale_elastic_ep-related code path; delete the class _NodeInfo defined with __slots__ ("node_id", "node_ip") and the lambda assignment `_ray_util_state.list_nodes = lambda **kw: [_NodeInfo(n) for n in ray.nodes() if n.get("Alive", False)]`. Rely on the existing module-level _NodeInfo and the module-level patch of _ray_util_state.list_nodes (keep the TODO/documentation there) so there is a single canonical implementation.

components/src/dynamo/vllm/main.py (2)
62-64: Remove unused `TYPE_CHECKING` block.

This `if TYPE_CHECKING: pass` block is dead code—it does nothing at runtime or for static analysis. The `TYPE_CHECKING` import on line 10 is also unused since there are no type-only imports guarded by this block. If `OmniConfig` was removed and this block was intended to hold a type-only import, either add the appropriate import or remove both the block and the `TYPE_CHECKING` import.

♻️ Suggested fix
```diff
-from typing import TYPE_CHECKING, Any, Optional
+from typing import Any, Optional
```

And remove lines 62-64 entirely.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/vllm/main.py` around lines 62 - 64, Remove the dead TYPE_CHECKING guard and its unused import: delete the "if TYPE_CHECKING: pass" block and the corresponding "TYPE_CHECKING" import at the top of the file (or replace it with the intended type-only import if you actually need a guarded type import such as OmniConfig); ensure no remaining references to TYPE_CHECKING remain in the module.
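For contrast, a useful `TYPE_CHECKING` guard holds a real type-only import. A minimal self-contained sketch (using `decimal.Decimal` as a stand-in for a heavy or optional dependency such as `OmniConfig`):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers, never executed at runtime;
    # keeps the dependency out of module import time.
    from decimal import Decimal  # stand-in for a type-only import


def total(amounts: "list[Decimal]") -> str:
    # The annotation is a string, so Decimal need not exist at runtime.
    return str(sum(amounts))
```

A guard that contains only `pass`, by contrast, serves neither the checker nor the runtime, which is why the comment recommends deleting it.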
188-190: Restore type safety for the `config` parameter by using `Config | OmniConfig` instead of `Any`.

The `config` parameter accepts both `Config` (from vllm/args.py) and `OmniConfig` (from vllm/omni/args.py), as both classes have the required attributes (engine_args, namespace, component, endpoint, model). Using `Any` disables type checking. Define the union type under `TYPE_CHECKING` to avoid import overhead:

♻️ Suggested fix
```diff
 from typing import TYPE_CHECKING, Any, Optional

+if TYPE_CHECKING:
+    from .omni.args import OmniConfig

 def setup_metrics_collection(
-    config: Any, generate_endpoint: Endpoint, logger: logging.Logger
+    config: Config | OmniConfig, generate_endpoint: Endpoint, logger: logging.Logger
 ) -> None:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/vllm/main.py` around lines 188-190: the config parameter in setup_metrics_collection should be typed as a union of vllm.args.Config and vllm.omni.args.OmniConfig instead of Any to restore type safety. Wrap imports in a TYPE_CHECKING block (from typing import TYPE_CHECKING) to avoid runtime import overhead, then annotate the function as def setup_metrics_collection(config: "Config | OmniConfig", generate_endpoint: Endpoint, logger: logging.Logger) -> None (or use Config | OmniConfig directly if using from __future__ import annotations), ensuring you reference the Config and OmniConfig symbols from vllm.args and vllm.omni.args respectively so static type checkers can validate access to engine_args, namespace, component, endpoint, and model.

container/compliance/resolve_base_image.py (1)
87-93: Code would silently coerce multi-platform strings; however, no current callers pass comma-delimited values to this action.

The `split("/")[-1]` normalization is problematic for inputs like `linux/amd64,linux/arm64` (would coerce to `arm64`), but examining all workflows shows that compliance-scan's `arch` input only receives single values: either explicit `linux/amd64` / `linux/arm64` or matrix-expanded `${{ matrix.arch }}` (which expands to individual values). Comma-delimited platforms appear elsewhere (e.g., passed to `init-dynamo-builder`), but not directly to compliance-scan.

That said, the code lacks the validation layer that other actions use (e.g., render.py for docker-remote-build). To be safe, consider explicitly rejecting invalid formats (e.g., reject `,` or enforce exact matches: `{amd64, arm64, linux/amd64, linux/arm64}`).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@container/compliance/resolve_base_image.py` around lines 87 - 93, The current normalization using arch = args.arch.split("/")[-1] silently coerces multi-platform strings; replace that behavior by validating args.arch exactly against the allowed set {"amd64", "arm64", "linux/amd64", "linux/arm64"} and explicitly reject any input containing commas or other unexpected formats; update the validation block that currently uses args.arch and arch to (1) check for a comma and exit with an error if present, and (2) check args.arch (not the split result) is one of the four exact allowed values, logging a clear error message and sys.exit(1) when it is not.
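The strict validation the comment asks for can be sketched as a small helper. This is a hypothetical function, not the actual `resolve_base_image.py` code; it raises instead of calling `sys.exit(1)` so it is easy to reuse and test.

```python
ALLOWED_ARCHES = {"amd64", "arm64", "linux/amd64", "linux/arm64"}


def validate_arch(value: str) -> str:
    """Return the short arch name, rejecting multi-platform strings."""
    if "," in value:
        # Reject comma-delimited platform lists outright instead of
        # silently coercing e.g. "linux/amd64,linux/arm64" to "arm64".
        raise ValueError(f"--arch must name a single platform, got {value!r}")
    if value not in ALLOWED_ARCHES:
        raise ValueError(
            f"--arch must be one of {sorted(ALLOWED_ARCHES)}, got {value!r}"
        )
    # Normalization is now safe: the input is a single known platform.
    return value.split("/")[-1]
```

A CLI wrapper would catch `ValueError`, log the message, and `sys.exit(1)`, matching the pattern the review prompt describes.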
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/src/dynamo/common/multimodal/embedding_transfer.py`:
- Line 16: The module imports msgspec (see the import msgspec in
embedding_transfer.py) but the vLLM runtime build installs the Dynamo wheels
with --no-deps so msgspec may not be present; fix by ensuring msgspec is
installed into the runtime image before wheel installation — either install
requirements.common.txt (which pins msgspec) or explicitly pip install msgspec
prior to installing the Dynamo wheels (or add msgspec as a dependency in the
wheel build), so any imports of msgspec in functions/classes under
multimodal/embedding_transfer.py succeed.
In `@container/render.py`:
- Around line 99-103: The CLI currently allows --device choices like cpu/xpu in
parse_args while validate_args enforces vllm only supports "cuda", causing late
failure; update the contract so failures surface earlier by either restricting
parse_args choices when framework is vllm or making the validation
framework-specific and clearer: modify parse_args to conditionally limit
--device choices to ["cuda"] when the selected framework/back-end is "vllm" (or,
if you prefer, change validate_args to raise a framework-specific, descriptive
error mentioning "vllm" and "--device" so callers fail fast), and reference the
validate_args and parse_args functions plus the "--device" flag and "vllm"
backend when implementing the change.
In `@pyproject.toml`:
- Around line 53-54: The comment above the dependency is incorrect — the extra
"vllm" does include vllm-omni==0.16.0, so either remove or correct the comment
in pyproject.toml (referencing the "vllm-omni==0.16.0" line and the "vllm"
extra) to state the accurate behavior, or if the note pertains only to the
container/runtime image, move that note out of pyproject.toml into the container
docs; update the comment text accordingly to avoid the contradiction.
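The msgspec runtime-dependency concern above (wheels installed with `--no-deps` may silently miss msgspec) could be caught early with an import guard. A hypothetical fail-fast helper, not existing repo code:

```python
import importlib.util


def require_module(name: str, hint: str = "") -> None:
    """Raise a descriptive ImportError if `name` is not installed.

    Useful when wheels are installed with --no-deps and a transitive
    dependency such as msgspec may be missing from the runtime image.
    """
    if importlib.util.find_spec(name) is None:
        msg = f"required module {name!r} is not installed"
        if hint:
            msg += f"; {hint}"
        raise ImportError(msg)


# Example usage at module import time in embedding_transfer.py:
# require_module("msgspec", "install requirements.common.txt before the Dynamo wheels")
```

This turns a late, cryptic `ModuleNotFoundError` inside a request path into an actionable error at startup.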
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8891f7b0-f61a-4038-874c-efa3164a1090
📒 Files selected for processing (26)

- .github/actions/compliance-scan/action.yml
- components/src/dynamo/common/multimodal/embedding_transfer.py
- components/src/dynamo/trtllm/publisher.py
- components/src/dynamo/vllm/handlers.py
- components/src/dynamo/vllm/main.py
- container/Dockerfile.template
- container/README.md
- container/compliance/README.md
- container/compliance/resolve_base_image.py
- container/context.yaml
- container/deps/README.md
- container/deps/requirements.common.txt
- container/deps/requirements.vllm.txt
- container/deps/vllm/install_vllm.sh
- container/render.py
- container/templates/args.Dockerfile
- container/templates/vllm_framework.Dockerfile
- container/templates/vllm_runtime.Dockerfile
- container/templates/wheel_builder.Dockerfile
- deploy/sanity_check.py
- examples/backends/vllm/launch/disagg_multimodal_e_pd.sh
- examples/backends/vllm/launch/disagg_multimodal_epd.sh
- lib/gpu_memory_service/setup.py
- pyproject.toml
- tests/fault_tolerance/deploy/container/Dockerfile.local_vllm
- tests/fault_tolerance/deploy/container/build_from_local_vllm.sh
💤 Files with no reviewable changes (8)
- container/deps/requirements.vllm.txt
- container/deps/README.md
- lib/gpu_memory_service/setup.py
- tests/fault_tolerance/deploy/container/build_from_local_vllm.sh
- tests/fault_tolerance/deploy/container/Dockerfile.local_vllm
- container/templates/vllm_framework.Dockerfile
- container/deps/vllm/install_vllm.sh
- container/deps/requirements.common.txt
Force-pushed 994a545 to 505010e
Force-pushed 2863195 to 18bc7b0
Force-pushed 18bc7b0 to 032bc69
Force-pushed 032bc69 to c97b10a
Force-pushed c97b10a to 2a6cdbe
Force-pushed 2a6cdbe to 4e26964
Force-pushed 4e26964 to 0522768
Summary

- Switch the `vllm` container path to upstream multi-arch `vllm/vllm-openai` base images and keep the runtime lean by layering only Dynamo-owned wheels with `--no-deps`
- `--framework` option and broad-copying `examples/` into the vLLM image instead of adding new shared pytest plumbing

Validation

- `python3 container/render.py --framework=vllm --target=runtime --platform=linux/amd64 --cuda-version=12.9 --output-short-filename`
- `docker build -f container/rendered.Dockerfile -t dynamo:vllm-runtime-local .`
- `docker build -f container/Dockerfile.test --build-arg BASE_IMAGE=dynamo:vllm-runtime-local -t dynamo:vllm-test-local .`
- `docker run --rm -w /workspace --network host dynamo:vllm-test-local sh -lc 'pytest -v --collect-only -m "pre_merge and vllm and gpu_0"'`
- `python3 -m py_compile components/src/dynamo/profiler/tests/unit/test_replay_aic_parity.py lib/bindings/python/tests/replay/replay_utils.py tests/conftest.py tests/deploy/conftest.py tests/fault_tolerance/deploy/conftest.py tests/fault_tolerance/deploy/test_deployment.py`
- `git diff --check`