feat: add XPU container run script #7669
ZhengHongming888 wants to merge 3 commits into ai-dynamo:main
Conversation
👋 Hi ZhengHongming888! Thank you for contributing to ai-dynamo/dynamo.
Walkthrough
A new executable Bash script is added at `container/run_xpu.sh`.
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
container/run_xpu.sh (2)
229-239: Minor inconsistency with run.sh mount path.

The /mnt mount on line 232 lacks the trailing slash that run.sh uses (/mnt/:/mnt). While functionally equivalent for directories, consider aligning for consistency across container run scripts.

```diff
 VOLUME_MOUNTS+=" -v ${SOURCE_DIR}/..:/workspace "
 VOLUME_MOUNTS+=" -v /tmp:/tmp "
-VOLUME_MOUNTS+=" -v /mnt:/mnt "
+VOLUME_MOUNTS+=" -v /mnt/:/mnt "
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@container/run_xpu.sh` around lines 229 - 239, In the MOUNT_WORKSPACE block where VOLUME_MOUNTS is assembled (conditioned by MOUNT_WORKSPACE and using SOURCE_DIR), change the mount entry currently added as "-v /mnt:/mnt" to include the trailing slash so it becomes "-v /mnt/:/mnt" to match the style used in run.sh; update the VOLUME_MOUNTS concatenation that references /mnt so the variable and surrounding concatenations (VOLUME_MOUNTS, MOUNT_WORKSPACE, SOURCE_DIR) remain consistent.
281-288: Consider multi-XPU environments.

The script only checks /dev/dri/renderD128, but systems with multiple Intel XPUs will have additional render nodes (renderD129, renderD130, etc.). While the GID is typically the same for all render devices (the render group), consider:

- Verifying /dev/dri exists rather than a specific device
- Documenting this assumption in the help text

```diff
 # XPU specific: Add DRI device group for GPU access
-if [ -e /dev/dri/renderD128 ]; then
-    DRI_GROUP=$(stat -c '%g' /dev/dri/renderD128)
+# Get GID from any render device (all share the same group)
+RENDER_DEVICE=$(ls /dev/dri/renderD* 2>/dev/null | head -n1)
+if [ -n "$RENDER_DEVICE" ]; then
+    DRI_GROUP=$(stat -c '%g' "$RENDER_DEVICE")
     GROUP_ADD_STRING="--group-add ${DRI_GROUP}"
 else
     GROUP_ADD_STRING=""
-    echo "Warning: /dev/dri/renderD128 not found. XPU access may not work properly."
+    echo "Warning: No /dev/dri/renderD* devices found. XPU access may not work properly."
 fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@container/run_xpu.sh` around lines 281 - 288, The check for a single device node /dev/dri/renderD128 is fragile in multi-XPU setups; update the logic around DRI_GROUP and GROUP_ADD_STRING to first verify /dev/dri exists and then derive the group ID from any renderD* entry (e.g., find the first /dev/dri/renderD*), and update the help/warning text to document the assumption that all render devices share the same GID; keep using DRI_GROUP and GROUP_ADD_STRING but replace the hardcoded path check with a /dev/dri presence check and a probe for renderD* to obtain the GID.
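The review's underlying assumption, that every /dev/dri/renderD* node shares one GID, can be exercised with a small standalone probe. The sketch below follows the variable names from the review comment (RENDER_DEVICE, DRI_GROUP, GROUP_ADD_STRING) but is illustrative only, not the script's actual code; the empty-string fallback for machines without render nodes is an assumption:

```shell
#!/usr/bin/env bash
# Sketch: derive the --group-add value from the first render node found,
# instead of hardcoding /dev/dri/renderD128. Falls back to an empty
# string when no render node exists (e.g. a machine without a GPU).

RENDER_DEVICE=$(ls /dev/dri/renderD* 2>/dev/null | head -n1)
if [ -n "${RENDER_DEVICE}" ]; then
    DRI_GROUP=$(stat -c '%g' "${RENDER_DEVICE}")
    GROUP_ADD_STRING="--group-add ${DRI_GROUP}"
else
    GROUP_ADD_STRING=""
fi
echo "GROUP_ADD_STRING=${GROUP_ADD_STRING}"
```

On a multi-XPU host this picks whichever render node sorts first, which is sufficient because the GID comes from the shared render group rather than from any one device.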
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@container/run_xpu.sh`:
- Around line 352-372: The docker run command invoked by ${RUN_PREFIX} needs
explicit XPU device passthrough when running non-privileged (i.e. when
${PRIVILEGED_STRING} indicates FALSE); update the script to add device flags
(for example --device /dev/dri --device /dev/dri/renderD128 or a new
${DEVICE_STRING} variable) into the docker run invocation path that is used when
PRIVILEGED_STRING is not set, so the container retains access to /dev/dri
devices; alternatively, if non-privileged XPU is unsupported, remove the
--privileged option or update the help text accordingly.
---
Nitpick comments:
In `@container/run_xpu.sh`:
- Around line 229-239: In the MOUNT_WORKSPACE block where VOLUME_MOUNTS is
assembled (conditioned by MOUNT_WORKSPACE and using SOURCE_DIR), change the
mount entry currently added as "-v /mnt:/mnt" to include the trailing slash so
it becomes "-v /mnt/:/mnt" to match the style used in run.sh; update the
VOLUME_MOUNTS concatenation that references /mnt so the variable and surrounding
concatenations (VOLUME_MOUNTS, MOUNT_WORKSPACE, SOURCE_DIR) remain consistent.
- Around line 281-288: The check for a single device node /dev/dri/renderD128 is
fragile in multi-XPU setups; update the logic around DRI_GROUP and
GROUP_ADD_STRING to first verify /dev/dri exists and then derive the group ID
from any renderD* entry (e.g., find the first /dev/dri/renderD*), and update the
help/warning text to document the assumption that all render devices share the
same GID; keep using DRI_GROUP and GROUP_ADD_STRING but replace the hardcoded
path check with a /dev/dri presence check and a probe for renderD* to obtain the
GID.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bf5a77e0-1ae4-4a79-aeee-ac53125f0bd2
📒 Files selected for processing (1)
container/run_xpu.sh
```shell
${RUN_PREFIX} docker run \
    ${INTERACTIVE} \
    ${RM_STRING} \
    ${PRIVILEGED_STRING} \
    ${GROUP_ADD_STRING} \
    --network "$NETWORK" \
    --shm-size=10G \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    --ulimit nofile=65536:65536 \
    ${ENVIRONMENT_VARIABLES} \
    ${VOLUME_MOUNTS} \
    ${PORT_MAPPINGS} \
    -w "$WORKDIR" \
    --cap-add CAP_SYS_PTRACE \
    --ipc host \
    ${USER_STRING} \
    ${NAME_STRING} \
    ${ENTRYPOINT_STRING} \
    ${IMAGE} \
    "${REMAINING_ARGS[@]}"
```
Missing XPU device passthrough for non-privileged mode.
When --privileged FALSE is specified, the container loses access to XPU devices because there's no --device flag to explicitly pass /dev/dri. The --group-add provides permission but not device visibility.
Consider adding device passthrough when not running privileged:

```diff
+# XPU device passthrough (only needed when not privileged)
+if [[ ${PRIVILEGED^^} == "FALSE" ]] && [ -d /dev/dri ]; then
+    DEVICE_STRING="--device=/dev/dri"
+else
+    DEVICE_STRING=""
+fi
+
 ${RUN_PREFIX} docker run \
+    ${DEVICE_STRING} \
     ${INTERACTIVE} \
     ${RM_STRING} \
```

Alternatively, if non-privileged XPU mode isn't intended to be supported, consider either removing the --privileged option or documenting this limitation in the help text.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@container/run_xpu.sh` around lines 352 - 372, The docker run command invoked
by ${RUN_PREFIX} needs explicit XPU device passthrough when running
non-privileged (i.e. when ${PRIVILEGED_STRING} indicates FALSE); update the
script to add device flags (for example --device /dev/dri --device
/dev/dri/renderD128 or a new ${DEVICE_STRING} variable) into the docker run
invocation path that is used when PRIVILEGED_STRING is not set, so the container
retains access to /dev/dri devices; alternatively, if non-privileged XPU is
unsupported, remove the --privileged option or update the help text accordingly.
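The suggested fix can be factored into a small helper for clarity. This is a hypothetical sketch: the function name `device_string` and the extra directory parameter are inventions for illustration and testability; the real script would simply check /dev/dri inline:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the docker --device flag. Passing the
# device directory as a parameter lets the logic be exercised even on
# machines without /dev/dri.
device_string() {
    local privileged="$1"          # "TRUE" or "FALSE" (case-insensitive)
    local dri_dir="${2:-/dev/dri}" # directory to pass through
    if [[ "${privileged^^}" == "FALSE" && -d "${dri_dir}" ]]; then
        echo "--device=${dri_dir}"
    else
        # A privileged container already sees all host devices, so no
        # flag is needed; likewise when the directory does not exist.
        echo ""
    fi
}

device_string TRUE
```

The result would then be spliced into the docker run invocation alongside ${GROUP_ADD_STRING}: the group grants permission, while --device grants visibility.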
Any reason not to reuse the existing logic from run.sh instead of creating an xpu-specific script? https://github.com/ai-dynamo/dynamo/blob/main/container/run.sh
In the beginning I added the xpu modifications to run.sh as well, but later found that many parameters differ, which changed the original (cuda) run.sh quite a bit. If more devices are added later, that one file will get a little complicated. But of course, if you think it's necessary to keep everything together in one file, I'm OK with changing it. :-) The same choice comes up in other places like the dockerfile and examples, which currently have different preferences: dockerfile templating uses one file for multiple devices, while the examples use separate per-device folders. Either one is OK for me.
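As a rough illustration of the single-script alternative being discussed, one run.sh could branch on a device argument. The flag values below are assumptions made for the sketch, not the actual run.sh logic:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one script serving multiple device types by
# mapping a device name to the docker flags it needs.
device_flags() {
    case "${1:-cuda}" in
        cuda) echo "--gpus all" ;;          # NVIDIA path (assumed)
        xpu)  echo "--device=/dev/dri" ;;   # Intel XPU path (assumed)
        *)    echo "unsupported device: $1" >&2; return 1 ;;
    esac
}

device_flags cuda
device_flags xpu
```

The trade-off is exactly the one described above: one dispatch point versus one file per device, each accumulating complexity in a different place.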
nv-tusharma
left a comment
Changes LGTM, please follow up on pre-merge failures
Force-pushed from bdad23e to 83d5b3b
Add run_xpu.sh script for launching Dynamo containers with Intel XPU support. The script handles XPU-specific device access, DRI group management, and provides flexible container configuration options. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Hongming Zheng <hongming.zheng@intel.com>
Add run_xpu.sh to the ignore filter to resolve pre-merge CI error about uncovered files. This script is a development helper like run.sh and doesn't require CI testing. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Hongming Zheng <hongming.zheng@intel.com>
Force-pushed from 507e118 to 50bf408
fixed! thanks.
Overview:
This PR is to add run_xpu.sh script for launching Dynamo containers with Intel XPU support. The script handles XPU-specific device access, DRI group management, and provides flexible container configuration options.
Details:
Generate dockerfile for xpu:

```shell
python container/render.py --framework vllm --target local-dev --device xpu --output-short-filename
```

Docker build:

```shell
docker build --progress=plain --target local-dev -t dynamo:latest-vllm17.1-local-dev6 -f container/rendered.Dockerfile --build-arg USER_UID=$(id -u) --build-arg USER_GID=$(id -g) --build-arg http_proxy=... --build-arg https_proxy=... --build-arg no_proxy=localhost,127.0.0.1,0.0.0.0 .
```

Create container:

```shell
container/run_xpu.sh --mount-workspace --image dynamo:latest-vllm17.1-local-dev -it
```

For the local-dev container you still need to install dynamo inside the container:

```shell
cargo build --features dynamo-llm/block-manager && cd /workspace/lib/bindings/python && maturin develop --uv && cd /workspace && uv pip install --no-deps -e /workspace
```

With run_xpu.sh this simplifies the steps for creating the xpu container and is also aligned with the Dynamo templating style.

Tested with run_xpu.sh it works fine.
Thanks.
Summary by CodeRabbit
Release Notes