feat: add P/D disaggregated examples for XPU+CUDA with host buffer and RDMA #7673
pallavijaini0525 wants to merge 7 commits into ai-dynamo:main
Conversation
…r and RDMA
Signed-off-by: pallavi jaini <pallavi.jaini@intel.com>
Walkthrough
This pull request introduces two new Kubernetes deployment manifests for mixed XPU and CUDA disaggregated inference configurations, and updates the deployment documentation to describe these new architecture options and their usage.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/backends/vllm/deploy/disagg_xpu_cuda_rdma.yaml`:
- Around lines 72-80: The manifest's securityContext.capabilities adds IPC_LOCK, SYS_PTRACE, NET_ADMIN, NET_RAW, SYS_ADMIN, and SYS_RESOURCE, which violates the Kubernetes Pod Security Standards. Either prune the list to only the capabilities actually required by the RDMA/NIXL workload (determine the minimal set and remove NET_ADMIN, SYS_ADMIN, SYS_PTRACE, SYS_RESOURCE, NET_RAW, and even IPC_LOCK unless proven necessary), or add a clear Pod Security note to the example README explaining that the template requires a relaxed namespace pod-security label (e.g., pod-security.kubernetes.io/enforce set to baseline or privileged, or a cluster exception) and documenting exactly which capabilities are needed and why. Update the securityContext.capabilities block and the README accordingly (search the YAML for securityContext and capabilities to locate both occurrence sites).
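The pruning the comment suggests could look like the sketch below. This is illustrative only: the truly minimal capability set depends on the RDMA stack in use, but IPC_LOCK is commonly needed so UCX/NIXL can pin (mlock) RDMA-registered memory, while the others are frequently droppable.

```yaml
# Sketch of a least-privilege securityContext for the worker container.
# Assumption: only IPC_LOCK is needed for pinning registered KV-cache
# buffers; re-add other capabilities only if the workload demonstrably
# fails without them.
securityContext:
  capabilities:
    drop:
      - ALL        # start from least privilege
    add:
      - IPC_LOCK   # allow mlock of RDMA-registered memory
```

With only IPC_LOCK added, the pod can typically still run under the baseline Pod Security level, avoiding the need for a privileged namespace.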
In `@examples/backends/vllm/deploy/disagg_xpu_cuda.yaml`:
- Around lines 82-85: The PrefillWorker block places requests: at the service root instead of under resources:, so the ephemeral-storage request is ignored. Move the existing requests: ephemeral-storage: "2Gi" into a resources: requests: subtree for PrefillWorker (matching how DecodeWorker here, and PrefillWorker in disagg_xpu_cuda_rdma.yaml, are defined), with correct indentation and nesting so that resources.requests.ephemeral-storage is actually applied.
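The nesting fix described above can be sketched as follows (the surrounding worker-block layout is illustrative of the manifest, not copied from it):

```yaml
# Before (ignored): requests sits at the service root, so Kubernetes
# never interprets it as a resource request.
PrefillWorker:
  requests:
    ephemeral-storage: "2Gi"

# After (applied): nested under resources.requests, matching DecodeWorker
# and the PrefillWorker in disagg_xpu_cuda_rdma.yaml.
PrefillWorker:
  resources:
    requests:
      ephemeral-storage: "2Gi"
```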
📒 Files selected for processing (3)
examples/backends/vllm/deploy/README.md
examples/backends/vllm/deploy/disagg_xpu_cuda.yaml
examples/backends/vllm/deploy/disagg_xpu_cuda_rdma.yaml
Signed-off-by: pallavi jaini <pallavi.jaini@intel.com>
Overview:
Adds two new Kubernetes deployment examples for mixed-device disaggregated prefill/decode (P/D) serving: an Intel XPU prefill worker paired with an NVIDIA CUDA GPU decode worker. Two variants are provided: one using a CPU host buffer over TCP, and one using direct GPU-to-GPU KV cache transfer over an InfiniBand/RoCE RDMA fabric via NIXL.
Details:
disagg_xpu_cuda.yaml: Mixed P/D disaggregated deployment with Intel XPU prefill and NVIDIA CUDA decode. KV cache is staged through a CPU host buffer and transferred over TCP (UCX_TLS: tcp). KV cache events are published over ZMQ for cache-aware routing (DYN_ROUTER_MODE=kv). XPU device is allocated via Kubernetes DRA (ResourceClaimTemplate with gpu.intel.com).
disagg_xpu_cuda_rdma.yaml: Same mixed P/D topology but with high-performance RDMA KV transfer. Both the prefill (XPU, kv_buffer_device: xpu, ze_copy) and decode (CUDA, kv_buffer_device: cuda, cuda_copy) workers claim an RDMA NIC via ResourceClaimTemplate (rdma-dranet). UCX uses ib,rc transports for GPU-to-GPU KV movement without staging through CPU memory.
README.md: Updated Section 6 (Intel XPU deployments) with architecture descriptions for both new templates.
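The key difference between the two templates described above is how the KV cache moves between workers. A condensed sketch of the relevant worker environment variables follows; it is illustrative only, and the exact keys, values, and container layout should be taken from the actual manifests:

```yaml
# disagg_xpu_cuda.yaml (host-buffer variant): KV cache is staged in CPU
# memory and moved over TCP; KV cache events are published for
# cache-aware routing.
env:
  - name: UCX_TLS
    value: "tcp"
  - name: DYN_ROUTER_MODE
    value: "kv"

# disagg_xpu_cuda_rdma.yaml (RDMA variant): direct GPU-to-GPU KV movement
# over InfiniBand/RoCE with no CPU staging; each worker additionally
# claims an RDMA NIC via a ResourceClaimTemplate (rdma-dranet) and sets
# kv_buffer_device (xpu or cuda) in its NIXL configuration.
env:
  - name: UCX_TLS
    value: "ib,rc"
```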
Where should the reviewer start?
examples/backends/vllm/deploy/disagg_xpu_cuda_rdma.yaml — the more complex new template; verify the dual ResourceClaimTemplate setup (xpu-template + rdma-net-template) and the UCX/NIXL env vars on each worker
examples/backends/vllm/deploy/disagg_xpu_cuda.yaml — simpler variant
examples/backends/vllm/deploy/README.md — verify the new architecture descriptions, prerequisites, and links are accurate