
feat(deploy): add new NVIDIA accelerators and update NFD rules #97

Merged
hhk7734 merged 1 commit into main from feat/add-new-nvidia-accelerators on Mar 25, 2026



@hhk7734 (Member) commented Mar 25, 2026

Summary

  • Add NFD rules and docs for 5 new NVIDIA accelerators (B300, B200, H200 SXM, H20-3e, A100 80GB SXM) based on moreh-vllm-specification Appendix B
  • Rename h100-80gb-hbm3 to h100-sxm for consistency with the spec naming convention
  • Remove the redundant generic h100 entry (duplicate of h100-sxm)

Test plan

  • Verify that the Helm template renders correctly (helm template with NFD enabled)
  • Confirm that the PCI device IDs match the pci-ids.ucw.cz database
  • Verify that the docs page renders correctly on the website

🤖 Generated with Claude Code

…o h100-sxm

Add NFD rules and docs for B300, B200, H200 SXM, H20-3e, and A100 80GB
SXM accelerators based on moreh-vllm-specification Appendix B. Rename
h100-80gb-hbm3 to h100-sxm for consistency, and remove the redundant
generic h100 entry in favor of h100-sxm.

PCI device IDs:
- B300: 3182 (GB110 [B300 SXM6 AC])
- B200: 2901 (GB100 [B200])
- H200 SXM: 2335 (GH100 [H200 SXM 141GB])
- H20-3e: 232c (GH100 [H20 HBM3e])
- A100 80GB SXM: 20b2 (GA100 [A100 SXM4 80GB])

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
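To illustrate how one of the PCI device IDs above turns into node labels, here is a minimal NodeFeatureRule sketch for the H20-3e (vendor 10de, device 232c, model label h20-3e as listed in the docs table). It assumes the map-style matchExpressions used in the chart's moai-accelerator.yaml hunk; the metadata name and rule name are hypothetical, and the chart's actual template may group rules differently.

```yaml
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: moai-accelerator-example        # hypothetical name
spec:
  rules:
    - name: "accelerator-nvidia-h20-3e" # hypothetical rule name
      # Labels applied to every node that matches the features below.
      labels:
        "moai.moreh.io/accelerator.vendor": "nvidia"
        "moai.moreh.io/accelerator.model": "h20-3e"
      matchFeatures:
        - feature: pci.device
          matchExpressions:
            # 10de is the NVIDIA PCI vendor ID; 232c is GH100 [H20 HBM3e].
            vendor: { op: In, value: ["10de"] }
            device: { op: In, value: ["232c"] }
```

Because `In` accepts a list, a single model can match several device IDs by extending the `value` array, which is one way to handle an ID such as 2339 if it is confirmed to belong to an existing model.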
hhk7734 requested a review from a team as a code owner on March 25, 2026, 16:17
Copilot AI review requested due to automatic review settings on March 25, 2026, 16:17
hhk7734 merged commit d17fc91 into main on Mar 25, 2026
8 checks passed
hhk7734 deleted the feat/add-new-nvidia-accelerators branch on March 25, 2026, 16:22
Copilot AI (Contributor) left a comment


Pull request overview

This PR updates the MoAI accelerator labeling surface (NFD rules + docs) to include additional NVIDIA GPUs and align NVIDIA H100 model naming with the external spec referenced in the PR description.

Changes:

  • Add NFD NodeFeatureRule entries for NVIDIA B300, B200, H200 SXM, H20-3e, and A100 80GB SXM.
  • Rename the NVIDIA H100 SXM label from h100-80gb-hbm3/h100 to h100-sxm and remove the generic h100 entry from the docs list.
  • Update the supported devices documentation table to reflect the new/renamed accelerator labels.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| website/docs/reference/supported-devices.mdx | Updates the public "supported devices" label table to include new NVIDIA accelerators and the h100-sxm rename. |
| deploy/helm/moai-inference-framework/templates/nfd/moai-accelerator.yaml | Updates NFD rules to add new NVIDIA device IDs/models and apply the H100 SXM model rename. |

Comment on lines 166 to +170
   - matchFeatures:
       - feature: pci.device
         matchExpressions:
           vendor: { op: In, value: ["10de"] }
-          device: { op: In, value: ["2339"] }
+          device: { op: In, value: ["20b2"] }

Copilot AI Mar 25, 2026


By removing the previous accelerator-nvidia-h100 rule (which matched PCI device ID 2339) and not adding 2339 to any other rule, nodes reporting 10de:2339 will no longer get any moai.moreh.io/accelerator.* labels. If 2339 is still a valid H100/Hopper device ID in your target environments, consider mapping it to the correct model (e.g., add it to h100-sxm or create a dedicated rule) and document the source for the ID choice.

Comment on lines 29 to +31
| `moai.moreh.io/accelerator.vendor: nvidia`<br/>`moai.moreh.io/accelerator.model: h100-pcie` | NVIDIA H100 PCIe GPU |
| `moai.moreh.io/accelerator.vendor: nvidia`<br/>`moai.moreh.io/accelerator.model: h100-80gb-hbm3` | NVIDIA H100 80GB HBM3 GPU |
| `moai.moreh.io/accelerator.vendor: nvidia`<br/>`moai.moreh.io/accelerator.model: h100-sxm` | NVIDIA H100 SXM GPU |
| `moai.moreh.io/accelerator.vendor: nvidia`<br/>`moai.moreh.io/accelerator.model: h20-3e` | NVIDIA H20-3e GPU |

Copilot AI Mar 25, 2026


Since the h100 and h100-80gb-hbm3 labels are being replaced with h100-sxm, this page would benefit from an explicit migration note (e.g., a short sentence or admonition near the table) so existing users know which label value to update in nodeSelector/affinity rules and other configs.
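As a sketch of what such a migration note could show, the Pod fragment below schedules onto H100 SXM nodes using the labels from the docs table. The Pod name, container name, and image are hypothetical placeholders, and keeping the old values in the list is an assumption meant only for a transition window in which some clusters may still run a chart version that emits h100-80gb-hbm3 or h100.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: h100-workload-example              # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          # Expressions within one term are ANDed; values within In are ORed.
          - matchExpressions:
              - key: moai.moreh.io/accelerator.vendor
                operator: In
                values: ["nvidia"]
              - key: moai.moreh.io/accelerator.model
                operator: In
                # h100-sxm is the new value; drop the old values once every
                # cluster runs a chart version that includes this rename.
                values: ["h100-sxm", "h100-80gb-hbm3", "h100"]
  containers:
    - name: app                             # hypothetical container
      image: registry.example.invalid/app:latest  # hypothetical image
```

After the transition, the same constraint reduces to a plain nodeSelector with `moai.moreh.io/accelerator.model: h100-sxm`.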


Labels

None yet

Projects

None yet


2 participants