feat(deploy): add new NVIDIA accelerators and update NFD rules#97
…o h100-sxm

Add NFD rules and docs for B300, B200, H200 SXM, H20-3e, and A100 80GB SXM accelerators based on moreh-vllm-specification Appendix B. Rename h100-80gb-hbm3 to h100-sxm for consistency, and remove the redundant generic h100 entry in favor of h100-sxm.

PCI device IDs:
- B300: 3182 (GB110 [B300 SXM6 AC])
- B200: 2901 (GB100 [B200])
- H200 SXM: 2335 (GH100 [H200 SXM 141GB])
- H20-3e: 232c (GH100 [H20 HBM3e])
- A100 80GB SXM: 20b2 (GA100 [A100 SXM4 80GB])

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
This PR updates the MoAI accelerator labeling surface (NFD rules + docs) to include additional NVIDIA GPUs and align NVIDIA H100 model naming with the external spec referenced in the PR description.
Changes:
- Add NFD NodeFeatureRule entries for NVIDIA B300, B200, H200 SXM, H20-3e, and A100 80GB SXM.
- Rename the NVIDIA H100 SXM label from `h100-80gb-hbm3`/`h100` to `h100-sxm` and remove the generic `h100` entry from the docs list.
- Update the supported devices documentation table to reflect the new/renamed accelerator labels.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `website/docs/reference/supported-devices.mdx` | Updates the public “supported devices” label table to include new NVIDIA accelerators and the `h100-sxm` rename. |
| `deploy/helm/moai-inference-framework/templates/nfd/moai-accelerator.yaml` | Updates NFD rules to add new NVIDIA device IDs/models and apply the H100 SXM model rename. |
    - matchFeatures:
        - feature: pci.device
          matchExpressions:
            vendor: { op: In, value: ["10de"] }
    -       device: { op: In, value: ["2339"] }
    +       device: { op: In, value: ["20b2"] }
By removing the previous accelerator-nvidia-h100 rule (which matched PCI device ID 2339) and not adding 2339 to any other rule, nodes reporting 10de:2339 will no longer get any moai.moreh.io/accelerator.* labels. If 2339 is still a valid H100/Hopper device ID in your target environments, consider mapping it to the correct model (e.g., add it to h100-sxm or create a dedicated rule) and document the source for the ID choice.
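A minimal sketch of the "dedicated rule" option mentioned above, so that nodes reporting `10de:2339` keep receiving labels. The rule name `accelerator-nvidia-h100-2339` is hypothetical, and mapping `2339` to the `h100-sxm` model is an assumption that should be checked against the spec before use; only the label keys, the vendor ID, and the device ID come from this PR.

```yaml
# Fragment intended to sit under spec.rules of the NodeFeatureRule.
# Hypothetical rule name; the 2339 -> h100-sxm mapping is an unverified assumption.
- name: accelerator-nvidia-h100-2339
  labels:
    moai.moreh.io/accelerator.vendor: nvidia
    moai.moreh.io/accelerator.model: h100-sxm
  matchFeatures:
    - feature: pci.device
      matchExpressions:
        vendor: { op: In, value: ["10de"] }
        device: { op: In, value: ["2339"] }
```

Keeping `2339` in its own rule, rather than appending it to an existing device list, makes the source of that ID easy to document and easy to remove later if it turns out not to be needed.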
      | `moai.moreh.io/accelerator.vendor: nvidia`<br/>`moai.moreh.io/accelerator.model: h100-pcie` | NVIDIA H100 PCIe GPU |
    - | `moai.moreh.io/accelerator.vendor: nvidia`<br/>`moai.moreh.io/accelerator.model: h100-80gb-hbm3` | NVIDIA H100 80GB HBM3 GPU |
    + | `moai.moreh.io/accelerator.vendor: nvidia`<br/>`moai.moreh.io/accelerator.model: h100-sxm` | NVIDIA H100 SXM GPU |
    + | `moai.moreh.io/accelerator.vendor: nvidia`<br/>`moai.moreh.io/accelerator.model: h20-3e` | NVIDIA H20-3e GPU |
Since h100 / h100-80gb-hbm3 labels are being replaced with h100-sxm, this page would benefit from an explicit migration note (e.g., a short sentence/admonition near the table) so existing users know which label value to update in nodeSelector/affinity and other configs.
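As an illustration of what such a migration note could show, here is a sketch of a workload pinned to H100 SXM nodes after the rename. The Pod name, container name, and image are placeholders and not part of this PR; only the label keys and the `h100-sxm` value are taken from it.

```yaml
# Hypothetical Pod manifest; only the moai.moreh.io labels come from this PR.
apiVersion: v1
kind: Pod
metadata:
  name: example-inference            # placeholder name
spec:
  nodeSelector:
    moai.moreh.io/accelerator.vendor: nvidia
    # Previously: h100 or h100-80gb-hbm3
    moai.moreh.io/accelerator.model: h100-sxm
  containers:
    - name: app                      # placeholder container
      image: example.invalid/app:latest
```

Workloads that still select the old values would presumably stay unschedulable once nodes are relabeled, which is why calling out the rename next to the table is useful.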
Summary
- Rename `h100-80gb-hbm3` to `h100-sxm` for consistency with the spec naming convention
- Remove the redundant `h100` entry (duplicate of `h100-sxm`)

Test plan
- `helm template` with NFD enabled

🤖 Generated with Claude Code