NVIDIA Models Plugin for Claude Code

A Claude Code plugin that makes 12 NVIDIA NIM-hosted LLMs (Llama 3.x, Kimi K2, DeepSeek V3/R1, Nemotron, Mixtral, Mistral Large, Qwen 2.5, Qwen Coder) available to Claude — and nudges Claude to use them proactively when the user wants a second opinion, an ensemble comparison, or a model better suited to a specific task.

What this packages

Component	What it does
MCP server (`server/server.py`)	Wraps NVIDIA NIM's OpenAI-compatible endpoint as three MCP tools: `list_models`, `nvidia_chat`, `nvidia_compare`.
Skill (`skills/nvidia-delegate/`)	Description-driven trigger that makes Claude consider routing work to NVIDIA models when delegation would help — without the user having to ask.
Slash commands (`commands/`)	Explicit one-key access: `/nvidia-compare`, `/nvidia-list`, `/nvidia-chat`.

The MCP server is the same hardened code shipped at nvidia-models-mcp v0.1.1 — bounded inputs, sanitized errors, 60 s timeout, no live network in tests.

Install

Prerequisites

Claude Code (the CLI / desktop client that supports plugins)
uv on PATH
Python 3.10+
An NVIDIA API key from build.nvidia.com (starts with nvapi-)

1. Export your API key

Windows (PowerShell, persistent):

setx NVIDIA_API_KEY "nvapi-your-key-here"

macOS / Linux (shell profile):

export NVIDIA_API_KEY="nvapi-your-key-here"

Restart Claude Code so it picks up the env var.

2. Add the plugin

Claude Code installs plugins from marketplaces in two steps. Run these inside Claude Code:

/plugin marketplace add NAJEMWEHBE/nvidia-models-plugin
/plugin install nvidia-models@nvidia-models-plugin

The first line registers this repo as a single-plugin marketplace (cloned to your local cache). The second installs the nvidia-models plugin from it.

After install: MCP server is wired automatically, the skill loads at session start, and the three slash commands (/nvidia-compare, /nvidia-list, /nvidia-chat) become available. No restart needed — Claude Code hot-reloads plugins.

How the auto-routing works

Claude reads every skill's description field at session start. The nvidia-delegate skill description lists trigger scenarios ("second opinion", "ensemble", "what would model X say", explicit NVIDIA model names, etc.). When the user's request matches, Claude opens the skill, reads the routing table inside, and calls the appropriate MCP tool. You never have to say "use the MCP" — just describe what you want.

Examples that should auto-trigger:

"Get a second opinion from another LLM on this refactor." "Ask three models to draft a one-paragraph summary and compare them." "Use qwen-coder to write the test, then have deepseek-r1 critique it." "What would Kimi K2 say about this?"

If you want explicit control, use the slash commands instead.

Slash commands

Command	Purpose
`/nvidia-compare <prompt>`	Fan the prompt to 3 default models; synthesize agreements + divergences.
`/nvidia-list`	Show the 12 short keys → full NIM paths.
`/nvidia-chat <short-key> <prompt>`	Single-model call.

Models exposed

Catalog verified live on NIM as of 2026-05-12.

Short key	NIM path	Notes
`llama-4-maverick`	`meta/llama-4-maverick-17b-128e-instruct`	Meta flagship; 128-expert MoE
`llama-3.3-70b`	`meta/llama-3.3-70b-instruct`	Generalist
`llama-3.1-70b`	`meta/llama-3.1-70b-instruct`	Cheap generalist
`deepseek-v4-pro`	`deepseek-ai/deepseek-v4-pro`	Premium reasoning MoE
`deepseek-v4-flash`	`deepseek-ai/deepseek-v4-flash`	Fast reasoning MoE
`kimi-k2.6`	`moonshotai/kimi-k2.6`	Moonshot Kimi
`qwen3-next-80b`	`qwen/qwen3-next-80b-a3b-instruct`	Qwen MoE
`nemotron-super-49b`	`nvidia/llama-3.3-nemotron-super-49b-v1`	NVIDIA Nemotron flagship
`nemotron-mini-4b`	`nvidia/nemotron-mini-4b-instruct`	Tiny NVIDIA
`mixtral-8x22b`	`mistralai/mixtral-8x22b-instruct-v0.1`	Large Mistral MoE
`mixtral-8x7b`	`mistralai/mixtral-8x7b-instruct-v0.1`	Small Mistral MoE

Security

NVIDIA_API_KEY is read from the environment at MCP server startup and never persisted or logged.
All tool arguments are bounded (prompt ≤ 32 KB, system ≤ 8 KB, max_tokens ≤ 8192, temperature ∈ [0, 2]).
Upstream error messages are sanitized — nvapi-…, sk-…, and Authorization/x-api-key substrings are replaced with [REDACTED] before reaching Claude.
The MCP command is uv (PATH lookup). Verify your uv binary (uv --version) is the official astral build — a PATH-shadowed uv could exfiltrate the API key.
Full threat model in the MCP repo's SECURITY.md.

Never paste your API key into a chat session. If you do, rotate it at build.nvidia.com.

Cost

NVIDIA NIM has a free monthly credit. After that, per-token pricing varies by model — check build.nvidia.com for current quotas.

Relationship to `nvidia-models-mcp`

nvidia-models-mcp — the bare MCP server. Use directly if you're on Claude Desktop (drag-drop DXT), Codex CLI, or any non-Claude-Code MCP host.
nvidia-models-plugin (this repo) — the Claude Code experience: bundles the same server plus skill + slash commands. The skill is what makes Claude proactively reach for these models.

Bug reports, model additions, and contributions welcome on either repo.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.claude-plugin		.claude-plugin
commands		commands
server		server
skills/nvidia-delegate		skills/nvidia-delegate
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NVIDIA Models Plugin for Claude Code

What this packages

Install

Prerequisites

1. Export your API key

2. Add the plugin

How the auto-routing works

Slash commands

Models exposed

Security

Cost

Relationship to `nvidia-models-mcp`

License

About

Uh oh!

Releases 4

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NVIDIA Models Plugin for Claude Code

What this packages

Install

Prerequisites

1. Export your API key

2. Add the plugin

How the auto-routing works

Slash commands

Models exposed

Security

Cost

Relationship to nvidia-models-mcp

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Relationship to `nvidia-models-mcp`

Packages