Skip to content

NAJEMWEHBE/nvidia-models-plugin

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NVIDIA Models Plugin for Claude Code

A Claude Code plugin that makes 12 NVIDIA NIM-hosted LLMs (Llama 3.x, Kimi K2, DeepSeek V3/R1, Nemotron, Mixtral, Mistral Large, Qwen 2.5, Qwen Coder) available to Claude — and nudges Claude to use them proactively when the user wants a second opinion, an ensemble comparison, or a model better suited to a specific task.

License: MIT Plugin

What this packages

Component What it does
MCP server (server/server.py) Wraps NVIDIA NIM's OpenAI-compatible endpoint as three MCP tools: list_models, nvidia_chat, nvidia_compare.
Skill (skills/nvidia-delegate/) Description-driven trigger that makes Claude consider routing work to NVIDIA models when delegation would help — without the user having to ask.
Slash commands (commands/) Explicit one-key access: /nvidia-compare, /nvidia-list, /nvidia-chat.

The MCP server is the same hardened code shipped at nvidia-models-mcp v0.1.1 — bounded inputs, sanitized errors, 60 s timeout, no live network in tests.

Install

Prerequisites

  • Claude Code (the CLI / desktop client that supports plugins)
  • uv on PATH
  • Python 3.10+
  • An NVIDIA API key from build.nvidia.com (starts with nvapi-)

1. Export your API key

Windows (PowerShell, persistent):

setx NVIDIA_API_KEY "nvapi-your-key-here"

macOS / Linux (shell profile):

export NVIDIA_API_KEY="nvapi-your-key-here"

Restart Claude Code so it picks up the env var.

2. Add the plugin

Claude Code installs plugins from marketplaces in two steps. Run these inside Claude Code:

/plugin marketplace add NAJEMWEHBE/nvidia-models-plugin
/plugin install nvidia-models@nvidia-models-plugin

The first line registers this repo as a single-plugin marketplace (cloned to your local cache). The second installs the nvidia-models plugin from it.

After install: MCP server is wired automatically, the skill loads at session start, and the three slash commands (/nvidia-compare, /nvidia-list, /nvidia-chat) become available. No restart needed — Claude Code hot-reloads plugins.

How the auto-routing works

Claude reads every skill's description field at session start. The nvidia-delegate skill description lists trigger scenarios ("second opinion", "ensemble", "what would model X say", explicit NVIDIA model names, etc.). When the user's request matches, Claude opens the skill, reads the routing table inside, and calls the appropriate MCP tool. You never have to say "use the MCP" — just describe what you want.

Examples that should auto-trigger:

"Get a second opinion from another LLM on this refactor." "Ask three models to draft a one-paragraph summary and compare them." "Use qwen-coder to write the test, then have deepseek-r1 critique it." "What would Kimi K2 say about this?"

If you want explicit control, use the slash commands instead.

Slash commands

Command Purpose
/nvidia-compare <prompt> Fan the prompt to 3 default models; synthesize agreements + divergences.
/nvidia-list Show the 12 short keys → full NIM paths.
/nvidia-chat <short-key> <prompt> Single-model call.

Models exposed

Catalog verified live on NIM as of 2026-05-12.

Short key NIM path Notes
llama-4-maverick meta/llama-4-maverick-17b-128e-instruct Meta flagship; 128-expert MoE
llama-3.3-70b meta/llama-3.3-70b-instruct Generalist
llama-3.1-70b meta/llama-3.1-70b-instruct Cheap generalist
deepseek-v4-pro deepseek-ai/deepseek-v4-pro Premium reasoning MoE
deepseek-v4-flash deepseek-ai/deepseek-v4-flash Fast reasoning MoE
kimi-k2.6 moonshotai/kimi-k2.6 Moonshot Kimi
qwen3-next-80b qwen/qwen3-next-80b-a3b-instruct Qwen MoE
nemotron-super-49b nvidia/llama-3.3-nemotron-super-49b-v1 NVIDIA Nemotron flagship
nemotron-mini-4b nvidia/nemotron-mini-4b-instruct Tiny NVIDIA
mixtral-8x22b mistralai/mixtral-8x22b-instruct-v0.1 Large Mistral MoE
mixtral-8x7b mistralai/mixtral-8x7b-instruct-v0.1 Small Mistral MoE

Security

  • NVIDIA_API_KEY is read from the environment at MCP server startup and never persisted or logged.
  • All tool arguments are bounded (prompt ≤ 32 KB, system ≤ 8 KB, max_tokens ≤ 8192, temperature ∈ [0, 2]).
  • Upstream error messages are sanitized — nvapi-…, sk-…, and Authorization/x-api-key substrings are replaced with [REDACTED] before reaching Claude.
  • The MCP command is uv (PATH lookup). Verify your uv binary (uv --version) is the official astral build — a PATH-shadowed uv could exfiltrate the API key.
  • Full threat model in the MCP repo's SECURITY.md.

Never paste your API key into a chat session. If you do, rotate it at build.nvidia.com.

Cost

NVIDIA NIM has a free monthly credit. After that, per-token pricing varies by model — check build.nvidia.com for current quotas.

Relationship to nvidia-models-mcp

  • nvidia-models-mcp — the bare MCP server. Use directly if you're on Claude Desktop (drag-drop DXT), Codex CLI, or any non-Claude-Code MCP host.
  • nvidia-models-plugin (this repo) — the Claude Code experience: bundles the same server plus skill + slash commands. The skill is what makes Claude proactively reach for these models.

Bug reports, model additions, and contributions welcome on either repo.

License

MIT

About

Claude Code plugin that bundles the nvidia-models MCP server, an auto-trigger skill, and /nvidia-* slash commands so Claude proactively considers routing to 12 NVIDIA NIM-hosted LLMs.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages