
fix: add NVIDIA NIM provider profile for input_type embedding field #268

Open

vicjayjay wants to merge 4 commits into CortexReach:master from vicjayjay:fix/nvidia-nim-provider-profile

Conversation


@vicjayjay vicjayjay commented Mar 19, 2026

Summary

Supersedes #219. Implements the fix properly using the provider profile architecture from #216.

  • Adds "nvidia" to EmbeddingProviderProfile type
  • Detects NVIDIA NIM via *.nvidia.com base URLs, nvidia/* model prefixes, and nv-embed* model names
  • Sets taskField: "input_type" with value mapping (retrieval.query → query, retrieval.passage → passage)
  • Supports encoding_format: "float" and forwards dimensions for NVIDIA dynamic models
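
The profile mechanics summarized above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the names detectProfile, buildPayload, and the ProfileSpec shape are hypothetical approximations of src/embedder.ts, and only the NVIDIA detection regexes are taken directly from the PR.

```typescript
// Hypothetical sketch of the provider-profile approach; names approximate
// src/embedder.ts rather than quoting it.
type EmbeddingProviderProfile = "nvidia" | "jina-compatible" | "generic";

interface ProfileSpec {
  taskField: string | null;            // request field that carries the task, if any
  taskValues: Record<string, string>;  // internal task name -> provider value
  dimensionsField: string | null;      // field used to forward configured dimensions
}

const PROFILES: Record<EmbeddingProviderProfile, ProfileSpec> = {
  nvidia: {
    taskField: "input_type",
    taskValues: { "retrieval.query": "query", "retrieval.passage": "passage" },
    dimensionsField: "dimensions",
  },
  "jina-compatible": {
    taskField: "task",
    taskValues: {
      "retrieval.query": "retrieval.query",
      "retrieval.passage": "retrieval.passage",
    },
    dimensionsField: "dimensions",
  },
  generic: { taskField: null, taskValues: {}, dimensionsField: null },
};

// Detection per the PR: *.nvidia.com base URLs, nvidia/* prefixes, nv-embed* names.
function detectProfile(base: string, model: string): EmbeddingProviderProfile {
  if (/\.nvidia\.com/i.test(base) || /^nvidia\//i.test(model) || /^nv-embed/i.test(model)) {
    return "nvidia";
  }
  return "generic";
}

function buildPayload(
  profile: EmbeddingProviderProfile,
  task: string,
  dims?: number,
): Record<string, unknown> {
  const spec = PROFILES[profile];
  const payload: Record<string, unknown> = {};
  if (spec.taskField && spec.taskValues[task]) {
    payload[spec.taskField] = spec.taskValues[task];
  }
  if (spec.dimensionsField && dims != null) {
    payload[spec.dimensionsField] = dims;
  }
  return payload;
}

const profile = detectProfile("https://integrate.api.nvidia.com/v1", "nvidia/nv-embedqa-e5-v5");
console.log(JSON.stringify(buildPayload(profile, "retrieval.query", 384)));
// → {"input_type":"query","dimensions":384}
```

A generic OpenAI-compatible endpoint takes the fall-through branch and sends neither task nor input_type, which is what the last two tests in the list below/above verify.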

Changes

  • src/embedder.ts: Added the NVIDIA provider profile, kept dimensions forwarding for dynamic models, and simplified the NVIDIA baseURL detection regex to /\.nvidia\.com/i
  • test/nvidia-nim-provider-profile.test.mjs: 6 automated tests covering all maintainer requirements:
    • NVIDIA requests send input_type (not task)
    • retrieval.query/retrieval.passage are mapped correctly
    • NVIDIA is detected from nvidia/* model prefixes
    • NVIDIA is detected from an .nvidia.com baseURL
    • Non-NVIDIA providers (Jina) still send task
    • Generic providers send neither task nor input_type

Test plan

  • All 6 NVIDIA profile tests pass (node --test test/nvidia-nim-provider-profile.test.mjs)
  • Existing embedder tests pass (node test/embedder-error-hints.test.mjs)
  • CI pipeline

🤖 Generated with Claude Code

NVIDIA NIM rejects the `task` field and requires `input_type` instead.
This adds a proper "nvidia" provider profile following the architecture
introduced in CortexReach#216, rather than hardcoding URL checks in buildPayload.

Detection: matches *.nvidia.com base URLs, nvidia/* model prefixes,
and nv-embed* model names.

Capabilities: sends input_type instead of task, maps retrieval.query →
query and retrieval.passage → passage, supports encoding_format: float.

Includes 5 automated tests covering:
- NVIDIA sends input_type (not task)
- retrieval.passage → passage value mapping
- nvidia/ model prefix detection
- Jina still sends task field
- Generic providers send neither

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ad8c5ddd19

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

src/embedder.ts Outdated
"query": "query",
"passage": "passage",
},
dimensionsField: null,


P1 Badge Keep dimensions forwarding for dynamic NVIDIA models

For the new nvidia profile, setting dimensionsField to null means buildPayload() never forwards embedding.dimensions (src/embedder.ts:646-647), even though the constructor still treats that config as the expected vector size (src/embedder.ts:470). NVIDIA's OpenAI-compatible docs explicitly use a dimensions parameter for dynamic models such as nvidia/llama-3.2-nv-embedqa-1b-v2; with a config like dimensions: 384, this change now requests the model's default 2048-d output and immediately fails with Embedding dimension mismatch, or creates the wrong-sized table.


NVIDIA NIM's OpenAI-compatible API supports a `dimensions` parameter
for dynamic models like nvidia/llama-3.2-nv-embedqa-1b-v2. Setting
dimensionsField to null prevented buildPayload() from forwarding the
configured dimensions, causing dimension mismatch errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@rwmjhb rwmjhb left a comment


Nice work — the provider profile approach following #216 is clean and extensible. Two small things:

Suggestion: update PR description

The summary still says "disables dimensionsField", but the second commit intentionally changed it to "dimensions" for dynamic models. Worth syncing the description so it doesn't confuse future readers.

Nits:

  • Detection regex /\.nvidia\.com|integrate\.api\.nvidia\.com/i — the second alternative is already matched by the first, could simplify to /\.nvidia\.com/i
  • Consider adding a test that detects NVIDIA via a .nvidia.com baseURL (currently all tests use model name patterns)
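
The first nit can be verified directly: in a JavaScript regex, the unanchored pattern /\.nvidia\.com/i already matches anywhere in the string, so the integrate.api.nvidia.com alternative is redundant.

```typescript
// The simplified pattern from the nit above; the host it must still match is
// taken from the original alternation.
const simplified = /\.nvidia\.com/i;

console.log(simplified.test("https://integrate.api.nvidia.com/v1")); // → true
console.log(simplified.test("https://api.openai.com/v1"));           // → false
```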

Added a test case that detects NVIDIA from a .nvidia.com baseURL and updated the test imports to include formatEmbeddingProviderError.
Simplifies the NVIDIA provider baseURL detection pattern now that .nvidia.com already covers integrate.api.nvidia.com.
Author

@rwmjhb synced this up.

I updated the PR description so it now reflects that NVIDIA dynamic models still forward dimensions, simplified the detection regex to /\.nvidia\.com/i, and added a .nvidia.com baseURL detection test.

The follow-up commits are:

  • 7871fa0 Add NVIDIA detection test and update imports
  • 1507994 Refactor NVIDIA compatibility check regex

I also re-checked the earlier dimensions regression and didn't find another leftover behavior bug beyond the description drift and the two nits you called out.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 15079944d9


return "voyage-compatible";
}

if (/\.nvidia\.com/i.test(base) || /^nvidia\//i.test(model) || /^nv-embed/i.test(model)) {


P2 Badge Restrict the NVIDIA profile to models that accept input_type

This detection is broader than the capability it enables. Any *.nvidia.com endpoint or nvidia/*/nv-embed* model now gets the nvidia profile, and buildPayload() will therefore add input_type whenever embedding.taskQuery or taskPassage is set. That also sweeps in NVIDIA-hosted embeddings such as BAAI/bge-m3, snowflake/arctic-embed-l, nvidia/nv-embed-v1, and nvidia/nv-embedcode-7b-v1, whose model docs describe plain text (or task-specific instructions) rather than the retriever-style query/passage contract. Those configs previously behaved as generic OpenAI-compatible embeddings, so this heuristic can turn valid requests into 400s or wrong embeddings for non-retriever models.
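
One way to address the over-broad heuristic would be to gate input_type behind the models known to use the query/passage retriever contract, while everything else on *.nvidia.com stays generic. This is a hypothetical allow-list sketch of that idea, not something the PR implements; the model names come from the Codex comment above.

```typescript
// Hypothetical narrowing of the NVIDIA detection per the P2 note.
// Only retriever-style embedding models get the input_type capability.
const NVIDIA_RETRIEVER_MODELS: RegExp[] = [
  /^nvidia\/nv-embedqa/i,             // e.g. nvidia/nv-embedqa-e5-v5
  /^nvidia\/llama-3\.2-nv-embedqa/i,  // e.g. nvidia/llama-3.2-nv-embedqa-1b-v2
];

function supportsInputType(model: string): boolean {
  return NVIDIA_RETRIEVER_MODELS.some((re) => re.test(model));
}

console.log(supportsInputType("nvidia/nv-embedqa-e5-v5"));           // → true
console.log(supportsInputType("nvidia/llama-3.2-nv-embedqa-1b-v2")); // → true
console.log(supportsInputType("BAAI/bge-m3"));                       // → false
console.log(supportsInputType("nvidia/nv-embedcode-7b-v1"));         // → false
```

The trade-off is that an allow-list needs maintenance as NVIDIA adds retriever models, whereas the broad heuristic degrades by sending an unsupported field; which failure mode is preferable is a maintainer call.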


Author

Follow-up commits are pushed and the PR is ready for re-check. There are currently 3 workflow runs awaiting maintainer approval on this PR before they can execute: CI, Claude Code Review, and Claude Code.
