## Description
In my setup, I want to use non-OpenAI chat models (e.g. local Ollama models, or NVIDIA endpoints via LangChain) as the inference/judge models for ART training.
However, the current ART stack makes this very difficult because:
- RULER scoring (`ruler_score_group` and related helpers) relies on LiteLLM in a way that expects OpenAI-style models.
- `init_chat_model` also wraps everything in a `ChatOpenAI` instance (see separate issue).
- This means I cannot simply pass `ChatOllama` or `ChatNVIDIA` (LangChain chat models) as the inference/judge model for training.
## What I’d like
- A more provider-agnostic design for:
  - RULER scoring
  - training
  - `init_chat_model`
- The ability to cleanly use:
  - `ChatOllama` (LangChain)
  - `ChatNVIDIA`
  - or other LangChain `BaseChatModel` implementations
- Without having to hack around LiteLLM / OpenAI assumptions.
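To illustrate the kind of interface I mean (every name below is hypothetical and not part of ART's current API), a provider-agnostic judge path would only need to assume LangChain's `BaseChatModel` duck type — an `invoke(messages)` method returning a message with a `.content` attribute — rather than a concrete OpenAI client:

```python
from dataclasses import dataclass
from typing import Protocol


class ChatModelLike(Protocol):
    """Hypothetical: the minimal surface a judge needs. ChatOllama,
    ChatNVIDIA, and other BaseChatModel subclasses all satisfy this."""

    def invoke(self, messages: list[dict]) -> object: ...


@dataclass
class FakeReply:
    content: str


class StubJudge:
    """Stand-in for ChatOllama / ChatNVIDIA so this sketch runs offline."""

    def invoke(self, messages: list[dict]) -> FakeReply:
        return FakeReply(content="0.7")


def score_with_chat_model(model: ChatModelLike, prompt: str) -> float:
    """Hypothetical scoring helper: depends only on invoke(), so any
    LangChain-compatible chat model can be dropped in as the judge."""
    reply = model.invoke([{"role": "user", "content": prompt}])
    return float(reply.content)


print(score_with_chat_model(StubJudge(), "Rate this trajectory from 0 to 1."))
```

The point is that nothing in this path touches LiteLLM or OpenAI-specific request shapes; swapping the stub for a real `ChatOllama` instance would be a one-line change in user code.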
## Why this matters
- ART is otherwise a great framework for agent RL.
- Many users want to move to:
  - local models (Ollama)
  - different clouds (NVIDIA, etc.)
- Tight coupling to OpenAI via LiteLLM in the RULER path makes this significantly harder.
## Request
Please consider:
- Abstracting RULER to accept any LangChain-compatible `BaseChatModel` for structured scoring.
- Or providing a documented way to plug in non-OpenAI judge models (e.g. a `judge_fn` hook that uses arbitrary models).
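For the second option, here is a minimal sketch of what a pluggable `judge_fn` hook could look like. The hook name, signature, and the `ruler_score_group`-style entry point are all assumptions for illustration, not ART's existing API:

```python
from typing import Callable

# Hypothetical contract: a judge_fn receives the rendered trajectories for
# one group and returns one relative score per trajectory.
JudgeFn = Callable[[list[str]], list[float]]


def ruler_score_group_with_hook(
    trajectories: list[str], judge_fn: JudgeFn
) -> list[float]:
    """Sketch of a RULER-style scorer that delegates judging to judge_fn
    instead of calling LiteLLM/OpenAI directly (hypothetical)."""
    scores = judge_fn(trajectories)
    if len(scores) != len(trajectories):
        raise ValueError("judge_fn must return one score per trajectory")
    return scores


def length_judge(trajectories: list[str]) -> list[float]:
    # Toy judge for demonstration only; a real judge_fn would wrap
    # e.g. a ChatOllama or ChatNVIDIA call and parse its structured output.
    longest = max(len(t) for t in trajectories)
    return [len(t) / longest for t in trajectories]


print(ruler_score_group_with_hook(["short", "a longer answer"], length_judge))
```

With a hook like this, the LiteLLM/OpenAI path could remain the default `judge_fn`, while users supply their own implementation backed by any model they like.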