Description
I’m using ART with a local Ollama server as the inference backend (for both the agent model and the judge models). I’ve configured my Ollama model with a context window well above 8192 tokens (e.g. `num_ctx: 16384`) and adjusted `num_predict` accordingly.
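For reference, the context window was raised in the model's Modelfile (parameter names per Ollama's Modelfile format; the exact values here are illustrative):

```
FROM qwen2.5
# Raise the context window beyond the 8192 default
PARAMETER num_ctx 16384
# Allow longer completions to match
PARAMETER num_predict 2048
```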
However, in many runs I still get errors like:

```
token count exceeded 8192
```
This happens even though the model's context window is explicitly configured above 8192 and `num_predict` is adjusted to match. This suggests there is a hardcoded or implicit maximum of 8192 tokens somewhere in ART/RULER, or in how token counts are computed, independent of the model's actual context window.
What I expect
- ART should respect the context window of the underlying model, or the configured `num_ctx`, when running through Ollama.
- If a hard limit exists (e.g. 8192), it should be either:
  - documented and configurable; or
  - derived from the model's metadata, not hard-coded.
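As a sketch of the "derived from metadata" option: Ollama's `/api/show` response includes a `model_info` map with an architecture-prefixed `context_length` key (e.g. `qwen2.context_length`), so the limit could be looked up there instead of assumed. A minimal illustration (the function name and fallback behavior are my own, not ART's API):

```python
def effective_context_limit(model_info: dict, fallback: int = 8192) -> int:
    """Pick the token limit from backend metadata instead of hard-coding it.

    `model_info` is assumed to look like the "model_info" map returned by
    Ollama's /api/show, which carries an architecture-prefixed key such as
    "qwen2.context_length".
    """
    for key, value in model_info.items():
        if key.endswith(".context_length"):
            return int(value)
    # Documented default, used only when the backend reports nothing.
    return fallback


# Metadata present vs. absent:
print(effective_context_limit({"qwen2.context_length": 32768}))  # 32768
print(effective_context_limit({}))  # 8192
```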
What actually happens
Runs fail with `token count exceeded 8192`, even though the model is configured with a larger context window.
Environment
- Backend: Local Ollama server
- Model: Qwen / other Ollama-hosted model (with `num_ctx` > 8192)
- ART: latest version (as of date of issue)
- Using ART’s LangGraph integration (`init_chat_model`) and RULER scoring
Questions / Requests
- Is there an internal default limit of 8192 tokens that’s applied regardless of the model’s context?
- Can you expose this limit via configuration, or derive it from the model / backend rather than hardcoding?
- Any guidance on how to set up ART/RULER so that it fully respects Ollama’s larger `num_ctx`?