
feat(client): RENDERERS_MAX_PROMPT_LEN env override for pre-flight cap #62

Closed
snimu wants to merge 1 commit into main from sebastian/preflight-max-prompt-len-env-override


@snimu (Contributor) commented May 25, 2026

Adds an env-var escape hatch for the pre-flight overflow check in `_resolve_max_prompt_len`. When `RENDERERS_MAX_PROMPT_LEN` is set to a positive integer, that value is returned directly and `/v1/models` is not queried.

Motivation: routers/gateways whose `/v1/models` handler is broken (observed with vllm-router v0.1.22 under `--intra-node-data-parallel-size` > 1) silently disable the pre-flight via the cached-`None` path, which lets overlong prompts reach the engine and crash the orchestrator with a raw `ValueError`. Operators who know the real cap can now set the env var to restore pre-flight without touching the broken endpoint.

Invalid values (non-integer, <= 0) are logged and ignored, falling back to the existing auto-discovery path.


Note

Low Risk
Optional env override on client-side pre-flight only; invalid values are ignored with fallback to existing discovery.

Overview
Adds RENDERERS_MAX_PROMPT_LEN so operators can pin the client pre-flight context cap without calling GET /v1/models. When set to a positive integer, _resolve_max_prompt_len returns that value immediately and skips engine model-card discovery, which is useful when /v1/models is broken but the real max_model_len is known.

Invalid values (non-integer or ≤ 0) are logged and ignored; behavior falls back to the existing auto-discovery and cache path. Tests cover override winning over the model card, skipping /v1/models, and invalid env falling back to discovery.

Reviewed by Cursor Bugbot for commit 5f653df. Bugbot is set up for automated code reviews on this repo.

Note

Add RENDERERS_MAX_PROMPT_LEN env var override for pre-flight prompt length cap

  • Adds _max_prompt_len_from_env in renderers/client.py that reads RENDERERS_MAX_PROMPT_LEN and parses it as a positive integer, logging a warning and returning None for invalid values.
  • Updates _resolve_max_prompt_len to return the env var value immediately when set, skipping the /v1/models HTTP request and the in-memory cache lookup.
  • When the env var is unset or invalid, behavior is unchanged: check the cache, then query /v1/models.

Macroscope summarized 5f653df.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread: renderers/client.py

```python
    """
    override = _max_prompt_len_from_env()
    if override is not None:
        return override
```

Invalid env var logs warning on every call

Medium Severity

_max_prompt_len_from_env() is called on every invocation of _resolve_max_prompt_len, which runs on every generate() call. When the env var is set to an invalid value (e.g. a typo like "4096O"), the warning is logged on every single call, potentially thousands of times per second, even though the auto-discovery fallback result is properly cached. The auto-discovery path deliberately caches failures to avoid "retry on every call," but the env var parse has no such caching, creating unbounded log spam for a simple operator typo.

Additional Locations (1)

@macroscopeapp Bot commented May 25, 2026

Approvability

Verdict: Approved

Small, self-contained feature adding an optional env var override for max prompt length. Existing behavior unchanged when env var is unset. The unresolved comment about log spam on invalid env values is a minor polish issue, not a blocking concern.


@snimu closed this on May 26, 2026

Labels

None yet

Projects

None yet


1 participant