feat(client): RENDERERS_MAX_PROMPT_LEN env override for pre-flight cap #62
snimu wants to merge 1 commit into
Adds an env-var escape hatch for the pre-flight overflow check in `_resolve_max_prompt_len`. When `RENDERERS_MAX_PROMPT_LEN` is set to a positive integer, that value is returned directly and `/v1/models` is not queried.

Motivation: routers/gateways whose `/v1/models` handler is broken (observed with vllm-router v0.1.22 under `--intra-node-data-parallel-size` > 1) silently disable the pre-flight via the cached-`None` path, which lets overlong prompts reach the engine and crash the orchestrator with a raw `ValueError`. Operators who know the real cap can now set the env var to restore pre-flight without touching the broken endpoint.

Invalid values (non-integer, <= 0) are logged and ignored, falling back to the existing auto-discovery path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
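The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names without the leading underscore, the `environ` parameter, and the `discover` callable (standing in for the `/v1/models` lookup and its cache) are assumptions made so the example runs on its own.

```python
import logging
import os

logger = logging.getLogger(__name__)

ENV_VAR = "RENDERERS_MAX_PROMPT_LEN"


def max_prompt_len_from_env(environ=os.environ):
    """Return the override as a positive int, or None if unset or invalid."""
    raw = environ.get(ENV_VAR)
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        # Non-integer input such as "4096O" is logged and ignored.
        logger.warning("Ignoring %s=%r: not an integer", ENV_VAR, raw)
        return None
    if value <= 0:
        logger.warning("Ignoring %s=%r: must be positive", ENV_VAR, raw)
        return None
    return value


def resolve_max_prompt_len(discover, environ=os.environ):
    """Prefer the env override; otherwise fall back to discovery.

    `discover` is a hypothetical stand-in for the existing
    auto-discovery path (the /v1/models query plus its cache).
    """
    override = max_prompt_len_from_env(environ)
    if override is not None:
        return override  # discovery is never invoked on this path
    return discover()


if __name__ == "__main__":
    fake_discover = lambda: 8192  # pretend model card value
    print(resolve_max_prompt_len(fake_discover, {ENV_VAR: "4096"}))
    print(resolve_max_prompt_len(fake_discover, {ENV_VAR: "-1"}))
```

With a valid override the stand-in discovery function is not called at all, which is the property that matters when the `/v1/models` handler is broken.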
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 5f653df.
    """
    override = _max_prompt_len_from_env()
    if override is not None:
        return override
Invalid env var logs warning on every call
Medium Severity
_max_prompt_len_from_env() is called on every invocation of _resolve_max_prompt_len, which runs on every generate() call. When the env var is set to an invalid value (e.g. a typo like "4096O"), the warning is logged on every single call — potentially thousands of times per second — even though the auto-discovery fallback result is properly cached. The auto-discovery path deliberately caches failures to avoid "retry on every call," but the env var parse has no such caching, creating unbounded log spam for a simple operator typo.
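One way to address this is to remember the last raw value that was parsed, so the warning fires at most once per distinct setting. The sketch below is only an illustration under assumptions: the function name, the module-level cache variables, and the `environ` parameter are hypothetical and not taken from the PR, and the cache is not guarded by a lock.

```python
import logging
import os

logger = logging.getLogger(__name__)

ENV_VAR = "RENDERERS_MAX_PROMPT_LEN"

_UNSET = object()     # sentinel: nothing has been parsed yet
_last_raw = _UNSET    # raw env string seen on the previous call
_last_value = None    # parsed result that belongs to _last_raw


def max_prompt_len_from_env_cached(environ=os.environ):
    """Parse the override, warning at most once per distinct raw value."""
    global _last_raw, _last_value
    raw = environ.get(ENV_VAR)
    if raw == _last_raw:
        # Same setting as last time: reuse the result, log nothing.
        return _last_value

    value = None
    if raw is not None:
        try:
            parsed = int(raw)
        except ValueError:
            parsed = None
        if parsed is not None and parsed > 0:
            value = parsed
        else:
            logger.warning("Ignoring invalid %s=%r", ENV_VAR, raw)

    _last_raw, _last_value = raw, value
    return value
```

Keying the cache on the raw string rather than caching forever means a corrected value is picked up without a restart, and tests that patch the environment keep working. Caching the result permanently after the first call would be simpler if the variable is only ever expected to be set at process start.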
Approvability
Verdict: Approved
Small, self-contained feature adding an optional env var override for max prompt length. Existing behavior unchanged when env var is unset. The unresolved comment about log spam on invalid env values is a minor polish issue, not a blocking concern.


Note
Low Risk
Optional env override on client-side pre-flight only; invalid values are ignored with fallback to existing discovery.
Overview
Adds `RENDERERS_MAX_PROMPT_LEN` so operators can pin the client pre-flight context cap without calling `GET /v1/models`. When set to a positive integer, `_resolve_max_prompt_len` returns that value immediately and skips engine model-card discovery, which is useful when `/v1/models` is broken but the real `max_model_len` is known.

Invalid values (non-integer or ≤ 0) are logged and ignored; behavior falls back to the existing auto-discovery and cache path. Tests cover override winning over the model card, skipping `/v1/models`, and invalid env falling back to discovery.
Note
Add `RENDERERS_MAX_PROMPT_LEN` env var override for pre-flight prompt length cap
- Adds `_max_prompt_len_from_env` in `renderers/client.py` that reads `RENDERERS_MAX_PROMPT_LEN` and parses it as a positive integer, logging a warning and returning `None` for invalid values.
- Updates `_resolve_max_prompt_len` to return the env var value immediately when set, skipping the `/v1/models` HTTP request and the in-memory cache lookup.
- […] `/v1/models`.

Macroscope summarized 5f653df.