fix(trtllm): propagate max_seq_len from extra-engine-args to MDC registration#7653
fix(trtllm): propagate max_seq_len from extra-engine-args to MDC registration#7653AsadShahid04 wants to merge 4 commits intoai-dynamo:mainfrom
Conversation
…stration When --extra-engine-args overrides max_seq_len (and other engine config fields), propagate these values back to config so they are correctly reflected in the ModelDeploymentCard (MDC) registration. This ensures: - context_length in MDC matches the engine's actual max_seq_len limit - Prompt length validation correctly rejects over-length requests (HTTP 400) - Instead of allowing them through and crashing in the engine (HTTP 500) Also propagates max_batch_size and max_num_tokens for consistency. Fixes ai-dynamo#6987 Signed-off-by: Asad Shahid <asad.shahid@berkeley.edu>
|
👋 Hi AsadShahid04! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
WalkthroughThe code now parses override values from Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/src/dynamo/trtllm/workers/llm_worker.py (1)
221-230:⚠️ Potential issue | 🟠 MajorConsider propagating values from
override_engine_argsas well.The PR correctly propagates
max_seq_len,max_batch_size, andmax_num_tokensfromextra_engine_argsback to config. However,override_engine_args(applied at lines 222-230) can also modify these same values viadeep_update, but no propagation occurs afterward.If a user sets
max_seq_lenvia--override-engine-argsinstead of--extra-engine-args, the MDC would still register the wrong context length, causing the same 500-instead-of-400 bug described in issue#6987.Proposed fix to also propagate after override_engine_args
deep_update(arg_map, overrides) + + # Propagate any overridden values back to config for MDC registration + if "max_seq_len" in overrides: + config.max_seq_len = arg_map["max_seq_len"] + logging.info( + f"Updated config.max_seq_len from override-engine-args: {config.max_seq_len}" + ) + if "max_batch_size" in overrides: + config.max_batch_size = arg_map["max_batch_size"] + logging.info( + f"Updated config.max_batch_size from override-engine-args: {config.max_batch_size}" + ) + if "max_num_tokens" in overrides: + config.max_num_tokens = arg_map["max_num_tokens"] + logging.info( + f"Updated config.max_num_tokens from override-engine-args: {config.max_num_tokens}" + ) except json.JSONDecodeError as e:Based on learnings: the
--override-engine-argsflag usesdeep_updateto merge overrides into the engine arg map, so users may reasonably expect to set these limits via either flag.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/trtllm/workers/llm_worker.py` around lines 221 - 230, The override_engine_args block applies deep_update to arg_map but doesn't propagate updated limits back to the config, so values set via --override-engine-args (e.g., max_seq_len, max_batch_size, max_num_tokens) are ignored; after successfully loading and applying overrides in the override_engine_args branch (where config.override_engine_args, json.loads, and deep_update(arg_map, overrides) are used), reuse the same propagation logic that was applied after extra_engine_args to copy relevant values from arg_map into config (or call the helper that updates config from arg_map), ensuring max_seq_len, max_batch_size, max_num_tokens (and any other limit fields) are written back to config so MDC sees the overridden values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/src/dynamo/trtllm/workers/llm_worker.py`:
- Line 202: Remove the trailing whitespace on the blank line in llm_worker.py
that is causing black to fail; open the module (llm_worker.py) and delete the
extra spaces on the empty line around the LLMWorker module/class area so the
file contains a truly blank line, then re-run black or your formatter to confirm
the whitespace is gone.
---
Outside diff comments:
In `@components/src/dynamo/trtllm/workers/llm_worker.py`:
- Around line 221-230: The override_engine_args block applies deep_update to
arg_map but doesn't propagate updated limits back to the config, so values set
via --override-engine-args (e.g., max_seq_len, max_batch_size, max_num_tokens)
are ignored; after successfully loading and applying overrides in the
override_engine_args branch (where config.override_engine_args, json.loads, and
deep_update(arg_map, overrides) are used), reuse the same propagation logic that
was applied after extra_engine_args to copy relevant values from arg_map into
config (or call the helper that updates config from arg_map), ensuring
max_seq_len, max_batch_size, max_num_tokens (and any other limit fields) are
written back to config so MDC sees the overridden values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 55b844de-b8be-4e6f-8f65-f703e3df10c5
📒 Files selected for processing (1)
components/src/dynamo/trtllm/workers/llm_worker.py
Remove trailing whitespace from blank line inside the extra_engine_args block that caused black formatter to modify the file. Signed-off-by: Asad Shahid <asad.shahid@berkeley.edu>
…registration Also propagate max_seq_len, max_batch_size, and max_num_tokens from override_engine_args back to config after deep_update, matching the existing propagation done for extra_engine_args. This ensures MDC sees the correct values regardless of which flag the user sets. Signed-off-by: Asad Shahid <asad.shahid@berkeley.edu>
|
Addressed the CodeRabbit suggestion regarding |
When --extra-engine-args overrides max_seq_len (and other engine config fields), propagate these values back to config so they are correctly reflected in the ModelDeploymentCard (MDC) registration.
This ensures:
Also propagates max_batch_size and max_num_tokens for consistency.
Fixes #6987
Summary by CodeRabbit