Skip to content

fix(trtllm): propagate max_seq_len from extra-engine-args to MDC registration#7653

Open
AsadShahid04 wants to merge 4 commits intoai-dynamo:mainfrom
AsadShahid04:fix/inf-issue-6987
Open

fix(trtllm): propagate max_seq_len from extra-engine-args to MDC registration#7653
AsadShahid04 wants to merge 4 commits intoai-dynamo:mainfrom
AsadShahid04:fix/inf-issue-6987

Conversation

@AsadShahid04
Copy link
Copy Markdown
Contributor

@AsadShahid04 AsadShahid04 commented Mar 26, 2026

When --extra-engine-args overrides max_seq_len (and other engine config fields), propagate these values back to config so they are correctly reflected in the ModelDeploymentCard (MDC) registration.

This ensures:

  • context_length in MDC matches the engine's actual max_seq_len limit
  • Prompt length validation correctly rejects over-length requests (HTTP 400)
  • Instead of allowing them through and crashing in the engine (HTTP 500)

Also propagates max_batch_size and max_num_tokens for consistency.

Fixes #6987

Summary by CodeRabbit

  • Bug Fixes
    • Fixed configuration override handling to ensure engine limits are correctly propagated when custom configuration options are provided.

…stration

When --extra-engine-args overrides max_seq_len (and other engine config
fields), propagate these values back to config so they are correctly
reflected in the ModelDeploymentCard (MDC) registration.

This ensures:
- context_length in MDC matches the engine's actual max_seq_len limit
- Prompt length validation correctly rejects over-length requests (HTTP 400)
- Instead of allowing them through and crashing in the engine (HTTP 500)

Also propagates max_batch_size and max_num_tokens for consistency.

Fixes ai-dynamo#6987

Signed-off-by: Asad Shahid <asad.shahid@berkeley.edu>
@AsadShahid04 AsadShahid04 requested a review from a team as a code owner March 26, 2026 04:41
@AsadShahid04 AsadShahid04 requested a review from a team March 26, 2026 04:41
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Mar 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Copy link
Copy Markdown
Contributor

👋 Hi AsadShahid04! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions Bot added external-contribution Pull request is from an external contributor fix backend::trtllm Relates to the trtllm backend labels Mar 26, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 26, 2026

Walkthrough

The code now parses override values from config.extra_engine_args via update_llm_args_with_extra_options and propagates specific fields (max_seq_len, max_batch_size, max_num_tokens) back to the config object with informational logging. This ensures the config values used for ModelDeploymentCard registration reflect the effective engine limits.

Changes

Cohort / File(s) Summary
Engine Args Propagation
components/src/dynamo/trtllm/workers/llm_worker.py
Added logic to parse and propagate override values from extra engine arguments back to config fields (max_seq_len, max_batch_size, max_num_tokens) with informational logging, occurring before other config overrides are applied.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the specific fix: propagating max_seq_len from extra-engine-args to MDC registration, accurately reflecting the main change in the PR.
Description check ✅ Passed The description covers the problem, solution, and impact. It includes the necessary sections and context for understanding the changes, though it could be slightly more detailed about implementation specifics.
Linked Issues check ✅ Passed The PR directly addresses issue #6987 by implementing the suggested fix: propagating max_seq_len (and max_batch_size, max_num_tokens) from extra-engine-args back to config for MDC registration.
Out of Scope Changes check ✅ Passed All changes are scoped to the identified problem: propagating engine config overrides to the config object before MDC registration, with no extraneous modifications.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/src/dynamo/trtllm/workers/llm_worker.py (1)

221-230: ⚠️ Potential issue | 🟠 Major

Consider propagating values from override_engine_args as well.

The PR correctly propagates max_seq_len, max_batch_size, and max_num_tokens from extra_engine_args back to config. However, override_engine_args (applied at lines 222-230) can also modify these same values via deep_update, but no propagation occurs afterward.

If a user sets max_seq_len via --override-engine-args instead of --extra-engine-args, the MDC would still register the wrong context length, causing the same 500-instead-of-400 bug described in issue #6987.

Proposed fix to also propagate after override_engine_args
             deep_update(arg_map, overrides)
+
+            # Propagate any overridden values back to config for MDC registration
+            if "max_seq_len" in overrides:
+                config.max_seq_len = arg_map["max_seq_len"]
+                logging.info(
+                    f"Updated config.max_seq_len from override-engine-args: {config.max_seq_len}"
+                )
+            if "max_batch_size" in overrides:
+                config.max_batch_size = arg_map["max_batch_size"]
+                logging.info(
+                    f"Updated config.max_batch_size from override-engine-args: {config.max_batch_size}"
+                )
+            if "max_num_tokens" in overrides:
+                config.max_num_tokens = arg_map["max_num_tokens"]
+                logging.info(
+                    f"Updated config.max_num_tokens from override-engine-args: {config.max_num_tokens}"
+                )
         except json.JSONDecodeError as e:

Based on learnings: the --override-engine-args flag uses deep_update to merge overrides into the engine arg map, so users may reasonably expect to set these limits via either flag.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/trtllm/workers/llm_worker.py` around lines 221 - 230,
The override_engine_args block applies deep_update to arg_map but doesn't
propagate updated limits back to the config, so values set via
--override-engine-args (e.g., max_seq_len, max_batch_size, max_num_tokens) are
ignored; after successfully loading and applying overrides in the
override_engine_args branch (where config.override_engine_args, json.loads, and
deep_update(arg_map, overrides) are used), reuse the same propagation logic that
was applied after extra_engine_args to copy relevant values from arg_map into
config (or call the helper that updates config from arg_map), ensuring
max_seq_len, max_batch_size, max_num_tokens (and any other limit fields) are
written back to config so MDC sees the overridden values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/src/dynamo/trtllm/workers/llm_worker.py`:
- Line 202: Remove the trailing whitespace on the blank line in llm_worker.py
that is causing black to fail; open the module (llm_worker.py) and delete the
extra spaces on the empty line around the LLMWorker module/class area so the
file contains a truly blank line, then re-run black or your formatter to confirm
the whitespace is gone.

---

Outside diff comments:
In `@components/src/dynamo/trtllm/workers/llm_worker.py`:
- Around line 221-230: The override_engine_args block applies deep_update to
arg_map but doesn't propagate updated limits back to the config, so values set
via --override-engine-args (e.g., max_seq_len, max_batch_size, max_num_tokens)
are ignored; after successfully loading and applying overrides in the
override_engine_args branch (where config.override_engine_args, json.loads, and
deep_update(arg_map, overrides) are used), reuse the same propagation logic that
was applied after extra_engine_args to copy relevant values from arg_map into
config (or call the helper that updates config from arg_map), ensuring
max_seq_len, max_batch_size, max_num_tokens (and any other limit fields) are
written back to config so MDC sees the overridden values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 55b844de-b8be-4e6f-8f65-f703e3df10c5

📥 Commits

Reviewing files that changed from the base of the PR and between a58bcc3 and 5bb3bd4.

📒 Files selected for processing (1)
  • components/src/dynamo/trtllm/workers/llm_worker.py

Comment thread components/src/dynamo/trtllm/workers/llm_worker.py Outdated
Remove trailing whitespace from blank line inside the extra_engine_args
block that caused black formatter to modify the file.

Signed-off-by: Asad Shahid <asad.shahid@berkeley.edu>
…registration

Also propagate max_seq_len, max_batch_size, and max_num_tokens from
override_engine_args back to config after deep_update, matching the
existing propagation done for extra_engine_args. This ensures MDC sees
the correct values regardless of which flag the user sets.

Signed-off-by: Asad Shahid <asad.shahid@berkeley.edu>
@pull-request-size pull-request-size Bot added size/M and removed size/S labels Mar 27, 2026
@AsadShahid04
Copy link
Copy Markdown
Contributor Author

Addressed the CodeRabbit suggestion regarding override_engine_args: added the same propagation logic (for max_seq_len, max_batch_size, max_num_tokens) after the deep_update call in the override_engine_args block. This ensures MDC registration sees the correct values whether the user sets limits via --extra-engine-args or --override-engine-args. Fix in commit 6b040ba.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend::trtllm Relates to the trtllm backend external-contribution Pull request is from an external contributor fix size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: TRT-LLM --extra-engine-args max_seq_len not propagated to MDC context_length, bypasses prompt length validation

1 participant