
fix(nemo-gym): plumb max_new_tokens into row-level max_output_tokens#2365

Open
jthomson04 wants to merge 2 commits into NVIDIA-NeMo:main from jthomson04:jthomson04/fix-async-rollout-max-seq-len

Conversation

@jthomson04

Summary

  • `run_async_nemo_gym_rollout` previously asserted `max_seq_len is None` and discarded `generation_config["max_new_tokens"]`, leaving every row's `responses_create_params` with whatever `max_output_tokens` the agent template happened to set (often unset). With no per-request cap, generations could exceed the configured budget and either run away or get killed by vLLM after wasting compute.
  • Drop the `assert max_seq_len is None` and document why we accept `max_seq_len` for API parity without translating it into row-level `max_tokens` (the semantics differ: total sequence length vs. max new tokens).
  • Write `generation_config["max_new_tokens"]` into each row's `responses_create_params["max_output_tokens"]`, taking the min with any existing per-row value so per-agent overrides still win when stricter.
  • Update the four call sites (GRPO train, GRPO validate, async trajectory collector, `run_grpo_nemo_gym.py` example) to pass `master_config["policy"]["max_total_sequence_length"]` instead of `None` so the rollout parameter actually reflects the configured cap.
  • Update `test_run_async_nemo_gym_rollout` to verify `max_output_tokens` gets populated from `generation_config["max_new_tokens"]`.
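
The min-clamp described above can be sketched as follows. This is an illustrative helper, not the actual NeMo-Gym code: the function name `apply_max_output_tokens` and the plain-dict row shape are assumptions based on the PR description.

```python
# Illustrative sketch of the per-row clamp; the helper name and the dict-based
# row shape are assumptions, not the real NeMo-Gym implementation.
def apply_max_output_tokens(rows, generation_config):
    """Populate each row's responses_create_params["max_output_tokens"]
    from generation_config["max_new_tokens"], keeping stricter per-row values."""
    max_new = generation_config.get("max_new_tokens")
    if max_new is None:
        return rows
    for row in rows:
        params = row.setdefault("responses_create_params", {})
        existing = params.get("max_output_tokens")
        # Take the min so a stricter per-agent override still wins;
        # an unset value is simply replaced by the configured cap.
        params["max_output_tokens"] = (
            max_new if existing is None else min(existing, max_new)
        )
    return rows
```

Taking the min (rather than overwriting unconditionally) is what preserves per-agent overrides that are already stricter than the global cap.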

Test plan

  • `tests/unit/experience/test_rollouts.py::test_run_async_nemo_gym_rollout` passes (asserts `max_output_tokens` is set on every row after the rollout).
  • Run a NeMo-Gym GRPO recipe end-to-end and confirm generations respect `policy.generation.max_new_tokens` (no runaway sequences) and that the call sites no longer trip the removed assertion.

🤖 Generated with Claude Code

`run_async_nemo_gym_rollout` previously asserted `max_seq_len is None`
and discarded `generation_config["max_new_tokens"]`, leaving every row's
`responses_create_params` with whatever `max_output_tokens` the agent
template happened to set (often unset, leading to runaway generations
that exceeded the configured cap).

Drop the assertion, document why we accept `max_seq_len` for API parity
without translating it (semantics differ from row-level `max_tokens`),
and write `generation_config["max_new_tokens"]` into each row's
`responses_create_params["max_output_tokens"]`, taking the min with any
existing per-row value.

Update the four call sites in grpo / async / nemo-gym example to pass
`master_config["policy"]["max_total_sequence_length"]` instead of `None`
so the rollout parameter actually reflects the configured cap.
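
A minimal sketch of the call-site change, with `run_async_nemo_gym_rollout` stubbed out since only the parameter plumbing is being illustrated; the real function takes additional arguments, and the example config value is arbitrary.

```python
# Stubbed illustration of the call-site change; the real function's signature
# is larger. Config keys follow the PR text; the cap value is an example.
def run_async_nemo_gym_rollout(max_seq_len=None):
    # Post-PR behavior: max_seq_len is accepted for API parity rather than
    # asserted to be None; it is not translated into row-level max_tokens.
    return max_seq_len

master_config = {"policy": {"max_total_sequence_length": 4096}}  # example value

# Call sites previously passed max_seq_len=None; they now pass the cap.
cap = run_async_nemo_gym_rollout(
    max_seq_len=master_config["policy"]["max_total_sequence_length"]
)
```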

Update `test_run_async_nemo_gym_rollout` to verify `max_output_tokens`
gets populated from `generation_config["max_new_tokens"]`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
@jthomson04 jthomson04 requested review from a team as code owners April 29, 2026 21:27
@copy-pr-bot

copy-pr-bot Bot commented Apr 29, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@jthomson04

/ok to test 1ad93f4

Pre-set one row's max_output_tokens to a looser value than max_new_tokens
to exercise the min() clamp-down branch alongside the existing
no-override branch. Uses max_new_tokens + 1 so the effective generation
cap is unchanged and the rollout's golden token-count assertions still hold.
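
The two branches this tweak exercises can be sketched as below; `clamp_row` is a hypothetical stand-in for the rollout's clamping logic, not the actual test code.

```python
# Hypothetical stand-in for the rollout's clamp, showing both test branches:
# a looser pre-set value is clamped down, and an unset value is populated.
def clamp_row(params, max_new_tokens):
    existing = params.get("max_output_tokens")
    params["max_output_tokens"] = (
        max_new_tokens if existing is None else min(existing, max_new_tokens)
    )
    return params

max_new_tokens = 128  # arbitrary example cap
# Pre-set one row looser than the cap (max_new_tokens + 1), as in the test,
# so the effective generation cap is unchanged after clamping.
loose = clamp_row({"max_output_tokens": max_new_tokens + 1}, max_new_tokens)
unset = clamp_row({}, max_new_tokens)
```

Because the pre-set value is only one token looser than the cap, clamping it down leaves the effective generation cap identical, so golden token-count assertions elsewhere in the test remain valid.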

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
