
fix(nemo-gym): plumb max_new_tokens into row-level max_output_tokens#2365

Open
jthomson04 wants to merge 2 commits into NVIDIA-NeMo:main from jthomson04:jthomson04/fix-async-rollout-max-seq-len

Conversation

@jthomson04

Summary

  • `run_async_nemo_gym_rollout` previously asserted `max_seq_len is None` and discarded `generation_config["max_new_tokens"]`, leaving every row's `responses_create_params` with whatever `max_output_tokens` the agent template happened to set (often unset). With no per-request cap, generations could exceed the configured budget and either run away or get killed by vLLM after wasting compute.
  • Drop the `assert max_seq_len is None` and document why we accept `max_seq_len` for API parity without translating it into row-level `max_tokens` (the semantics differ: total sequence length vs. max new tokens).
  • Write `generation_config["max_new_tokens"]` into each row's `responses_create_params["max_output_tokens"]`, taking the min with any existing per-row value so per-agent overrides still win when stricter.
  • Update the four call sites (GRPO train, GRPO validate, async trajectory collector, `run_grpo_nemo_gym.py` example) to pass `master_config["policy"]["max_total_sequence_length"]` instead of `None` so the rollout parameter actually reflects the configured cap.
  • Update `test_run_async_nemo_gym_rollout` to verify `max_output_tokens` gets populated from `generation_config["max_new_tokens"]`.
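
The min-clamp described above can be sketched as follows. This is an illustrative helper, not the actual NeMo-Gym code: the function name `apply_max_output_tokens` and the plain-dict row shape are assumptions based on the PR description.

```python
# Illustrative sketch of the per-row clamp; the helper name and the dict-based
# row shape are assumptions, not the real NeMo-Gym implementation.
def apply_max_output_tokens(rows, generation_config):
    """Populate each row's responses_create_params["max_output_tokens"]
    from generation_config["max_new_tokens"], keeping stricter per-row values."""
    max_new = generation_config.get("max_new_tokens")
    if max_new is None:
        return rows
    for row in rows:
        params = row.setdefault("responses_create_params", {})
        existing = params.get("max_output_tokens")
        # Take the min so a stricter per-agent override still wins;
        # an unset value is simply replaced by the configured cap.
        params["max_output_tokens"] = (
            max_new if existing is None else min(existing, max_new)
        )
    return rows
```

Taking the min (rather than overwriting unconditionally) is what preserves per-agent overrides that are already stricter than the global cap.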

Test plan

  • `tests/unit/experience/test_rollouts.py::test_run_async_nemo_gym_rollout` passes (asserts `max_output_tokens` is set on every row after the rollout).
  • Run a NeMo-Gym GRPO recipe end-to-end and confirm generations respect `policy.generation.max_new_tokens` (no runaway sequences) and that the call sites no longer trip the removed assertion.

🤖 Generated with Claude Code

`run_async_nemo_gym_rollout` previously asserted `max_seq_len is None`
and discarded `generation_config["max_new_tokens"]`, leaving every row's
`responses_create_params` with whatever `max_output_tokens` the agent
template happened to set (often unset, leading to runaway generations
that exceeded the configured cap).

Drop the assertion, document why we accept `max_seq_len` for API parity
without translating it (semantics differ from row-level `max_tokens`),
and write `generation_config["max_new_tokens"]` into each row's
`responses_create_params["max_output_tokens"]`, taking the min with any
existing per-row value.

Update the four call sites in grpo / async / nemo-gym example to pass
`master_config["policy"]["max_total_sequence_length"]` instead of `None`
so the rollout parameter actually reflects the configured cap.
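
A minimal sketch of the call-site change, with `run_async_nemo_gym_rollout` stubbed out since only the parameter plumbing is being illustrated; the real function takes additional arguments, and the example config value is arbitrary.

```python
# Stubbed illustration of the call-site change; the real function's signature
# is larger. Config keys follow the PR text; the cap value is an example.
def run_async_nemo_gym_rollout(max_seq_len=None):
    # Post-PR behavior: max_seq_len is accepted for API parity rather than
    # asserted to be None; it is not translated into row-level max_tokens.
    return max_seq_len

master_config = {"policy": {"max_total_sequence_length": 4096}}  # example value

# Call sites previously passed max_seq_len=None; they now pass the cap.
cap = run_async_nemo_gym_rollout(
    max_seq_len=master_config["policy"]["max_total_sequence_length"]
)
```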

Update `test_run_async_nemo_gym_rollout` to verify `max_output_tokens`
gets populated from `generation_config["max_new_tokens"]`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
@jthomson04 jthomson04 requested review from a team as code owners April 29, 2026 21:27
@copy-pr-bot

copy-pr-bot Bot commented Apr 29, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@jthomson04

/ok to test 1ad93f4

Pre-set one row's max_output_tokens to a looser value than max_new_tokens
to exercise the min() clamp-down branch alongside the existing
no-override branch. Uses max_new_tokens + 1 so the effective generation
cap is unchanged and the rollout's golden token-count assertions still hold.
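
The two branches this tweak exercises can be sketched as below; `clamp_row` is a hypothetical stand-in for the rollout's clamping logic, not the actual test code.

```python
# Hypothetical stand-in for the rollout's clamp, showing both test branches:
# a looser pre-set value is clamped down, and an unset value is populated.
def clamp_row(params, max_new_tokens):
    existing = params.get("max_output_tokens")
    params["max_output_tokens"] = (
        max_new_tokens if existing is None else min(existing, max_new_tokens)
    )
    return params

max_new_tokens = 128  # arbitrary example cap
# Pre-set one row looser than the cap (max_new_tokens + 1), as in the test,
# so the effective generation cap is unchanged after clamping.
loose = clamp_row({"max_output_tokens": max_new_tokens + 1}, max_new_tokens)
unset = clamp_row({}, max_new_tokens)
```

Because the pre-set value is only one token looser than the cap, clamping it down leaves the effective generation cap identical, so golden token-count assertions elsewhere in the test remain valid.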

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
