fix(nemo-gym): plumb max_new_tokens into row-level max_output_tokens #2365
Open

jthomson04 wants to merge 2 commits into NVIDIA-NeMo:main from
Conversation
`run_async_nemo_gym_rollout` previously asserted `max_seq_len is None` and discarded `generation_config["max_new_tokens"]`, leaving every row's `responses_create_params` with whatever `max_output_tokens` the agent template happened to set (often unset, leading to runaway generations that exceeded the configured cap). This change:

- Drops the assertion and documents why we accept `max_seq_len` for API parity without translating it (its semantics differ from row-level `max_tokens`).
- Writes `generation_config["max_new_tokens"]` into each row's `responses_create_params["max_output_tokens"]`, taking the min with any existing per-row value.
- Updates the four call sites in the grpo / async / nemo-gym examples to pass `master_config["policy"]["max_total_sequence_length"]` instead of `None`, so the rollout parameter actually reflects the configured cap.
- Updates `test_run_async_nemo_gym_rollout` to verify `max_output_tokens` gets populated from `generation_config["max_new_tokens"]`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Author
/ok to test 1ad93f4
Pre-set one row's max_output_tokens to a looser value than max_new_tokens to exercise the min() clamp-down branch alongside the existing no-override branch. Uses max_new_tokens + 1 so the effective generation cap is unchanged and the rollout's golden token-count assertions still hold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
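The two branches this commit exercises can be sketched as a small self-contained test. This is a hypothetical illustration of the strategy described above, not the actual test in `tests/unit/experience/test_rollouts.py`; the row layout and the inlined clamping logic are assumptions.

```python
def test_max_output_tokens_branches():
    """Sketch: one row with no override, one pre-set to a looser value.

    Using max_new_tokens + 1 for the looser row means the min() clamp fires
    without changing the effective generation cap, so token-count golden
    assertions elsewhere in the rollout would still hold.
    """
    max_new_tokens = 8
    rows = [
        # No-override branch: max_output_tokens populated from the config.
        {"responses_create_params": {}},
        # Clamp-down branch: looser pre-set value gets min()'d back down.
        {"responses_create_params": {"max_output_tokens": max_new_tokens + 1}},
    ]
    for row in rows:
        params = row["responses_create_params"]
        existing = params.get("max_output_tokens")
        params["max_output_tokens"] = (
            max_new_tokens if existing is None else min(existing, max_new_tokens)
        )
    # Both branches converge on the configured cap.
    assert all(
        r["responses_create_params"]["max_output_tokens"] == max_new_tokens
        for r in rows
    )
    return rows
```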
Summary
- `run_async_nemo_gym_rollout` previously asserted `max_seq_len is None` and discarded `generation_config["max_new_tokens"]`, leaving every row's `responses_create_params` with whatever `max_output_tokens` the agent template happened to set (often unset). With no per-request cap, generations could exceed the configured budget and either run away or get killed by vLLM after wasting compute.
- Drop the `assert max_seq_len is None` and document why we accept `max_seq_len` for API parity without translating it into row-level `max_tokens` (the semantics differ: total sequence length vs. max new tokens).
- Write `generation_config["max_new_tokens"]` into each row's `responses_create_params["max_output_tokens"]`, taking `min` with any existing per-row value so per-agent overrides still win when stricter.
- Update the call sites (grpo, async, and the `run_grpo_nemo_gym.py` example) to pass `master_config["policy"]["max_total_sequence_length"]` instead of `None` so the rollout parameter actually reflects the configured cap.
- Update `test_run_async_nemo_gym_rollout` to verify `max_output_tokens` gets populated from `generation_config["max_new_tokens"]`.

Test plan
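The per-row clamping described above can be sketched as follows. This is a minimal illustration of the behavior, assuming rows carry a `responses_create_params` dict; the function name `apply_generation_cap` and the exact data shapes are hypothetical, not the real rollout code.

```python
def apply_generation_cap(rows, generation_config, max_seq_len=None):
    """Write max_new_tokens into each row's max_output_tokens (sketch).

    max_seq_len is accepted for API parity but deliberately not translated
    into a row-level cap: it bounds total sequence length (prompt + output),
    while max_output_tokens bounds newly generated tokens only.
    """
    max_new_tokens = generation_config["max_new_tokens"]
    for row in rows:
        params = row["responses_create_params"]
        existing = params.get("max_output_tokens")
        # min() so a stricter per-agent override still wins; an unset or
        # looser value is replaced by the configured cap.
        params["max_output_tokens"] = (
            max_new_tokens if existing is None else min(existing, max_new_tokens)
        )
    return rows
```

With `max_new_tokens=8`, a row with no override and a row pre-set to a looser value (e.g. 12) both end up at 8, while a stricter per-row value (e.g. 4) is left alone.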
- `tests/unit/experience/test_rollouts.py::test_run_async_nemo_gym_rollout` passes (asserts `max_output_tokens` is set on every row after the rollout).
- Rollouts respect `policy.generation.max_new_tokens` (no runaway sequences), and the call sites no longer trip the removed assertion.

🤖 Generated with Claude Code