Skip to content

fix(nemo-gym): emit structured response summary on empty generation#2367

Open
jthomson04 wants to merge 2 commits intoNVIDIA-NeMo:mainfrom
jthomson04:jthomson04/fix-nemo-gym-empty-generation-error
Open

fix(nemo-gym): emit structured response summary on empty generation#2367
jthomson04 wants to merge 2 commits intoNVIDIA-NeMo:mainfrom
jthomson04:jthomson04/fix-nemo-gym-empty-generation-error

Conversation

@jthomson04
Copy link
Copy Markdown

Summary

  • When NeMo-Gym returned a result with no generation data, the existing ValueError claimed the prompt exceeded vLLM max_model_len and told the user to bump policy.max_total_sequence_length. That is only one of several possible causes — empty responses also come back from max_output_tokens truncation, upstream errors, refusals, etc. — and the misleading message sent users on a wild goose chase.
  • Replace the canned hint with a structured JSON summary of the response: status, finish reason, incomplete details, error, usage, and a per-output-item breakdown (type, role, status, finish reason, content shape). The actual cause is now visible at the failure site.
  • Helpers (_truncate_error_value, _summarize_nemo_gym_output_item, _summarize_nemo_gym_empty_generation_result) are module-level so they're directly testable.
  • Add unit tests covering the typical truncated-generation case and the malformed-response fallback.

Test plan

  • tests/unit/environments/test_nemo_gym.py::test_summarize_nemo_gym_empty_generation_result_is_diagnosable passes.
  • tests/unit/environments/test_nemo_gym.py::test_summarize_nemo_gym_empty_generation_result_handles_non_dict_response passes.
  • Trigger an empty-generation case (e.g. with a deliberately small max_output_tokens) and confirm the new error message includes incomplete_details / finish_reason / per-output content info.

🤖 Generated with Claude Code

When NeMo-Gym returned a result with no generation data, the existing
`ValueError` claimed the prompt exceeded `vLLM max_model_len` and told
the user to bump `policy.max_total_sequence_length`. This is only one
of several possible causes — the response can also come back empty due
to `max_output_tokens` truncation, an upstream error, or a refusal —
and the misleading message sent users on a wild goose chase.

Replace the canned hint with a structured summary of the response:
status, finish reason, incomplete details, error, usage, and a per-
output-item breakdown (type, role, status, finish reason, content
shape). The full response is JSON-dumped so the actual cause is visible
at the failure site.

Add unit tests for the summarizer that cover the typical truncated-
generation case and the malformed-response fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
@jthomson04 jthomson04 requested review from a team as code owners April 29, 2026 21:28
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Apr 29, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@jthomson04
Copy link
Copy Markdown
Author

/ok to test 02f609f

@terrykong terrykong requested review from ananthsub, terrykong and yfw May 5, 2026 19:46
The empty-generation summary previously surfaced only response-side
fields. With many concurrent rollouts in flight, an operator also needs
request-side identifiers (model, max_output_tokens, metadata/user) to
pinpoint which call failed.

Add `_summarize_nemo_gym_request_params` and include its output under a
`request` key in both the well-formed and malformed-response branches
of `_summarize_nemo_gym_empty_generation_result`. Extend the existing
tests to cover the new fields and the malformed-response path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant