Skip to content

Fix worker status for unusable terminal backend responses#615

Open
ushaket wants to merge 4 commits intovllm-project:mainfrom
ushaket:fix/worker-unusable-terminal-response
Open

Fix worker status for unusable terminal backend responses#615
ushaket wants to merge 4 commits intovllm-project:mainfrom
ushaket:fix/worker-unusable-terminal-response

Conversation

@ushaket
Copy link
Copy Markdown
Contributor

@ushaket ushaket commented Feb 26, 2026

Summary

This PR fixes a scheduler correctness bug where requests could be marked as completed even when the backend resolved without a usable terminal response. It adds explicit terminal-response validation in the worker so malformed/empty terminal results are surfaced as errored with a clear diagnostic instead of being counted as successful requests.

Details

  • Added terminal response validation in WorkerProcess to guard final status transitions.
  • Updated request finalization logic so unusable terminal responses set:
    • status: errored
    • error message: [UNUSABLE_BACKEND_RESPONSE] backend resolved without a usable terminal response payload
  • Implemented GenerationResponse-aware usability criteria:
    • usable if non-empty text or output_metrics.total_tokens > 0
    • None terminal response is always unusable
    • non-GenerationResponse fallback remains bool(response) for generic/test compatibility
  • Added regression tests in tests/unit/scheduler/test_worker.py for:
    • no terminal response object -> errored
    • empty GenerationResponse -> errored
    • token-bearing GenerationResponse with empty text -> completed
  • Preserved existing cancellation/error flow behavior outside terminal-response validation.

Test Plan

  • Run targeted worker regression tests:
    • uv run pytest -q tests/unit/scheduler/test_worker.py -k "terminal_response or empty_generation_response or generation_response_with_tokens or invalid_initialization"
  • Verify expected outcomes:
    • missing terminal response is not marked completed
    • empty GenerationResponse is not marked completed
    • non-empty signal (output_tokens > 0) is accepted as completed
  • Confirm no lint issues on touched files.

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

@ushaket ushaket marked this pull request as draft February 26, 2026 09:38
@ushaket
Copy link
Copy Markdown
Contributor Author

ushaket commented Feb 26, 2026

This introduce worker dependency on GenerationResponse, not sure if that's the right way to go,
The reason I added it is that the we need to somehow know what we're looking for, in the future we might have ResponseT that doesn't return text and total_tokens, so if we want to keep this check in the worker, we'll need to be aware of the different ResponseT possible

@sjmonson
Copy link
Copy Markdown
Collaborator

sjmonson commented Mar 2, 2026

Hmm, yeah I am not a fan of depending on GenerationResponse in the worker. Can you maybe just check inside the OpenAIRequestHandler.compile_streaming and OpenAIRequestHandler.compile_non_streaming methods? It does not generalize as nicely but what qualifies as an unusable response may differ between backends/endpoints. If you throw an exception in those methods it should propagate up to the worker.

@ushaket ushaket marked this pull request as ready for review March 2, 2026 21:59
@ushaket
Copy link
Copy Markdown
Contributor Author

ushaket commented Mar 2, 2026

Done

Copy link
Copy Markdown
Collaborator

@sjmonson sjmonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rebase and get rid of the merge commits?

@dbutenhof dbutenhof added this to the v0.7.0 milestone Mar 30, 2026
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Mar 30, 2026

@ushaket, this project requires a linear history on feature branches.
Your PR contains merge commits. Please rebase your branch against main
and remove them.

You can do this by running:
git pull --rebase upstream main

@mergify mergify Bot added the needs-rebase label Mar 30, 2026
@ushaket ushaket force-pushed the fix/worker-unusable-terminal-response branch from 60e1abc to d35b823 Compare March 30, 2026 13:12
@mergify mergify Bot removed the needs-rebase label Mar 30, 2026
ushaket added 4 commits March 30, 2026 16:16
Signed-off-by: Uri Shaket <ushaket@redhat.com>
Signed-off-by: Uri Shaket <ushaket@redhat.com>
Signed-off-by: Uri Shaket <ushaket@redhat.com>
Signed-off-by: Uri Shaket <ushaket@redhat.com>
@ushaket ushaket force-pushed the fix/worker-unusable-terminal-response branch from d35b823 to d82eb51 Compare March 30, 2026 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Worker marks completed/successful when final response has no usable data

3 participants