SGLang OpenAI API Server returns prompt_token_ids and token_ids#536
Open
pan-x-c wants to merge 16 commits into
Open
SGLang OpenAI API Server returns prompt_token_ids and token_ids#536pan-x-c wants to merge 16 commits into
prompt_token_ids and token_ids#536pan-x-c wants to merge 16 commits into
Conversation
Collaborator
Author
|
/unittest-module-trainer |
Collaborator
Author
|
/unittest-module-trainer |
Collaborator
Author
|
/unittest-module-common |
Summary
Skipped
Tests
Github Test Reporter by CTRF 💚 |
Summary
Skipped
Tests
Github Test Reporter by CTRF 💚 |
Collaborator
Author
|
/unittest-diff |
There was a problem hiding this comment.
Pull request overview
Adds a Trinity-side patch layer for SGLang’s embedded OpenAI-compatible server so chat completion responses can include prompt_token_ids and per-choice token_ids, enabling accurate experience/token recording during rollouts.
Changes:
- Introduces an SGLang OpenAI API monkey patch (request/response models + serving logic) to optionally return token ID fields in non-streaming responses.
- Normalizes
state_dict_metadtype strings (e.g.,float16instead oftorch.float16) and updates consumers accordingly. - Adjusts SGLang engine startup/integration (server factory signature, Hugging Face checkpoint path option) and adds/updates tests and Docker build steps for SGLang.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| trinity/trainer/verl/megatron_workers.py | Normalizes dtype strings recorded in state_dict_meta. |
| trinity/trainer/verl/fsdp_workers.py | Normalizes dtype strings recorded in state_dict_meta (FSDP/FSDP2). |
| trinity/manager/synchronizer.py | Normalizes dtype strings in meta; adds use_huggingface option to latest model path lookup. |
| trinity/common/models/vllm_worker.py | Aligns dtype parsing with normalized dtype strings. |
| trinity/common/models/sglang_patch/server_patch.py | Adds SGLang OpenAI API monkey patching + refactors embedded server creation. |
| trinity/common/models/sglang_patch/openai_api_patch.py | Implements patched OpenAI protocol/serving to emit prompt_token_ids/token_ids. |
| trinity/common/models/sglang_patch/init.py | Exposes get_api_server from server_patch. |
| trinity/common/models/sglang_model.py | Uses new get_api_server signature; updates weight-sync behavior and checkpoint path usage. |
| trinity/common/models/init.py | Refactors SGLang explorer placement group creation for multiple engines. |
| trinity/common/config.py | Adds multi-node related fields (nnodes, node_rank) to inference config. |
| tests/trainer/trainer_test.py | Updates parameterization to cover different engine types across strategies. |
| tests/common/sglang_test.py | Adds OpenAI-API-level SGLang tests validating experience/token behavior. |
| scripts/docker/Dockerfile.uv | Installs SGLang (and Rust toolchain) in the uv-based Docker image. |
| perf/scripts/explorer/perf_workflow.py | Forces reward=1.0 for perf workflow experiences. |
| .github/workflows/docker/docker-compose.yaml | Bumps unittest image tag used in CI docker-compose workflow. |
Comments suppressed due to low confidence (1)
trinity/common/models/sglang_patch/server_patch.py:14
server_patch.pyrebuilds FastAPI routes using private internals (fastapi.dependencies.utils._should_embed_body_fields,get_flat_dependant, etc.). These APIs are not stable across FastAPI versions and can break the embedded server at runtime. Consider avoiding private FastAPI internals (e.g., re-registering a new APIRoute with the new request model), or at least pin/validate the FastAPI version and fail with a clear error if the expected internals are unavailable.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
Skipped
Tests
Github Test Reporter by CTRF 💚 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
As the title says
Checklist
Please check the following items before code is ready to be reviewed.