fix: Support different version of PCG args#517
fix: Support different version of PCG args#517moehanabi wants to merge 1 commit intosgl-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a utility function, adapt_sglang_server_args_kwargs, to handle version-specific argument mapping for sglang's piecewise CUDA graph flags. This ensures compatibility across different sglang versions in SGLangBackendArgs and target model initialization. Feedback was provided regarding the utility function's in-place modification of the input dictionary, suggesting a copy be made to prevent unintended side effects for callers.
| def adapt_sglang_server_args_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]: | ||
| """Adapt piecewise cuda graph kwargs for sglang version compatibility. | ||
|
|
||
| New sglang (post-0.5.9) uses 'enforce_piecewise_cuda_graph' (piecewise cuda graph | ||
| is enabled by default, this flag forces it on even when auto-disabled). | ||
| Old sglang (<=0.5.9) uses 'enable_piecewise_cuda_graph' (disabled by default). | ||
|
|
||
| This function translates between the two based on the installed sglang version. | ||
| """ | ||
| from sglang.srt.server_args import ServerArgs | ||
|
|
||
| has_enforce = hasattr(ServerArgs, "enforce_piecewise_cuda_graph") | ||
| has_enable = hasattr(ServerArgs, "enable_piecewise_cuda_graph") | ||
|
|
||
| if "enforce_piecewise_cuda_graph" in kwargs and not has_enforce and has_enable: | ||
| kwargs["enable_piecewise_cuda_graph"] = kwargs.pop( | ||
| "enforce_piecewise_cuda_graph" | ||
| ) | ||
| elif "enable_piecewise_cuda_graph" in kwargs and not has_enable and has_enforce: | ||
| kwargs["enforce_piecewise_cuda_graph"] = kwargs.pop( | ||
| "enable_piecewise_cuda_graph" | ||
| ) | ||
|
|
||
| return kwargs |
There was a problem hiding this comment.
This function modifies the kwargs dictionary in-place, which can lead to unexpected side effects for the caller, especially when **kwargs is passed to a function. It's a better practice to work on a copy of the dictionary to avoid side effects.
I suggest creating a copy of kwargs at the beginning of the function. This makes the function pure with respect to its inputs.
def adapt_sglang_server_args_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
"""Adapt piecewise cuda graph kwargs for sglang version compatibility.
New sglang (post-0.5.9) uses 'enforce_piecewise_cuda_graph' (piecewise cuda graph
is enabled by default, this flag forces it on even when auto-disabled).
Old sglang (<=0.5.9) uses 'enable_piecewise_cuda_graph' (disabled by default).
This function translates between the two based on the installed sglang version.
"""
kwargs = kwargs.copy()
from sglang.srt.server_args import ServerArgs
has_enforce = hasattr(ServerArgs, "enforce_piecewise_cuda_graph")
has_enable = hasattr(ServerArgs, "enable_piecewise_cuda_graph")
if "enforce_piecewise_cuda_graph" in kwargs and not has_enforce and has_enable:
kwargs["enable_piecewise_cuda_graph"] = kwargs.pop(
"enforce_piecewise_cuda_graph"
)
elif "enable_piecewise_cuda_graph" in kwargs and not has_enable and has_enforce:
kwargs["enforce_piecewise_cuda_graph"] = kwargs.pop(
"enable_piecewise_cuda_graph"
)
return kwargsThere was a problem hiding this comment.
Pull request overview
Adds compatibility handling for the SGLang piecewise CUDA graph flag rename across sglang versions (<=0.5.9 vs post-0.5.9) by adapting kwargs before constructing ServerArgs.
Changes:
- Introduce
adapt_sglang_server_args_kwargs()inspecforge/args.pyto translateenable_piecewise_cuda_graph↔enforce_piecewise_cuda_graphdepending on the installedServerArgsfields. - Apply the adaptation in SGLang-backed target model
from_pretrained()paths (Eagle3 + DFlash). - Route
SGLangBackendArgs.to_kwargs()through the adaptation helper.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| specforge/modeling/target/eagle3_target_model.py | Adapt SGLang ServerArgs kwargs for version compatibility before instantiation. |
| specforge/modeling/target/dflash_target_model.py | Adapt SGLang ServerArgs kwargs for version compatibility before instantiation. |
| specforge/args.py | Add kwarg-adaptation helper and apply it to SGLangBackendArgs.to_kwargs(). |
Comments suppressed due to low confidence (1)
specforge/modeling/target/dflash_target_model.py:90
SGLangDFlashTargetModel.from_pretrainedpassesdtype=torch_dtypedirectly, buttorch_dtypedefaults to None. Elsewhere (e.g.,SGLangEagle3TargetModel) the code notes sglang 0.5.9 requires a non-None dtype and falls back to "auto". Consider applying the same fallback here to avoid failures whentorch_dtypeis not provided.
tp_size = dist.get_world_size(get_tp_group())
kwargs = adapt_sglang_server_args_kwargs(kwargs)
server_args = ServerArgs(
model_path=pretrained_model_name_or_path,
trust_remote_code=trust_remote_code,
dtype=torch_dtype,
enable_return_hidden_states=True, # Critical for DFlash
disable_cuda_graph=True,
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if "enforce_piecewise_cuda_graph" in kwargs and not has_enforce and has_enable: | ||
| kwargs["enable_piecewise_cuda_graph"] = kwargs.pop( | ||
| "enforce_piecewise_cuda_graph" | ||
| ) | ||
| elif "enable_piecewise_cuda_graph" in kwargs and not has_enable and has_enforce: | ||
| kwargs["enforce_piecewise_cuda_graph"] = kwargs.pop( | ||
| "enable_piecewise_cuda_graph" | ||
| ) |
There was a problem hiding this comment.
adapt_sglang_server_args_kwargs can silently overwrite values when both enable_piecewise_cuda_graph and enforce_piecewise_cuda_graph are present in kwargs (e.g., caller passes one manually while another is injected elsewhere). Please handle this explicitly (e.g., define precedence, raise a ValueError, or only convert when the destination key is absent) to avoid surprising behavior.
| if "enforce_piecewise_cuda_graph" in kwargs and not has_enforce and has_enable: | |
| kwargs["enable_piecewise_cuda_graph"] = kwargs.pop( | |
| "enforce_piecewise_cuda_graph" | |
| ) | |
| elif "enable_piecewise_cuda_graph" in kwargs and not has_enable and has_enforce: | |
| kwargs["enforce_piecewise_cuda_graph"] = kwargs.pop( | |
| "enable_piecewise_cuda_graph" | |
| ) | |
| # When only one of these attributes exists on ServerArgs, translate the | |
| # corresponding kwarg name. If both the source and destination kwargs are | |
| # present, avoid silently overwriting and instead either validate or raise. | |
| if "enforce_piecewise_cuda_graph" in kwargs and not has_enforce and has_enable: | |
| # Destination key already present: resolve potential conflict. | |
| if "enable_piecewise_cuda_graph" in kwargs: | |
| src_val = kwargs["enforce_piecewise_cuda_graph"] | |
| dst_val = kwargs["enable_piecewise_cuda_graph"] | |
| if src_val != dst_val: | |
| raise ValueError( | |
| "Both 'enforce_piecewise_cuda_graph' and " | |
| "'enable_piecewise_cuda_graph' were provided with " | |
| f"different values ({src_val!r} vs {dst_val!r}) while " | |
| "only 'enable_piecewise_cuda_graph' is supported by the " | |
| "installed sglang version. Please specify only one or " | |
| "ensure they have the same value." | |
| ) | |
| # Values are equal: drop the unsupported alias to avoid confusion. | |
| kwargs.pop("enforce_piecewise_cuda_graph") | |
| else: | |
| # Only the unsupported name is present: translate it. | |
| kwargs["enable_piecewise_cuda_graph"] = kwargs.pop( | |
| "enforce_piecewise_cuda_graph" | |
| ) | |
| elif "enable_piecewise_cuda_graph" in kwargs and not has_enable and has_enforce: | |
| if "enforce_piecewise_cuda_graph" in kwargs: | |
| src_val = kwargs["enable_piecewise_cuda_graph"] | |
| dst_val = kwargs["enforce_piecewise_cuda_graph"] | |
| if src_val != dst_val: | |
| raise ValueError( | |
| "Both 'enable_piecewise_cuda_graph' and " | |
| "'enforce_piecewise_cuda_graph' were provided with " | |
| f"different values ({src_val!r} vs {dst_val!r}) while " | |
| "only 'enforce_piecewise_cuda_graph' is supported by the " | |
| "installed sglang version. Please specify only one or " | |
| "ensure they have the same value." | |
| ) | |
| kwargs.pop("enable_piecewise_cuda_graph") | |
| else: | |
| kwargs["enforce_piecewise_cuda_graph"] = kwargs.pop( | |
| "enable_piecewise_cuda_graph" | |
| ) |
|
Good question. We are currently adding support for qwen3.5 eagle3 and DFlash model to sglang, and will merge it into the latest sglang as soon as possible. We will update the sglang version that specforge depends on when a stable version of sglang is released. |
Motivation
New sglang (post-0.5.9) uses 'enforce_piecewise_cuda_graph' (piecewise cuda graph is enabled by default, this flag forces it on even when auto-disabled).
Old sglang (<=0.5.9) uses 'enable_piecewise_cuda_graph' (disabled by default).
Modifications
add a function to convert these two args in specforge
PS: actually i don't know what the best way to handle this. maybe requiring a minimun version of sglang (0.5.10) and switch to
enforce_piecewise_cuda_graphis better?Related Issues
Accuracy Test
Benchmark & Profiling
Checklist