Skip to content

fix: Support different version of PCG args#517

Open
moehanabi wants to merge 1 commit intosgl-project:mainfrom
moehanabi:fix_pcg_args
Open

fix: Support different version of PCG args#517
moehanabi wants to merge 1 commit intosgl-project:mainfrom
moehanabi:fix_pcg_args

Conversation

@moehanabi
Copy link
Copy Markdown
Contributor

@moehanabi moehanabi commented Mar 30, 2026

Motivation

New sglang (post-0.5.9) uses 'enforce_piecewise_cuda_graph' (piecewise cuda graph is enabled by default, this flag forces it on even when auto-disabled).

Old sglang (<=0.5.9) uses 'enable_piecewise_cuda_graph' (disabled by default).

Modifications

add a function to convert these two args in specforge

PS: actually i don't know what the best way to handle this. maybe requiring a minimun version of sglang (0.5.10) and switch to enforce_piecewise_cuda_graph is better?

Related Issues

Accuracy Test

Benchmark & Profiling

Checklist

@moehanabi moehanabi requested a review from FlamingoPg as a code owner March 30, 2026 12:33
Copilot AI review requested due to automatic review settings March 30, 2026 12:33
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a utility function, adapt_sglang_server_args_kwargs, to handle version-specific argument mapping for sglang's piecewise CUDA graph flags. This ensures compatibility across different sglang versions in SGLangBackendArgs and target model initialization. Feedback was provided regarding the utility function's in-place modification of the input dictionary, suggesting a copy be made to prevent unintended side effects for callers.

Comment on lines +8 to +31
def adapt_sglang_server_args_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
"""Adapt piecewise cuda graph kwargs for sglang version compatibility.

New sglang (post-0.5.9) uses 'enforce_piecewise_cuda_graph' (piecewise cuda graph
is enabled by default, this flag forces it on even when auto-disabled).
Old sglang (<=0.5.9) uses 'enable_piecewise_cuda_graph' (disabled by default).

This function translates between the two based on the installed sglang version.
"""
from sglang.srt.server_args import ServerArgs

has_enforce = hasattr(ServerArgs, "enforce_piecewise_cuda_graph")
has_enable = hasattr(ServerArgs, "enable_piecewise_cuda_graph")

if "enforce_piecewise_cuda_graph" in kwargs and not has_enforce and has_enable:
kwargs["enable_piecewise_cuda_graph"] = kwargs.pop(
"enforce_piecewise_cuda_graph"
)
elif "enable_piecewise_cuda_graph" in kwargs and not has_enable and has_enforce:
kwargs["enforce_piecewise_cuda_graph"] = kwargs.pop(
"enable_piecewise_cuda_graph"
)

return kwargs
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function modifies the kwargs dictionary in-place, which can lead to unexpected side effects for the caller, especially when **kwargs is passed to a function. It's a better practice to work on a copy of the dictionary to avoid side effects.

I suggest creating a copy of kwargs at the beginning of the function. This makes the function pure with respect to its inputs.

def adapt_sglang_server_args_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Adapt piecewise cuda graph kwargs for sglang version compatibility.

    New sglang (post-0.5.9) uses 'enforce_piecewise_cuda_graph' (piecewise cuda graph
    is enabled by default, this flag forces it on even when auto-disabled).
    Old sglang (<=0.5.9) uses 'enable_piecewise_cuda_graph' (disabled by default).

    This function translates between the two based on the installed sglang version.
    """
    kwargs = kwargs.copy()
    from sglang.srt.server_args import ServerArgs

    has_enforce = hasattr(ServerArgs, "enforce_piecewise_cuda_graph")
    has_enable = hasattr(ServerArgs, "enable_piecewise_cuda_graph")

    if "enforce_piecewise_cuda_graph" in kwargs and not has_enforce and has_enable:
        kwargs["enable_piecewise_cuda_graph"] = kwargs.pop(
            "enforce_piecewise_cuda_graph"
        )
    elif "enable_piecewise_cuda_graph" in kwargs and not has_enable and has_enforce:
        kwargs["enforce_piecewise_cuda_graph"] = kwargs.pop(
            "enable_piecewise_cuda_graph"
        )

    return kwargs

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds compatibility handling for the SGLang piecewise CUDA graph flag rename across sglang versions (<=0.5.9 vs post-0.5.9) by adapting kwargs before constructing ServerArgs.

Changes:

  • Introduce adapt_sglang_server_args_kwargs() in specforge/args.py to translate enable_piecewise_cuda_graphenforce_piecewise_cuda_graph depending on the installed ServerArgs fields.
  • Apply the adaptation in SGLang-backed target model from_pretrained() paths (Eagle3 + DFlash).
  • Route SGLangBackendArgs.to_kwargs() through the adaptation helper.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
specforge/modeling/target/eagle3_target_model.py Adapt SGLang ServerArgs kwargs for version compatibility before instantiation.
specforge/modeling/target/dflash_target_model.py Adapt SGLang ServerArgs kwargs for version compatibility before instantiation.
specforge/args.py Add kwarg-adaptation helper and apply it to SGLangBackendArgs.to_kwargs().
Comments suppressed due to low confidence (1)

specforge/modeling/target/dflash_target_model.py:90

  • SGLangDFlashTargetModel.from_pretrained passes dtype=torch_dtype directly, but torch_dtype defaults to None. Elsewhere (e.g., SGLangEagle3TargetModel) the code notes sglang 0.5.9 requires a non-None dtype and falls back to "auto". Consider applying the same fallback here to avoid failures when torch_dtype is not provided.
        tp_size = dist.get_world_size(get_tp_group())
        kwargs = adapt_sglang_server_args_kwargs(kwargs)
        server_args = ServerArgs(
            model_path=pretrained_model_name_or_path,
            trust_remote_code=trust_remote_code,
            dtype=torch_dtype,
            enable_return_hidden_states=True,  # Critical for DFlash
            disable_cuda_graph=True,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +22 to +29
if "enforce_piecewise_cuda_graph" in kwargs and not has_enforce and has_enable:
kwargs["enable_piecewise_cuda_graph"] = kwargs.pop(
"enforce_piecewise_cuda_graph"
)
elif "enable_piecewise_cuda_graph" in kwargs and not has_enable and has_enforce:
kwargs["enforce_piecewise_cuda_graph"] = kwargs.pop(
"enable_piecewise_cuda_graph"
)
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adapt_sglang_server_args_kwargs can silently overwrite values when both enable_piecewise_cuda_graph and enforce_piecewise_cuda_graph are present in kwargs (e.g., caller passes one manually while another is injected elsewhere). Please handle this explicitly (e.g., define precedence, raise a ValueError, or only convert when the destination key is absent) to avoid surprising behavior.

Suggested change
if "enforce_piecewise_cuda_graph" in kwargs and not has_enforce and has_enable:
kwargs["enable_piecewise_cuda_graph"] = kwargs.pop(
"enforce_piecewise_cuda_graph"
)
elif "enable_piecewise_cuda_graph" in kwargs and not has_enable and has_enforce:
kwargs["enforce_piecewise_cuda_graph"] = kwargs.pop(
"enable_piecewise_cuda_graph"
)
# When only one of these attributes exists on ServerArgs, translate the
# corresponding kwarg name. If both the source and destination kwargs are
# present, avoid silently overwriting and instead either validate or raise.
if "enforce_piecewise_cuda_graph" in kwargs and not has_enforce and has_enable:
# Destination key already present: resolve potential conflict.
if "enable_piecewise_cuda_graph" in kwargs:
src_val = kwargs["enforce_piecewise_cuda_graph"]
dst_val = kwargs["enable_piecewise_cuda_graph"]
if src_val != dst_val:
raise ValueError(
"Both 'enforce_piecewise_cuda_graph' and "
"'enable_piecewise_cuda_graph' were provided with "
f"different values ({src_val!r} vs {dst_val!r}) while "
"only 'enable_piecewise_cuda_graph' is supported by the "
"installed sglang version. Please specify only one or "
"ensure they have the same value."
)
# Values are equal: drop the unsupported alias to avoid confusion.
kwargs.pop("enforce_piecewise_cuda_graph")
else:
# Only the unsupported name is present: translate it.
kwargs["enable_piecewise_cuda_graph"] = kwargs.pop(
"enforce_piecewise_cuda_graph"
)
elif "enable_piecewise_cuda_graph" in kwargs and not has_enable and has_enforce:
if "enforce_piecewise_cuda_graph" in kwargs:
src_val = kwargs["enable_piecewise_cuda_graph"]
dst_val = kwargs["enforce_piecewise_cuda_graph"]
if src_val != dst_val:
raise ValueError(
"Both 'enable_piecewise_cuda_graph' and "
"'enforce_piecewise_cuda_graph' were provided with "
f"different values ({src_val!r} vs {dst_val!r}) while "
"only 'enforce_piecewise_cuda_graph' is supported by the "
"installed sglang version. Please specify only one or "
"ensure they have the same value."
)
kwargs.pop("enable_piecewise_cuda_graph")
else:
kwargs["enforce_piecewise_cuda_graph"] = kwargs.pop(
"enable_piecewise_cuda_graph"
)

Copilot uses AI. Check for mistakes.
@jiapingW
Copy link
Copy Markdown
Collaborator

Good question. We are currently adding support for qwen3.5 eagle3 and DFlash model to sglang, and will merge it into the latest sglang as soon as possible. We will update the sglang version that specforge depends on when a stable version of sglang is released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants