Disable prefix caching for qwen3.5 & glm5 AMD benchmarks#970

Open
functionstackx wants to merge 2 commits into main from claude/issue-968-20260328-0808
@functionstackx
Contributor

Add --disable-radix-cache to SGLang server launch command in all 7 benchmark scripts listed in #968, rebased on latest main.

Closes #968
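
For readers unfamiliar with the flag, here is a minimal sketch of where it sits in an SGLang launch command. The model path and port are placeholder assumptions, not values taken from the benchmark scripts in this PR, and the command is only printed rather than executed.

```shell
# Sketch only: MODEL_PATH and PORT are hypothetical placeholders,
# not values from the benchmark scripts touched by this PR.
MODEL_PATH="${MODEL_PATH:-/path/to/model}"
PORT="${PORT:-30000}"

# One flag per line, mirroring a multi-line argument block.
# --disable-radix-cache turns off SGLang's radix (prefix) cache.
LAUNCH_CMD="python3 -m sglang.launch_server \
  --model-path $MODEL_PATH \
  --port $PORT \
  --disable-radix-cache"

# Print the command instead of running it, so the sketch has no side effects.
echo "$LAUNCH_CMD"
```

With the radix cache off, repeated prompt prefixes are not reused across requests, which is presumably why the linked issue asks for it on these benchmarks.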

Generated with Claude Code

Add --disable-radix-cache to SGLang server launch command in all 7
benchmark scripts listed in #968, rebased on latest main.

Closes #968

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

Contributor

@claude claude bot left a comment

LGTM — simple, mechanical addition of --disable-radix-cache to 7 benchmark scripts, consistent with the established pattern in this repo (e.g., #966, #926).

Extended reasoning...

### Overview
Adds --disable-radix-cache to the SGLang server launch command in 7 AMD benchmark shell scripts (qwen3.5 bf16/fp8 on MI300X/MI325X/MI355X, and glm5 fp8 on MI355X). Each change is a one-line addition following an existing multi-line argument block.

### Security risks
None. These are benchmark configuration scripts with no auth, crypto, or permission-sensitive logic.

### Level of scrutiny
Low. This is a straightforward config flag addition that mirrors identically structured recent merges (#966 disabling prefix caching for minimax, #926 for kimi). The change is purely additive: no logic branches, no new code paths.

### Other factors
No bugs found by the automated system. The PR is self-contained to benchmark scripts with no production code impact. The pattern is well-established in this repo.
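
A mechanical change like this can be spot-checked with a short script. The sketch below is an assumption about how one might do that, not something that ships with the PR: `missing_radix_flag` is a hypothetical helper name, and the glob in the usage comment is a hypothetical path.

```shell
# Hypothetical helper: print every given script that does NOT contain the flag.
missing_radix_flag() {
  for script in "$@"; do
    # -q: quiet; -e: take the next argument as the pattern even though it starts with "-"
    if ! grep -q -e '--disable-radix-cache' "$script"; then
      echo "$script"
    fi
  done
}

# Example usage (the path pattern is illustrative only):
# missing_radix_flag benchmarks/*sglang*.sh
```

Empty output means every script passed to the helper already carries the flag.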

@functionstackx
Contributor Author

@claude add the perf-changelog.yaml entry

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 28, 2026

Claude finished @functionstackx's task in 45s

Added perf-changelog.yaml entry

  • Gather context on changed files
  • Read current perf-changelog.yaml
  • Add changelog entry for --disable-radix-cache changes
  • Commit and push

Added a perf-changelog.yaml entry at the top of the file covering all 7 affected config keys (qwen3.5-bf16-mi300x-sglang, qwen3.5-bf16-mi325x-sglang, qwen3.5-bf16-mi355x-sglang, qwen3.5-fp8-mi300x-sglang, qwen3.5-fp8-mi325x-sglang, qwen3.5-fp8-mi355x-sglang, glm5-fp8-mi355x-sglang) with PR link to #970.
…glm5 AMD benchmarks

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

disable prefix on random for qwen3.5 & glm