
[WIP][experimental] multi turn chat benchmark #821

Draft

cquil11 wants to merge 172 commits into main from experimental/multi-turn-benchmark

Conversation


@cquil11 cquil11 commented Feb 27, 2026

No description provided.

Rohan138 and others added 30 commits January 26, 2026 17:15
* fix AITER flags for v0.14.0 release

* drop mi325 triton gemm env var

* Add changes to perf changelog
…won't be erroneous negative diff [skip-sweep] (#571)
* remove assign

* initial

* update perf

* fix perf changelog

* trigger test sweep

* trigger test sweep pt 2

* rebase for evals only

* Update perf-changelog.yaml

* remove newline

* update perf changelog

---------

Co-authored-by: Cam Quilici <cjquilici@gmail.com>
* b300 srt slurm

* update generated srtslurm yaml

Signed-off-by: jthomson04 <jothomson@nvidia.com>

* fix image

* add uv and sqsh file

* change partition

* change slurm account

* use regular srt

Signed-off-by: jthomson04 <jothomson@nvidia.com>

* update perf changelog

Signed-off-by: jthomson04 <jothomson@nvidia.com>

* fix runner

Signed-off-by: jthomson04 <jothomson@nvidia.com>

* correct account

Signed-off-by: jthomson04 <jothomson@nvidia.com>

* qos support

Signed-off-by: jthomson04 <jothomson@nvidia.com>

* fix get checkout

Signed-off-by: jthomson04 <jothomson@nvidia.com>

* update runner label and partition

* undo branch checkout

Signed-off-by: jthomson04 <jothomson@nvidia.com>

* debug info

Signed-off-by: jthomson04 <jothomson@nvidia.com>

* cleanup logging

Signed-off-by: jthomson04 <jothomson@nvidia.com>

* use local model dir

Signed-off-by: jthomson04 <jothomson@nvidia.com>

* checkout specific commit

Signed-off-by: jthomson04 <jothomson@nvidia.com>

---------

Signed-off-by: jthomson04 <jothomson@nvidia.com>
Co-authored-by: Sahithi Chigurupati <schigurupati@nvidia.com>
Co-authored-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
…won't be erroneous negative diff [skip-sweep] (#577)
* Update SGLang Docker Image for MI355 to v0.5.8

1. activate FP8 KV cache
2. use the MLA persistent kernel

* Do not activate FP8 KV cache and the MLA persistent kernel explicitly

* Add config-keys (v0.5.5.post3 --> v0.5.8)

* Update perf-changelog.yaml with key fix description for v0.5.8

Add description: Disables mla persistent kernel when not using fp8 kv_cache

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
30s default to 300s
* chore: save server long as artifact after single node runs

* test flaky eval

* test flaky eval

* test flaky eval

* rebase

* rebase pt 2

* add trap to upload server logs on exit

* rebase pt 3

* make server log in gha workspace

* export result filename at runtime so it is present

* revert perf changelog
* chore: add pre-merge check for newline in perf-changelog.yaml

Add a validation step in run-sweep.yml that ensures perf-changelog.yaml
ends with a newline character. This prevents negative diff issues in
subsequent PRs when the file is appended to.

Closes #578

Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>

* test

* change logic of newline check

* trigger test check

* remove test perf changelog

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
cquil11 and others added 6 commits March 13, 2026 15:45
Passes ignore_eos=true in the API request payload to force generation
until max_tokens is reached. Works the same way as AIPerf's
--extra-inputs ignore_eos:true.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
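As an illustration of the commit above, here is a minimal sketch of how such a payload could be built. The model name and endpoint shape are placeholders (assuming an OpenAI-style completions request); ignore_eos is a non-standard extra field that some inference servers honor, which is the same mechanism AIPerf exposes via --extra-inputs ignore_eos:true.

```python
# Hypothetical sketch (not the benchmark's actual code): add ignore_eos
# to the request body so the server generates until max_tokens.
def build_payload(prompt: str, max_tokens: int, ignore_eos: bool = True) -> dict:
    payload = {
        "model": "placeholder-model",  # placeholder, not from the PR
        "prompt": prompt,
        "max_tokens": max_tokens,
    }
    if ignore_eos:
        # Extra, non-standard field: ignore the EOS token while sampling.
        payload["ignore_eos"] = True
    return payload

payload = build_payload("Hello", 128)
```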
max_active_conversations is divided by num_clients to get the
per-client limit. Setting it to 1 with 512 clients gives 0, which fails.
Set it to USERS so each client manages 1 conversation at a time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
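The arithmetic behind that fix can be shown in a couple of lines. The function and variable names here are illustrative, not the benchmark's actual identifiers; the point is that integer division truncates 1/512 to 0.

```python
# Per-client conversation limit, as described in the commit message above.
def per_client_limit(max_active_conversations: int, num_clients: int) -> int:
    # Integer division truncates toward zero.
    return max_active_conversations // num_clients

USERS = 512
assert per_client_limit(1, USERS) == 0      # broken: every client gets a limit of 0
assert per_client_limit(USERS, USERS) == 1  # fix: each client manages 1 conversation
```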
@cquil11 cquil11 force-pushed the experimental/multi-turn-benchmark branch from 243e96d to f8ca118 on March 24, 2026 at 19:18
cquil11 and others added 23 commits March 25, 2026 12:53
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Loads detailed_results.csv from kv-cache-tester trace replayer
and converts to the same per-request schema used by AIPerf,
enabling Pareto frontier plotting from trace replay sweeps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- analyze_benchmark_distributions.py now auto-detects trace replay CSV
  format in addition to AIPerf JSONL
- trace replay benchmark script calls the analysis after benchmark

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes aggregation failing with "No experiments found" when running
trace replay sweeps. Loads detailed_results.csv from trace_replay/
directory alongside existing AIPerf JSONL and client CSV formats.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dvancement

- Enable --parallel-subagents in benchmark script
- Change time_scale from 0.1 to 0.05 (2x faster replay)
- Update submodule to feature/parallel-subagents branch
- Exponential distribution for trace advancement (favors turn 0)
- Remove token budget limit (999999999)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- multiturn_fp4_b200_trace_replay.sh: B200 variant with Blackwell arch
  and FP4 compilation config
- multiturn-agentic-trace.yaml: restructured with top-level keys for
  different hardware/model combos (h200-fp8-llama70b, b200-fp4-dsr1)
- multiturn-sweep.yml: added config_key input to select which config
  section to use

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- benchmark-multiturn-tmpl.yml: add ep input, pass as EP_SIZE env var
- multiturn-sweep.yml: add ep input, pass through to template
- multiturn_fp4_b200_trace_replay.sh: add --expert-parallel-size when EP_SIZE > 0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
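The conditional flag-passing described above can be sketched as follows. This is a hypothetical illustration (the PR implements it in a shell script); the base arguments are placeholders, and only the "append --expert-parallel-size when EP_SIZE > 0" logic reflects the commit message.

```python
import os

# Build server launch args, adding --expert-parallel-size only when the
# EP_SIZE environment variable is set above zero (mirrors the shell logic).
def server_args(ep_size: int) -> list:
    args = ["--tp", "8"]  # placeholder base args, not from the PR
    if ep_size > 0:
        args += ["--expert-parallel-size", str(ep_size)]
    return args

ep_size = int(os.environ.get("EP_SIZE", "0"))
args = server_args(ep_size)
```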
…llel-size

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Config entries can now specify ep per TP group. Matrix generator passes
ep per entry, falling back to global input. B200 DSR1 config uses
tp4ep4 and tp8ep8.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
