Merged
Changes from all commits (33 commits)
5c26459  feat(metrics): add registry, samplers, and snapshot wire schema (nv-alicheng, May 5, 2026)
186fc18  feat(metrics): add MetricsPublisher and MetricsSnapshotSubscriber (nv-alicheng, May 5, 2026)
6ca9b65  refactor(metrics): wire pub/sub into aggregator, remove KVStore + mmap (nv-alicheng, May 5, 2026)
76e51f6  refactor(load_generator): emit ERROR before COMPLETE for failed queries (nv-alicheng, May 5, 2026)
953c3b7  docs(agents): update AGENTS.md for metrics pub/sub refactor (nv-alicheng, May 5, 2026)
2e804c0  fix(metrics): address P0 review-council findings (nv-alicheng, May 6, 2026)
236928b  test(metrics): rewrite skipped suites on registry/snapshot fixtures (nv-alicheng, May 5, 2026)
22f5912  test(templates): unblock TestTemplateIntegration without HF_TOKEN (nv-alicheng, May 5, 2026)
e13bbee  docs(agents): add reference-hygiene rules + clean up violations (nv-alicheng, May 6, 2026)
11860df  refactor(metrics_table): encapsulate in-flight task access (nv-alicheng, May 11, 2026)
0bdd391  refactor(metrics): rename refresh_hz → publish_interval_s (seconds) (nv-alicheng, May 11, 2026)
d4e2655  fix(report): numerically stable variance for integer-aggregate series (nv-alicheng, May 11, 2026)
37ff68f  fix(metrics): drain / shutdown correctness for cancel + SIGTERM (nv-alicheng, May 11, 2026)
3ef6936  fix(metrics): pre-check HDR `high >= 2*low` before HdrHistogram ctor (nv-alicheng, May 11, 2026)
d5cfee2  fix(metrics): guard publisher.start against double-STARTED (nv-alicheng, May 11, 2026)
c8d3860  fix(metrics): structured logging around aggregator subprocess crash (nv-alicheng, May 11, 2026)
ac4c3dd  docs(session): document publish-order invariant for ERROR-before-COMP… (nv-alicheng, May 11, 2026)
8169ec6  feat(metrics): --drain-timeout flag, default bumped to 60s (nv-alicheng, May 11, 2026)
5d68890  feat(metrics): add SessionState.INITIALIZE for pre-START phase (nv-alicheng, May 11, 2026)
2a15269  test(registry): cover SeriesSampler internal boundaries (nv-alicheng, May 11, 2026)
bb432ea  feat(metrics): add SessionState.INTERRUPTED for signal-handler shutdown (nv-alicheng, May 12, 2026)
c432fda  refactor(metrics): final snapshot = JSON file; pub/sub = TUI signal (nv-alicheng, May 12, 2026)
a97ac4a  fix(metrics): duplicate STARTED → log error + preserve session_start (nv-alicheng, May 12, 2026)
b26ce9f  fix(metrics): scrub NaN/Inf to None in snapshot_to_dict; allow_nan=False (nv-alicheng, May 12, 2026)
8fc4eed  fix(metrics): SIGTERM refresh duration; SIGINT no-op (nv-alicheng, May 12, 2026)
5c2e866  test(execute): cover _load_final_snapshot_from_disk + Report fallback (nv-alicheng, May 12, 2026)
d247158  fix(metrics): doc cleanup + contract enforcement (p50, output_dir) (nv-alicheng, May 12, 2026)
f73f767  fix(metrics): robustness — KeyboardInterrupt, finalize, .tmp cleanup (nv-alicheng, May 12, 2026)
8f14e9e  chore(tests): rename tests/datasets/ → tests/assets/datasets/ (nv-alicheng, May 12, 2026)
924be7a  test(integration): use local char-tokenizer fixture, drop HF Hub depe… (nv-alicheng, May 12, 2026)
a95ec22  test(integration): bump worker_initialization_timeout to 120s in CI (nv-alicheng, May 12, 2026)
8fe86bf  fix(metrics): drain pending count, SIGTERM task GC, NaN display (nv-alicheng, May 13, 2026)
8c6840c  style(tests): hoist lazy imports to top of test_report_builder (nv-alicheng, May 13, 2026)
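Commit d4e2655 above switches the report builder to a numerically stable variance for integer-aggregate series. The PR's exact implementation is not shown in this excerpt; a standard choice for this problem is Welford's single-pass algorithm, sketched below under that assumption (function name is illustrative):

```python
import math


def welford_variance(values):
    """Single-pass population variance via Welford's algorithm.

    Avoids the catastrophic cancellation of the naive
    sum(x^2) - sum(x)^2 / n formula, which loses precision on
    large integer aggregates.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / count if count else math.nan
```

The update keeps `mean` and `m2` centered as it goes, so intermediate values stay on the order of the data rather than the order of `sum(x^2)`.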
145 changes: 117 additions & 28 deletions AGENTS.md

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions README.md
@@ -40,13 +40,13 @@ uv run inference-endpoint probe \
uv run inference-endpoint benchmark offline \
--endpoints http://your-endpoint:8000 \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl
+ --dataset tests/assets/datasets/dummy_1k.jsonl

# Run online benchmark (sustained QPS)
uv run inference-endpoint benchmark online \
--endpoints http://your-endpoint:8000 \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl \
+ --dataset tests/assets/datasets/dummy_1k.jsonl \
--load-pattern poisson \
--target-qps 100
```
@@ -59,7 +59,7 @@ uv run python -m inference_endpoint.testing.echo_server --port 8765 &
uv run inference-endpoint benchmark offline \
--endpoints http://localhost:8765 \
--model test-model \
- --dataset tests/datasets/dummy_1k.jsonl
+ --dataset tests/assets/datasets/dummy_1k.jsonl
pkill -f echo_server
```

10 changes: 5 additions & 5 deletions docs/CLI_QUICK_REFERENCE.md
@@ -13,13 +13,13 @@ Command-line reference for all `inference-endpoint` subcommands, flags, load pat
inference-endpoint benchmark offline \
--endpoints URL \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl
+ --dataset tests/assets/datasets/dummy_1k.jsonl

# Online (sustained QPS - requires --load-pattern, --target-qps)
inference-endpoint benchmark online \
--endpoints URL \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl \
+ --dataset tests/assets/datasets/dummy_1k.jsonl \
--load-pattern poisson \
--target-qps 100

@@ -35,14 +35,14 @@ inference-endpoint benchmark offline \
inference-endpoint benchmark offline \
--endpoints URL \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl \
+ --dataset tests/assets/datasets/dummy_1k.jsonl \
--report-dir my_benchmark_report

# YAML-based
inference-endpoint benchmark from-config --config test.yaml
```

- **Default Test Dataset:** Use `tests/datasets/dummy_1k.jsonl` (1000 samples) for local testing.
+ **Default Test Dataset:** Use `tests/assets/datasets/dummy_1k.jsonl` (1000 samples) for local testing.

**Dataset format:** `--dataset [perf|acc:]<path>[,key=value...]` — TOML-style dotted paths. Type prefix is optional (defaults to `perf`):
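The `[perf|acc:]<path>[,key=value...]` spec above can be parsed mechanically. The sketch below is a hypothetical illustration of that grammar only — the CLI's actual parser, its function names, and its edge-case handling may differ:

```python
def parse_dataset_spec(spec: str):
    """Illustrative parser for '[perf|acc:]<path>[,key=value...]'.

    Returns (dataset_type, path, options) where options holds the
    TOML-style dotted keys, e.g. {'parser.prompt': 'question'}.
    """
    dataset_type = "perf"  # type prefix is optional; defaults to perf
    if spec.startswith(("perf:", "acc:")):
        dataset_type, spec = spec.split(":", 1)
    # Path comes first; any remaining comma-separated items are key=value pairs.
    path, *pairs = spec.split(",")
    options = dict(pair.split("=", 1) for pair in pairs)
    return dataset_type, path, options
```

For example, `acc:data.jsonl,parser.prompt=question` would yield type `acc`, path `data.jsonl`, and one dotted option.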

@@ -200,7 +200,7 @@ inference-endpoint benchmark offline \
inference-endpoint benchmark offline \
--endpoints http://localhost:8000 \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl
+ --dataset tests/assets/datasets/dummy_1k.jsonl
```

### Production Benchmark
24 changes: 12 additions & 12 deletions docs/LOCAL_TESTING.md
@@ -6,7 +6,7 @@ How to run and test the CLI locally using the built-in echo server and the inclu

### 1. Prepare Test Environment

- **Dataset:** The repo includes `tests/datasets/dummy_1k.jsonl` (1000 samples)
+ **Dataset:** The repo includes `tests/assets/datasets/dummy_1k.jsonl` (1000 samples)
**Format:** Automatically inferred from the file extension. Common local formats include `jsonl`, `json`, `csv`, `parquet`, and HuggingFace datasets.

### 2. Start the Echo Server
@@ -74,14 +74,14 @@ Waiting for 5 responses...
uv run inference-endpoint -v benchmark offline \
--endpoints http://localhost:8765 \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl \
+ --dataset tests/assets/datasets/dummy_1k.jsonl \
--duration 0

# Production test with custom params and report generation
uv run inference-endpoint -v benchmark offline \
--endpoints http://localhost:8765 \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl \
+ --dataset tests/assets/datasets/dummy_1k.jsonl \
--num-samples 5000 \
--workers 4 \
--report-dir benchmark_report
@@ -114,7 +114,7 @@ Cleaning up...
uv run inference-endpoint -v benchmark online \
--endpoints http://localhost:8765 \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl \
+ --dataset tests/assets/datasets/dummy_1k.jsonl \
--duration 0 \
--load-pattern poisson \
--target-qps 100 \
@@ -154,7 +154,7 @@ uv run inference-endpoint validate-yaml --config offline_template.yaml
uv run inference-endpoint benchmark offline \
--endpoints http://localhost:8765 \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/ds_samples.jsonl \
+ --dataset tests/assets/datasets/ds_samples.jsonl \
-v
```

@@ -246,7 +246,7 @@ uv run inference-endpoint probe --endpoints http://localhost:8000 --model Qwen/Q
uv run inference-endpoint -v benchmark offline \
--endpoints http://localhost:8000 \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl \
+ --dataset tests/assets/datasets/dummy_1k.jsonl \
--workers 4 \
--report-dir benchmark_report

@@ -261,14 +261,14 @@ pkill -f echo_server
uv run inference-endpoint benchmark offline \
--endpoints http://localhost:8765 \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl \
+ --dataset tests/assets/datasets/dummy_1k.jsonl \
--report-dir offline_report

# Online (Poisson distribution)
uv run inference-endpoint benchmark online \
--endpoints http://localhost:8765 \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl \
+ --dataset tests/assets/datasets/dummy_1k.jsonl \
--load-pattern poisson \
--target-qps 500 \
--report-dir online_report
@@ -277,21 +277,21 @@ uv run inference-endpoint benchmark online \
uv run inference-endpoint benchmark offline \
--endpoints http://localhost:8765 \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl \
+ --dataset tests/assets/datasets/dummy_1k.jsonl \
--num-samples 500

# Force streaming on for offline mode (to test TTFT metrics)
uv run inference-endpoint benchmark offline \
--endpoints http://localhost:8765 \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl \
+ --dataset tests/assets/datasets/dummy_1k.jsonl \
--streaming on

# Concurrency mode (fixed concurrent requests)
uv run inference-endpoint benchmark online \
--endpoints http://localhost:8765 \
--model Qwen/Qwen3-8B \
- --dataset tests/datasets/dummy_1k.jsonl \
+ --dataset tests/assets/datasets/dummy_1k.jsonl \
--load-pattern concurrency \
--concurrency 32
```
@@ -317,7 +317,7 @@ uv run inference-endpoint benchmark online \
- Use `-v` for INFO logging, `-vv` for DEBUG
- Echo server mirrors prompts back - perfect for quick testing without real inference
- Press `Ctrl+C` to gracefully interrupt benchmarks
- - Default test dataset: `tests/datasets/dummy_1k.jsonl` (1000 samples)
+ - Default test dataset: `tests/assets/datasets/dummy_1k.jsonl` (1000 samples)

**Advanced:**

2 changes: 1 addition & 1 deletion examples/02_ServerBenchmarking/README.md
@@ -49,7 +49,7 @@ enroot start -e HF_TOKEN=$HF_TOKEN -m $HF_HOME:/root/.cache/huggingface vllm+vll
Once the server is up and running, we can send requests to the endpoint by passing in the endpoint address and model name:

```
- uv run inference-endpoint benchmark offline --endpoints http://localhost:8000 --dataset tests/datasets/dummy_1k.jsonl --model ${MODEL_NAME}
+ uv run inference-endpoint benchmark offline --endpoints http://localhost:8000 --dataset tests/assets/datasets/dummy_1k.jsonl --model ${MODEL_NAME}
```

# Using a config file
5 changes: 4 additions & 1 deletion pyproject.toml
@@ -63,6 +63,9 @@ dependencies = [
"sentencepiece==0.2.1",
"protobuf==7.34.1",
"openai_harmony==0.0.8",
+ # HDR Histogram for live percentile/histogram approximations in the
+ # metrics aggregator (PyPI: hdrhistogram, importable as hdrh.histogram).
+ "hdrhistogram==0.10.3",
# Color support for cross-platform terminals
"colorama==0.4.6",
# Fix pytz-2024 import warning
@@ -131,7 +134,7 @@ Issues = "https://github.com/mlperf/inference-endpoint/issues"
target-version = "py312"
line-length = 88
exclude = [
- "tests/datasets/*",
+ "tests/assets/datasets/*",
"src/inference_endpoint/openai/openai_types_gen.py",
"src/inference_endpoint/openai/openapi.yaml",
"datasets/*",
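The new `hdrhistogram` dependency backs the live percentile approximations, and commit 3ef6936 pre-checks the constructor's `high >= 2*low` requirement before building the histogram. A minimal usage sketch — the bounds, units, and variable names here are illustrative, not the PR's actual configuration:

```python
from hdrh.histogram import HdrHistogram

# Track latencies from 1 us to 60 s (stored as microseconds) at
# 2 significant figures. HdrHistogram's constructor requires
# highest_trackable >= 2 * lowest_trackable, hence the pre-check
# that commit 3ef6936 adds before calling it.
low, high = 1, 60_000_000
assert high >= 2 * low, "HdrHistogram ctor would reject these bounds"

hist = HdrHistogram(low, high, 2)
for latency_us in (1_200, 3_400, 250_000):
    hist.record_value(latency_us)

# Percentiles are approximate: values are bucketed to the requested
# number of significant figures, trading precision for O(1) memory.
p50 = hist.get_value_at_percentile(50)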
4 changes: 2 additions & 2 deletions scripts/create_dummy_dataset.py
@@ -36,7 +36,7 @@ def create_dummy_dataset(num_samples: int = 1000, output_path: str = None):

Args:
num_samples: Number of samples to generate
- output_path: Output file path (default: tests/datasets/dummy_1k.jsonl)
+ output_path: Output file path (default: tests/assets/datasets/dummy_1k.jsonl)
"""
# Create varied prompts
prompt_templates = [
@@ -122,7 +122,7 @@ def main():
"--output",
"-o",
type=str,
- help="Output file path (default: tests/datasets/dummy_1k.jsonl)",
+ help="Output file path (default: tests/assets/datasets/dummy_1k.jsonl)",
)

args = parser.parse_args()
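For reference, the dataset this script produces is plain JSONL with a `text_input` field per row — the key the perf parser template maps prompts from. A minimal stand-in, assuming only that format (the real script draws from varied prompt templates; the helper name and prompt text below are illustrative):

```python
import json
from pathlib import Path


def write_dummy_dataset(path: str, num_samples: int = 1000) -> None:
    """Write num_samples JSONL rows, one {'text_input': ...} per line."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w") as f:
        for i in range(num_samples):
            f.write(json.dumps({"text_input": f"Dummy prompt {i}"}) + "\n")
```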
4 changes: 2 additions & 2 deletions scripts/regenerate_templates.py
@@ -74,14 +74,14 @@
PERF_DATASET = {
"name": "perf",
"type": "performance",
- "path": "<DATASET_PATH eg: tests/datasets/dummy_1k.jsonl>",
+ "path": "<DATASET_PATH eg: tests/assets/datasets/dummy_1k.jsonl>",
"parser": {"prompt": "text_input"},
}

ACC_DATASET = {
"name": "accuracy",
"type": "accuracy",
"path": "<DATASET_PATH eg: tests/datasets/ds_samples.jsonl>",
"path": "<DATASET_PATH eg: tests/assets/datasets/ds_samples.jsonl>",
"eval_method": "exact_match",
"parser": {"prompt": "question", "system": "system_prompt"},
"accuracy_config": {