
fix: OpenAI response decoding defaults, multimodal prompt handling, and tokenizer_name override#307

Merged
Victor49152 merged 3 commits into mlcommons:main from BolinSNLHM:fix/openai-decoding-and-tokenizer-override on May 12, 2026

Conversation

@BolinSNLHM
Collaborator

Summary

Fixes three issues encountered when running the Qwen3-VL-235B-A22B benchmark against vLLM/Dynamo with the Shopify multimodal dataset:

  1. openai/types.py — vLLM omits optional response fields (content, refusal, finish_reason, usage, system_fingerprint, etc.) that our msgspec structs required to be present. Added safe defaults so decoding succeeds regardless of what the server populates (see the sketch after this list).

  2. session.py — The ShopifyMultimodalFormatter produces prompt as list[dict] (OpenAI vision content parts), which the HTTP adapter handles correctly. However, PhaseIssuer's metrics tracking assigned it directly to PromptData.text (typed str | None), causing a decode failure. Added a type guard to coerce non-string prompts to None.

  3. config/schema.py + execute.py — When the model is served from a container-local path (e.g. an NVFP4 snapshot under /root/.cache/huggingface/hub/...), tokenizer loading fails because the path isn't a valid HF repo ID. Added model_params.tokenizer_name to decouple the tokenizer source from the serving model name.
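For reference, here is a minimal sketch of what fixes (1) and (2) look like in practice. The field names follow the OpenAI chat-completions schema, but the actual structs in openai/types.py and the guard in session.py are richer than this; treat it as an illustration of the pattern, not the shipped code:

```python
import msgspec


# (1) Optional response fields get safe defaults so msgspec decoding
# succeeds even when the server omits them entirely.
class ChatMessage(msgspec.Struct):
    role: str = "assistant"
    content: str | None = None
    refusal: str | None = None


class Choice(msgspec.Struct):
    index: int = 0
    message: ChatMessage | None = None
    finish_reason: str | None = None


# (2) The metrics path only stores plain-text prompts; multimodal
# content parts (list[dict]) are coerced to None instead of being
# forced into a str-typed field.
def prompt_text_or_none(prompt: object) -> str | None:
    return prompt if isinstance(prompt, str) else None
```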

Testing

Validated end-to-end on GB300 (offline scenario, 1000 samples): all samples completed successfully with no decode errors and full latency/throughput metrics generated.

Note: The example YAMLs assume the server is started with --served-model-name matching model_params.name. When serving from a local NVFP4 checkpoint (the typical NVIDIA submission setup), set model_params.name to the actual served path, tokenizer_name to the upstream HF repo ID (e.g. Qwen/Qwen3-VL-235B-A22B-Instruct), and endpoint_config.endpoints to the node IP.
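For the local-checkpoint case described above, the relevant part of such a YAML might look roughly like this (the key nesting, placeholder path, and endpoint value are illustrative, not copied from the example configs):

```yaml
model_params:
  # Path the server was started with via --served-model-name (placeholder).
  name: /root/.cache/huggingface/hub/<local-nvfp4-snapshot>
  # Upstream HF repo ID, used only for tokenizer loading.
  tokenizer_name: Qwen/Qwen3-VL-235B-A22B-Instruct

endpoint_config:
  endpoints:
    - http://<node-ip>:8000/v1  # replace with the serving node's address
```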

@BolinSNLHM BolinSNLHM requested a review from a team May 5, 2026 21:30
@github-actions

github-actions Bot commented May 5, 2026

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a tokenizer_name configuration option to allow overriding the tokenizer source, which is particularly useful when the serving model uses a local path or a quantized checkpoint. It also enhances the robustness of OpenAI response parsing by adding default values for fields frequently omitted by inference servers and updates the load generator to handle multimodal prompt data. A suggestion was made to improve the tokenizer existence check by verifying local file system paths before querying the Hugging Face Hub, ensuring local tokenizers are correctly identified without triggering API errors.

Comment thread src/inference_endpoint/commands/benchmark/execute.py Outdated
@BolinSNLHM BolinSNLHM force-pushed the fix/openai-decoding-and-tokenizer-override branch from 5d35e99 to a33690b Compare May 5, 2026 21:35
Comment thread src/inference_endpoint/commands/benchmark/execute.py Outdated
@wangshangsam wangshangsam added the type: bug Something isn't working label May 8, 2026
Collaborator

@wangshangsam wangshangsam left a comment


The override works for the case where the local checkpoint genuinely doesn't ship tokenizer files, but I think it's papering over a probe bug rather than fixing the root cause.

_check_tokenizer_exists (execute.py:181) calls huggingface_hub.model_info(model_name) first, which only resolves repo IDs. Pass it a path like /root/.cache/huggingface/hub/... and it raises → except Exception → returns False → tokenizer_name becomes None at execute.py:305 → execute.py:465 never passes --tokenizer to the metrics aggregator subprocess → ISL/OSL/TPOT silently disabled.

But the downstream loader at token_metrics.py:79 is AutoTokenizer.from_pretrained(...), which accepts a local directory containing tokenizer.json / tokenizer_config.json just fine. So there are really two cases:

  1. Local NVFP4 snapshot that contains tokenizer files — the typical case, since vLLM needs the tokenizer to serve, so it's almost always present in the snapshot. The probe is just being too strict; a short-circuit like "if Path(model_name).is_dir() and a tokenizer file exists, return True" would make everything work with no user-visible knob.
  2. Local path with no tokenizer files: rare (weights-only directories). Here tokenizer_name is the right escape hatch, but this case should seldom come up in practice.

This PR ships (2) but leaves (1) broken: even when the served directory contains a perfectly usable tokenizer.json, the user is still forced to set tokenizer_name. Could we extend _check_tokenizer_exists to check whether tokenizer files exist for a model checkpoint at a local path?
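A possible shape of that local-path short-circuit (the function name and line reference come from the review above; the body below is only a sketch of the suggested behavior, with the Hub branch simplified, not the repository's implementation):

```python
from pathlib import Path

from huggingface_hub import model_info

TOKENIZER_FILES = ("tokenizer.json", "tokenizer_config.json", "tokenizer.model")


def _check_tokenizer_exists(model_name: str) -> bool:
    # Local directory: trust the snapshot if it ships any tokenizer file.
    local = Path(model_name)
    if local.is_dir():
        return any((local / f).is_file() for f in TOKENIZER_FILES)
    # Otherwise fall back to probing the Hugging Face Hub repo ID.
    try:
        model_info(model_name)
        return True
    except Exception:
        return False
```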

@BolinSNLHM BolinSNLHM force-pushed the fix/openai-decoding-and-tokenizer-override branch from 26ab49b to b94e164 Compare May 12, 2026 15:27
@BolinSNLHM BolinSNLHM force-pushed the fix/openai-decoding-and-tokenizer-override branch from b94e164 to b783529 Compare May 12, 2026 15:44
Collaborator

@wangshangsam wangshangsam left a comment


Nit, but otherwise LGTM!

Comment thread src/inference_endpoint/load_generator/session.py Outdated
Co-authored-by: Shang Wang <shangw@nvidia.com>
@Victor49152 Victor49152 merged commit 6d82015 into mlcommons:main May 12, 2026
7 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators May 12, 2026
Collaborator

@arekay-nv arekay-nv left a comment


Thanks!
