
Add Qwen3-vl Interactive Reference Code and Dataset #309

Open
Victor49152 wants to merge 8 commits into mlcommons:main from Victor49152:feat/qwen3_vl_interactive_ref

Conversation

Collaborator

@Victor49152 Victor49152 commented May 11, 2026

What does this PR do?

  • Add the predefined dataset shopify_product_catalogue_8k for the VLM interactive scenario
  • Add an example config for interactive measurement to the Q3VL examples

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@Victor49152 Victor49152 requested a review from a team May 11, 2026 23:05
@Victor49152 Victor49152 marked this pull request as draft May 11, 2026 23:05

github-actions Bot commented May 11, 2026

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@Victor49152 Victor49152 self-assigned this May 11, 2026
@Victor49152 Victor49152 added the labels "priority: P1" (High, must address this cycle) and "area: dataset" (Dataset manager, formats, predefined datasets) May 11, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new 8k sample variant of the Shopify product catalogue dataset, including documentation, example configurations, and comprehensive unit tests. The implementation refactors the existing Shopify dataset into a base class to support multiple variants. Additionally, the PR adds a tokenizer_name override to ModelParams to handle local model paths, improves the robustness of OpenAI response parsing by providing default values for optional fields, and updates the load generator to support multimodal prompt formats. Feedback suggests using None instead of magic values like 0 or "" for default values in the OpenAI response schema to improve consistency and data representation.
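The feedback about `None` versus magic defaults can be illustrated with a minimal sketch. The class and function names below (`UsageSketch`, `ChoiceSketch`, `parse_choice`, `total_tokens`) are hypothetical and are not taken from `src/inference_endpoint/openai/types.py`; the sketch only assumes that some response fields are optional and may be absent from a server reply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageSketch:
    """Hypothetical stand-in for an optional token-usage block."""

    # None means "the server did not report this", which a default of 0
    # could not distinguish from a real count of zero tokens.
    prompt_tokens: Optional[int] = None
    completion_tokens: Optional[int] = None


@dataclass
class ChoiceSketch:
    """Hypothetical stand-in for one choice in a response."""

    # None means "field absent"; "" would look like a real empty string.
    text: Optional[str] = None
    finish_reason: Optional[str] = None


def parse_choice(payload: dict) -> ChoiceSketch:
    """Build a choice from a raw dict, leaving absent fields as None."""
    # dict.get returns None for a missing key, so no magic value is needed.
    return ChoiceSketch(
        text=payload.get("text"),
        finish_reason=payload.get("finish_reason"),
    )


def total_tokens(usage: UsageSketch) -> Optional[int]:
    """Return the token total, or None if either count was not reported."""
    if usage.prompt_tokens is None or usage.completion_tokens is None:
        return None
    return usage.prompt_tokens + usage.completion_tokens
```

With `None` defaults, downstream code can tell a missing field from a genuinely empty one: `parse_choice({})` and `parse_choice({"text": ""})` yield different `text` values, and `total_tokens` declines to report a total it cannot compute.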

Comment thread src/inference_endpoint/openai/types.py
@Victor49152 Victor49152 force-pushed the feat/qwen3_vl_interactive_ref branch from b885cd4 to ffa7b64 Compare May 13, 2026 16:28
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
@Victor49152 Victor49152 force-pushed the feat/qwen3_vl_interactive_ref branch from ffa7b64 to 11ec3d6 Compare May 13, 2026 16:38
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
@Victor49152 Victor49152 force-pushed the feat/qwen3_vl_interactive_ref branch from b162b1d to 2221e2f Compare May 13, 2026 16:50
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
@Victor49152 Victor49152 marked this pull request as ready for review May 13, 2026 17:00
@Victor49152 Victor49152 changed the title from "[Draft] Feat/qwen3 vl interactive ref" to "Add Qwen3-vl Interactive Reference Code and Dataset" May 13, 2026
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
Collaborator

@wangshangsam wangshangsam left a comment


Some nits, but LGTM overall!


model_params:
  name: "Qwen/Qwen3-VL-235B-A22B-Instruct"
  # tokenizer_name: "Qwen/Qwen3-VL-235B-A22B-Instruct" # Set this if model name is a local/container path
Collaborator


Suggested change
# tokenizer_name: "Qwen/Qwen3-VL-235B-A22B-Instruct" # Set this if model name is a local/container path

This is no longer needed. A local checkpoint directory (with the tokenizer files in it) now works with the latest revision of @BolinSNLHM's PR (already merged).

Collaborator Author

@Victor49152 Victor49152 May 14, 2026


Yeah, I missed this. It should be deleted.
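For context on what the commented-out `tokenizer_name` line was meant to do, here is a minimal sketch of an override-with-fallback lookup. `ModelParamsSketch` and `tokenizer_source` are hypothetical names and do not reproduce the project's actual `ModelParams`; the only assumption is that an explicit tokenizer name, when set, takes precedence over the model name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelParamsSketch:
    """Hypothetical stand-in for the model section of the config."""

    name: str
    # Optional override; None means "use the model name for the tokenizer".
    tokenizer_name: Optional[str] = None


def tokenizer_source(params: ModelParamsSketch) -> str:
    """Return the identifier or path the tokenizer would be loaded from."""
    if params.tokenizer_name is not None:
        return params.tokenizer_name
    return params.name
```

Under this sketch, leaving `tokenizer_name` unset simply reuses `name`, which matches the reviewer's point that the override is unnecessary when the model name already resolves to a directory containing the tokenizer files.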

@@ -14,6 +14,13 @@ model_params:
datasets:
Collaborator


Officially, are we calling this scenario "online" or "server"? (I vaguely recall that it's called "server".)

If so, I would suggest renaming this file to server_qwen3_... for the sake of consistency.

Collaborator Author


I was wondering about this naming too. In regular MLPerf it's called server, but I saw other endpoints examples calling it online. Let me change it to server, since we are submitting to regular MLPerf anyway.



2 participants