English | 简体中文
A Post-Training Quantization (PTQ) solution for efficiently deploying the SmolVLM vision-language model on RDK series development boards.
This project provides scripts and detailed guides covering model export, accuracy verification, data preparation, and final quantized deployment, helping developers quickly bring SmolVLM to embedded platforms.
- 🛠️ Workflow Overview
- 🚀 Quick Start
  - Step 1: Environment Setup & Installation
  - Step 2: Export ONNX Model
  - Step 3: ONNX Model Accuracy Verification (Optional)
  - Step 4: Generate PTQ Calibration Data
  - Step 5: Run PTQ Quantization
- 🙏 Acknowledgements
- 📜 License
## 🛠️ Workflow Overview

The core workflow of this project consists of five main steps:
- Environment Setup: Install a modified version of the `transformers` library adapted for ONNX export.
- Export ONNX: Convert the vision encoder of the pretrained SmolVLM model to ONNX format.
- Accuracy Verification: Compare the outputs of the original PyTorch model and the ONNX model to ensure no accuracy loss during conversion.
- Generate Calibration Data: Prepare a calibration dataset for the PTQ process.
- PTQ Quantization: Use the D-Robotics (Horizon) toolchain to quantize the ONNX model into a hardware-deployable format.
## 🚀 Quick Start

Follow the steps below to complete model quantization and deployment.
### Step 1: Environment Setup & Installation

Our solution requires modifications to the `transformers` library internals to support ONNX export. Therefore, do not install the official version via pip directly. Instead, follow these steps to install from source.
- **Clone the specified version of the official `transformers` repository:**

  ```bash
  git clone https://github.com/huggingface/transformers.git -b v4.51.3
  ```
- **Replace the model file:**
  Copy the `modeling_smolvlm.py` file from this project and overwrite the corresponding file in the `transformers` source tree: `transformers/src/transformers/models/smolvlm/modeling_smolvlm.py`
- **Install in editable mode:**
  Navigate to the root of the `transformers` source directory and run:

  ```bash
  cd transformers
  python -m pip install -e .
  ```

  > ⚠️ **Important**
  > Since we are using editable mode (`-e`), any changes you make to the `transformers` library code take effect immediately. However, if you move the `transformers` folder to a different location or switch to a new environment, you must re-run `pip install -e .` so the editable install points at the new location; otherwise the modifications will not be applied. The quick check below shows how to confirm which copy Python is using.
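Because an editable install simply links to the source tree, you can verify that Python is picking up the patched code. This is a minimal, illustrative check rather than part of the project scripts:

```python
# Quick sanity check (illustrative): confirm that transformers resolves to your cloned,
# patched source tree rather than a previously installed copy.
import transformers
from transformers.models.smolvlm import modeling_smolvlm

print(transformers.__version__)   # should report 4.51.3
print(modeling_smolvlm.__file__)  # should point into your transformers/src/... checkout
```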
### Step 2: Export ONNX Model

- **Download the official SmolVLM weights:**
  You can download the weights from the Hugging Face Hub:

  ```bash
  # Example model: choose the 500M or 256M variant based on your needs
  https://huggingface.co/HuggingFaceTB/SmolVLM2-500M-Video-Instruct
  ```
- **Run the export script:**
  Open `export_smolvlm.py` and modify `model_path` to point to your local weights directory. Then run:

  ```bash
  python export_smolvlm.py
  ```

  After successful execution, you will find the exported `XXX.onnx` model file in the current directory. A rough sketch of what an export like this involves is shown below.
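The sketch below illustrates the general pattern of exporting a vision encoder with `torch.onnx.export`. It is not the project's `export_smolvlm.py`: the `model.model.vision_model` attribute path, the 512×512 input resolution, and returning `last_hidden_state` are illustrative assumptions (the published ONNX filenames suggest the real export also includes the resampler), so rely on the actual script for the authoritative logic.

```python
# Illustrative sketch only: the authoritative export logic lives in export_smolvlm.py.
# Assumptions (not confirmed by this project): the vision tower is reachable as
# model.model.vision_model, it accepts 512x512 inputs, and last_hidden_state is the
# feature tensor of interest.
import torch
from transformers import AutoModelForImageTextToText

model_path = "/path/to/SmolVLM2-500M-Video-Instruct"  # your local weights directory
model = AutoModelForImageTextToText.from_pretrained(model_path, torch_dtype=torch.float32)


class VisionWrapper(torch.nn.Module):
    """Wraps the vision tower so the exported graph returns a plain tensor."""

    def __init__(self, vision_model):
        super().__init__()
        self.vision_model = vision_model

    def forward(self, pixel_values):
        return self.vision_model(pixel_values).last_hidden_state


vision_encoder = VisionWrapper(model.model.vision_model).eval()
dummy_pixels = torch.randn(1, 3, 512, 512)  # N, C, H, W (assumed resolution)

torch.onnx.export(
    vision_encoder,
    dummy_pixels,
    "smolvlm_vision_encoder.onnx",
    input_names=["pixel_values"],
    output_names=["image_features"],
    dynamic_axes={"pixel_values": {0: "batch"}, "image_features": {0: "batch"}},
    opset_version=11,  # the project's published ONNX files use opset 11
)
```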
### Step 3: ONNX Model Accuracy Verification (Optional)

This is a recommended step to ensure the ONNX model output is highly consistent with the original PyTorch model.
- (Optional) Create a clean Python environment, or uninstall our modified `transformers` library from the current environment.
- **Install the official `transformers` library:**

  ```bash
  python -m pip install transformers==4.51.3
  ```
- **Run the comparison script:**
  Edit `onnx_diff.py` and configure your local SmolVLM weights path and the exported ONNX file path. Then run:

  ```bash
  python onnx_diff.py
  ```
  The script computes and prints the difference between the original model's and the ONNX model's inference results. Typically, this value should be less than `1e-4`. A simplified version of this comparison is sketched below.
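Conceptually, the comparison feeds the same input through the PyTorch vision encoder and the exported ONNX graph, then reports how far apart the outputs are. The sketch below illustrates that idea with ONNX Runtime; it is not `onnx_diff.py`, and the attribute path, input resolution, and compared output are assumptions.

```python
# Illustrative sketch only: onnx_diff.py is the authoritative comparison script.
# Assumptions: 512x512 input, comparison on the vision tower's last_hidden_state.
import numpy as np
import onnxruntime as ort
import torch
from transformers import AutoModelForImageTextToText

model_path = "/path/to/SmolVLM2-500M-Video-Instruct"  # your local weights directory
onnx_path = "smolvlm_vision_encoder.onnx"             # exported in Step 2

model = AutoModelForImageTextToText.from_pretrained(model_path, torch_dtype=torch.float32)
vision_encoder = model.model.vision_model.eval()      # assumed attribute path

pixel_values = torch.randn(1, 3, 512, 512)

with torch.no_grad():
    ref = vision_encoder(pixel_values).last_hidden_state.numpy()

sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
onnx_out = sess.run(None, {"pixel_values": pixel_values.numpy()})[0]

print("max abs diff:", float(np.abs(ref - onnx_out).max()))  # typically < 1e-4
```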
### Step 4: Generate PTQ Calibration Data

The PTQ (Post-Training Quantization) process requires a small amount of representative data to calibrate the quantization parameters.
- **Prepare calibration images:**
  Create a folder and place 50–100 representative images in it (e.g., randomly sampled from the COCO dataset).
- **Run the generation script:**
  Open `generate_calibra_data.py` and configure the following paths:
  - `INPUT_IMAGE_DIR`: Path to the folder containing your images.
  - `OUTPUT_NPY_DIR`: Output path for calibration files in `.npy` format.
  - `OUTPUT_BIN_DIR`: Output path for calibration files in `.bin` format.

  Then run:

  ```bash
  python generate_calibra_data.py
  ```
  The script will generate two subfolders:
  - `calib_npy`: Contains `.npy`-format calibration data, recommended for the RDK S100.
  - `calib_bin`: Contains `.bin`-format calibration data, recommended for the RDK X5 (theoretically also compatible with the S100).

  A simplified version of this preprocessing is sketched below.
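For orientation, generating calibration data usually amounts to applying the model's image preprocessing and dumping the resulting tensors to disk. The sketch below is not `generate_calibra_data.py`; the 512×512 resolution and the 0.5 mean/std normalization are assumptions, so check the project script for the exact preprocessing it applies.

```python
# Illustrative sketch only: generate_calibra_data.py is the authoritative script.
# Assumptions: 512x512 input resolution and SigLIP-style 0.5/0.5 normalization.
import os

import numpy as np
from PIL import Image

INPUT_IMAGE_DIR = "calib_images"  # 50-100 representative images
OUTPUT_NPY_DIR = "calib_npy"      # .npy files, recommended for S100
OUTPUT_BIN_DIR = "calib_bin"      # .bin files, recommended for RDK X5

os.makedirs(OUTPUT_NPY_DIR, exist_ok=True)
os.makedirs(OUTPUT_BIN_DIR, exist_ok=True)

for name in sorted(os.listdir(INPUT_IMAGE_DIR)):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    img = Image.open(os.path.join(INPUT_IMAGE_DIR, name)).convert("RGB").resize((512, 512))
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - 0.5) / 0.5              # normalize to [-1, 1] (assumed mean/std of 0.5)
    x = x.transpose(2, 0, 1)[None]   # HWC -> NCHW with a batch dimension
    stem = os.path.splitext(name)[0]
    np.save(os.path.join(OUTPUT_NPY_DIR, stem + ".npy"), x)
    x.tofile(os.path.join(OUTPUT_BIN_DIR, stem + ".bin"))
```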
### Step 5: Run PTQ Quantization

Everything is ready. Now you can use the official D-Robotics toolchain for the final quantization.

Note that although the 500M and 256M variants of SmolVLM2 use the same Siglip vision-encoder architecture, the Siglip parameters are updated during each model's training, so their weights differ. The vision encoder must therefore be exported to ONNX and PTQ-quantized separately for the 500M and 256M models; you cannot share a single 500M `siglip.bin` with the 256M model.
We provide pre-exported float ONNX models for PTQ quantization:

```bash
# 500M model
wget https://huggingface.co/D-Robotics/SmolVLM2-500M-Video-Instruct-GGUF-BPU/resolve/main/smolvlm2_500M_visual_resampler_opset11_optimized.onnx
# 256M model
wget https://huggingface.co/D-Robotics/SmolVLM2-256M-Video-Instruct-GGUF-BPU/resolve/main/smolvlm2_256M_visual_resampler_opset11_optimized.onnx
```
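Before handing a model to the toolchain, it can be helpful to confirm its input and output signature, for example to know which input name and shape the calibration data and the YAML configuration need to match. This is an optional, illustrative check rather than part of the project scripts:

```python
# Optional quick inspection (illustrative): print the ONNX model's input/output
# names, shapes, and types before feeding it to the PTQ toolchain.
import onnxruntime as ort

sess = ort.InferenceSession(
    "smolvlm2_500M_visual_resampler_opset11_optimized.onnx",
    providers=["CPUExecutionProvider"],
)
for i in sess.get_inputs():
    print("input :", i.name, i.shape, i.type)
for o in sess.get_outputs():
    print("output:", o.name, o.shape, o.type)
```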
- **For the RDK X5 platform:**
  Use the `hb_mapper` tool with the `siglip_smolvlm_x5.yaml` configuration file:

  ```bash
  hb_mapper makertbin --config siglip_smolvlm_x5.yaml --model-type onnx
  ```

  Remember to edit `siglip_smolvlm_x5.yaml` and update the ONNX model path and calibration data path. Upon success, a `.bin` file for RDK X5 deployment will be generated.
- **For the RDK S100 platform:**
  Use the `hb_compile` tool with the `siglip_smolvlm_S100.yaml` configuration file:

  ```bash
  hb_compile -c siglip_smolvlm_S100.yaml
  ```

  Remember to edit `siglip_smolvlm_S100.yaml` and update the ONNX model path and calibration data path. Upon success, an `.hbm` file for RDK S100 deployment will be generated.
- **For the S600 platform:**
  Use the `hb_compile` tool with the `siglip_smolvlm_S600.yaml` configuration file:

  ```bash
  hb_compile -c siglip_smolvlm_S600.yaml
  ```

  Remember to edit `siglip_smolvlm_S600.yaml` and update the ONNX model path and calibration data path. Upon success, an `.hbm` file for RDK S600 deployment will be generated.
> 💡 **Performance Tip**
> The full PTQ quantization process takes a long time (potentially around 3 hours). If you just want to quickly verify that the entire pipeline works, you can add the `--fast-perf` flag to the compile command (refer to the D-Robotics official documentation), which significantly reduces compilation time.
## 🙏 Acknowledgements

- Thanks to the Hugging Face team for open-sourcing the powerful Transformers library, and to the researchers behind SmolVLM for providing such an excellent and compact vision-language model.
- Thanks to D-Robotics and Horizon Robotics for providing efficient AI chips and the accompanying development toolchain.
## 📜 License

This project is licensed under the Apache License 2.0.