D-Robotics-AI-Lab/SmolVLM_PTQ

SmolVLM-PTQ: D-Robotics PTQ Deployment

A Post-Training Quantization (PTQ) solution for efficiently deploying the SmolVLM vision-language model on RDK series development boards.

This project provides scripts and detailed guides covering model export, accuracy verification, data preparation, and final quantized deployment, helping developers quickly bring SmolVLM to embedded platforms.

📋 Table of Contents

  • 🛠️ Workflow Overview
  • 🚀 Quick Start
    • Step 1: Environment Setup & Installation
    • Step 2: Export ONNX Model
    • Step 3: ONNX Model Accuracy Verification (Optional)
    • Step 4: Generate PTQ Calibration Data
    • Step 5: Run PTQ Quantization
  • 🙏 Acknowledgements
  • 📜 License

🛠️ Workflow Overview

The core workflow of this project consists of five main steps:

  1. Environment Setup: Install a modified version of the transformers library adapted for ONNX export.
  2. Export ONNX: Convert the vision encoder of the pretrained SmolVLM model to ONNX format.
  3. Accuracy Verification: Compare the outputs of the original PyTorch model and the ONNX model to ensure no accuracy loss during conversion.
  4. Generate Calibration Data: Prepare a calibration dataset for the PTQ process.
  5. PTQ Quantization: Use the D-Robotics (Horizon) toolchain to quantize the ONNX model into a hardware-deployable format.

🚀 Quick Start

Follow the steps below to complete model quantization and deployment.

Step 1: Environment Setup & Installation

Our solution requires modifications to the transformers library internals to support ONNX export. Therefore, do not install the official version via pip directly. Instead, follow these steps to install from source.

  1. Clone the specified version of the official transformers repository:

    git clone https://github.com/huggingface/transformers.git -b v4.51.3
  2. Replace the model file:

    Copy the modeling_smolvlm.py file from this project and overwrite the corresponding file in the transformers source:

    transformers/src/transformers/models/smolvlm/modeling_smolvlm.py

  3. Install in editable mode:

    Navigate to the root of the transformers source directory and run:

    cd transformers
    python -m pip install -e .

⚠️ Important

Since we are using editable mode (-e), any changes you make to the transformers library code will take effect immediately. However, if you move the transformers folder to a different location or switch to a new environment, you must re-run the pip install -e . command to rebuild the index, otherwise the modifications will not be applied.
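After installation, you can confirm which copy of transformers Python will actually import. This is an illustrative stdlib-only helper (the function name `install_location` is our own, not part of any library):

```python
import importlib.util
import pathlib


def install_location(pkg: str):
    """Return the directory a package would be imported from, or None if absent."""
    spec = importlib.util.find_spec(pkg)
    if spec is None or spec.origin is None:
        return None
    return pathlib.Path(spec.origin).parent


# After `pip install -e .`, this should point into your transformers source
# checkout, not into site-packages:
print(install_location("transformers"))
```

If the printed path is inside site-packages rather than your source checkout, the editable install did not take effect and the modified `modeling_smolvlm.py` will not be used.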

Step 2: Export ONNX Model

  1. Download the official SmolVLM weights:

    You can download the weights from the Hugging Face Hub:

    # Example model — choose the 500M or 256M variant based on your needs
    https://huggingface.co/HuggingFaceTB/SmolVLM2-500M-Video-Instruct
  2. Run the export script:

    Open export_smolvlm.py and modify model_path to point to your local weights directory. Then run:

    python export_smolvlm.py

    After successful execution, you will find the XXX.onnx model file in the current directory.

Step 3: ONNX Model Accuracy Verification (Optional)

This is a recommended step to ensure the ONNX model output is highly consistent with the original PyTorch model.

  1. (Optional) Create a clean Python environment, or uninstall our modified transformers library in the current environment.

  2. Install the official transformers library:

    python -m pip install transformers==4.51.3
  3. Run the comparison script:

    Edit onnx_diff.py and configure your local SmolVLM weights path and the exported ONNX file path. Then run:

    python onnx_diff.py

    The script computes and prints the difference between the original model and the ONNX model inference results. Typically, this value should be less than 1e-4.
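The comparison boils down to an elementwise error between the two output tensors. A minimal sketch of such a check with NumPy (the array contents here are simulated stand-ins; the real `onnx_diff.py` feeds identical preprocessed inputs to both models):

```python
import numpy as np


def max_abs_diff(torch_out: np.ndarray, onnx_out: np.ndarray) -> float:
    """Largest elementwise deviation between the two model outputs."""
    assert torch_out.shape == onnx_out.shape, "outputs must have the same shape"
    return float(np.max(np.abs(torch_out - onnx_out)))


# Simulated outputs standing in for the PyTorch and ONNX results:
a = np.random.rand(1, 64, 768).astype(np.float32)
b = a + np.random.uniform(-1e-5, 1e-5, a.shape).astype(np.float32)

diff = max_abs_diff(a, b)
print(f"max abs diff: {diff:.2e}")
assert diff < 1e-4, "export likely introduced a numerical error"
```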

Step 4: Generate PTQ Calibration Data

The PTQ (Post-Training Quantization) process requires a small amount of representative data to calibrate the quantization parameters.

  1. Prepare calibration images:

    Create a folder and place 50–100 representative images in it (e.g., randomly sampled from the COCO dataset).

  2. Run the generation script:

    Open generate_calibra_data.py and configure the following paths:

    • INPUT_IMAGE_DIR: Path to the folder containing your images.
    • OUTPUT_NPY_DIR: Output path for calibration files in .npy format.
    • OUTPUT_BIN_DIR: Output path for calibration files in .bin format.

    Then run:

    python generate_calibra_data.py

    The script will generate two subfolders:

    • calib_npy: Contains .npy format calibration data, recommended for S100.
    • calib_bin: Contains .bin format calibration data, recommended for RDK X5 (theoretically also compatible with S100).
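Conceptually, the script preprocesses each image into the encoder's input tensor layout and dumps it in both formats. A simplified sketch with NumPy (the input resolution, the 0.5 mean/std normalization, and the NCHW layout are assumptions; check `generate_calibra_data.py` for the values SmolVLM actually uses):

```python
import numpy as np
from pathlib import Path


def save_calibration_sample(rgb: np.ndarray, name: str,
                            npy_dir: Path, bin_dir: Path) -> None:
    """Normalize an HxWx3 uint8 image and dump it as .npy and raw .bin."""
    x = rgb.astype(np.float32) / 255.0    # assumed [0, 1] scaling
    x = (x - 0.5) / 0.5                   # assumed mean/std of 0.5
    x = np.transpose(x, (2, 0, 1))[None]  # HWC -> NCHW with batch dim
    npy_dir.mkdir(parents=True, exist_ok=True)
    bin_dir.mkdir(parents=True, exist_ok=True)
    np.save(npy_dir / f"{name}.npy", x)   # .npy for hb_compile (S100)
    x.tofile(bin_dir / f"{name}.bin")     # raw .bin for hb_mapper (X5)


# Example with a dummy image in place of a real COCO sample:
img = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
save_calibration_sample(img, "sample_000", Path("calib_npy"), Path("calib_bin"))
```

Note that the `.bin` file is a raw float32 dump with no shape header, so the YAML config must declare the matching input shape.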

Step 5: Run PTQ Quantization

With everything in place, you can now use the official D-Robotics toolchain for the final quantization.

⚠️ Important Note

Although the 500M and 256M variants of SmolVLM2 share the same SigLIP vision encoder architecture, the encoder weights are updated during each model's training, so the two checkpoints differ. Therefore, the vision encoder ONNX must be exported and PTQ-quantized separately for the 500M and 256M models; you cannot reuse a single 500M siglip.bin with the 256M model.

We provide float ONNX models ready for PTQ quantization:

# 500M model
wget https://huggingface.co/D-Robotics/SmolVLM2-500M-Video-Instruct-GGUF-BPU/resolve/main/smolvlm2_500M_visual_resampler_opset11_optimized.onnx

# 256M model
wget https://huggingface.co/D-Robotics/SmolVLM2-256M-Video-Instruct-GGUF-BPU/resolve/main/smolvlm2_256M_visual_resampler_opset11_optimized.onnx
  • For RDK X5 platform:

    Use the hb_mapper tool with the siglip_smolvlm_x5.yaml configuration file.

    hb_mapper makertbin --config siglip_smolvlm_x5.yaml --model-type onnx

    Remember to edit siglip_smolvlm_x5.yaml and update the ONNX model path and calibration data path. Upon success, a .bin file for RDK X5 deployment will be generated.

  • For RDK S100 platform:

    Use the hb_compile tool with the siglip_smolvlm_S100.yaml configuration file.

    hb_compile -c siglip_smolvlm_S100.yaml

    Remember to edit siglip_smolvlm_S100.yaml and update the ONNX model path and calibration data path. Upon success, an .hbm file for RDK S100 deployment will be generated.

  • For S600 platform:

    Use the hb_compile tool with the siglip_smolvlm_S600.yaml configuration file.

    hb_compile -c siglip_smolvlm_S600.yaml

    Remember to edit siglip_smolvlm_S600.yaml and update the ONNX model path and calibration data path. Upon success, an .hbm file for RDK S600 deployment will be generated.
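Rather than editing each YAML by hand, the two paths can be patched with a small script. A stdlib-only sketch, assuming the config uses `onnx_model` and `cal_data_dir` keys (the helper `patch_yaml_paths` is our own; verify the key names against your YAML):

```python
import re
from pathlib import Path


def patch_yaml_paths(cfg: Path, onnx_path: str, calib_dir: str) -> None:
    """Rewrite the model and calibration-data paths in a toolchain YAML in place."""
    text = cfg.read_text()
    text = re.sub(r"(?m)^(\s*onnx_model:).*$", rf"\1 '{onnx_path}'", text)
    text = re.sub(r"(?m)^(\s*cal_data_dir:).*$", rf"\1 '{calib_dir}'", text)
    cfg.write_text(text)


# Example against a throwaway config file:
cfg = Path("demo.yaml")
cfg.write_text(
    "model_parameters:\n  onnx_model: old.onnx\n"
    "calibration_parameters:\n  cal_data_dir: old_dir\n"
)
patch_yaml_paths(cfg, "smolvlm_visual.onnx", "calib_npy")
print(cfg.read_text())
```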

💡 Performance Tip

The full PTQ quantization process takes a long time (potentially around 3 hours). If you just want to quickly verify that the pipeline works end to end, you can add the --fast-perf flag to the compile command (refer to the D-Robotics official documentation), which significantly reduces compilation time.

🙏 Acknowledgements

  • Thanks to the Hugging Face team for open-sourcing the powerful Transformers library, and to the researchers behind SmolVLM, an excellent and compact vision-language model.
  • Thanks to D-Robotics and Horizon Robotics for providing efficient AI chips and the accompanying development toolchain.

📜 License

This project is licensed under the Apache License 2.0.

About

D-Robotics PTQ on [SmolVLM](https://github.com/huggingface/smollm)
