Model Compression Paper List (Focusing on Quantization, Particularly Zero-Shot Quantization)
LeCun, Yann, John Denker, and Sara Solla. "Optimal brain damage." Advances in Neural Information Processing Systems 2 (1989).
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the Knowledge in a Neural Network." arXiv preprint arXiv:1503.02531 (2015).
Han, Song, Huizi Mao, and William J. Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding." arXiv preprint arXiv:1510.00149 (2015).
Han, Song, et al. "EIE: Efficient inference engine on compressed deep neural network." ACM SIGARCH Computer Architecture News 44.3 (2016): 243-254.
Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
Frankle, Jonathan, and Michael Carbin. "The lottery ticket hypothesis: Finding sparse, trainable neural networks." arXiv preprint arXiv:1803.03635 (2018).
Hubara, Itay, et al. "Quantized neural networks: Training neural networks with low precision weights and activations." Journal of Machine Learning Research 18.187 (2018): 1-30.
Liu, Zhuang, et al. "Rethinking the value of network pruning." arXiv preprint arXiv:1810.05270 (2018).
Heo, Byeongho, et al. "A comprehensive overhaul of feature distillation." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
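For orientation, a minimal sketch of two of the classic ideas above: unstructured magnitude pruning (Han et al.; Frankle and Carbin) and logit distillation (Hinton et al.). The sparsity, temperature, and loss-weight values below are illustrative defaults, not taken from the papers.

```python
import torch
import torch.nn.functional as F

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style KD: softened teacher targets plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients back, as in the original paper
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```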
Quantization
Courbariaux, Matthieu, Yoshua Bengio, and Jean-Pierre David. "Binaryconnect: Training deep neural networks with binary weights during propagations." Advances in Neural Information Processing Systems 28 (2015).
Choi, Yoojin, Mostafa El-Khamy, and Jungwon Lee. "Towards the limit of network quantization." arXiv preprint arXiv:1612.01543 (2016).
Lin, Darryl, Sachin Talathi, and Sreekanth Annapureddy. "Fixed point quantization of deep convolutional networks." International Conference on Machine Learning. PMLR, 2016.
Zhou, Shuchang, et al. "Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients." arXiv preprint arXiv:1606.06160 (2016).
Cheng, Yu, et al. "Model compression and acceleration for deep neural networks: The principles, progress, and challenges." IEEE Signal Processing Magazine 35.1 (2018): 126-136.
Krishnamoorthi, Raghuraman. "Quantizing deep convolutional networks for efficient inference: A whitepaper." arXiv preprint arXiv:1806.08342 (2018).
Hubara, Itay, et al. "Quantized neural networks: Training neural networks with low precision weights and activations." Journal of Machine Learning Research 18.187 (2018): 1-30.
Fan, Angela, et al. "Training with quantization noise for extreme model compression." arXiv preprint arXiv:2004.07320 (2020).
Deng, Lei, et al. "Model compression and hardware acceleration for neural networks: A comprehensive survey." Proceedings of the IEEE 108.4 (2020): 485-532.
Li, Yuhang, et al. "Brecq: Pushing the limit of post-training quantization by block reconstruction." arXiv preprint arXiv:2102.05426 (2021).
Gholami, Amir, et al. "A survey of quantization methods for efficient neural network inference." Low-Power Computer Vision. Chapman and Hall/CRC, 2022. 291-326.
Lee, Junghyup, et al. "Scheduling Weight Transitions for Quantization-Aware Training." arXiv preprint arXiv:2404.19248 (2024). (ICCV 2025)
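Most of the methods above build on the uniform affine quantizer described in Krishnamoorthi's whitepaper. A minimal NumPy sketch of that primitive (asymmetric, per-tensor, 8-bit; the function names are illustrative):

```python
import numpy as np

def quantize(x: np.ndarray, num_bits: int = 8):
    """Uniform affine quantization: x ≈ scale * (q - zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # extend the range to include 0 so that zero is exactly representable
    x_min, x_max = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = max((x_max - x_min) / (qmax - qmin), 1e-8)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize(x)
print(np.abs(x - dequantize(q, s, z)).max())  # rounding error is at most scale / 2
```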
Zero-Shot Quantization (Data-Free Quantization)
Nagel, Markus, et al. "Data-free quantization through weight equalization and bias correction." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
Choi, Yoojin, et al. "Data-free network quantization with adversarial knowledge distillation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2020.
Cai, Yaohui, et al. "Zeroq: A novel zero shot quantization framework." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
Xu, Shoukai, et al. "Generative low-bitwidth data free quantization." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XII. Springer International Publishing, 2020.
Data Generation (Zero-Shot Quantization)
Xu, Shoukai, et al. "Generative low-bitwidth data free quantization." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XII. Springer International Publishing, 2020.
Zhang, Xiangguo, et al. "Diversifying sample generation for accurate data-free quantization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
Choi, Kanghyun, et al. "Qimera: Data-free quantization with synthetic boundary supporting samples." Advances in Neural Information Processing Systems 34 (2021): 14835-14847.
Zhong, Yunshan, et al. "Intraq: Learning synthetic images with intra-class heterogeneity for zero-shot network quantization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
Li, Huantong, et al. "Hard sample matters a lot in zero-shot quantization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
Qian, Biao, et al. "Rethinking data-free quantization as a zero-sum game." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 37. No. 8. 2023.
Qian, Biao, et al. "Adaptive data-free quantization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
Chen, Xinrui, et al. "TexQ: zero-shot network quantization with texture feature distribution calibration." Advances in Neural Information Processing Systems 36 (2024).
Bai, Jianhong, et al. "Robustness-Guided Image Synthesis for Data-Free Quantization." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. No. 10. 2024.
Li, Yuhang, et al. "GenQ: Quantization in Low Data Regimes with Generative Synthetic Data." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024.
Ramachandran, Akshat, et al. "OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models." arXiv preprint arXiv:2503.10959 (2025). (ICCV 2025)
Li, Changhao, et al. "Task-Specific Zero-shot Quantization-Aware Training for Object Detection." arXiv preprint arXiv:2507.16782 (2025). (ICCV 2025)
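Despite their differences, most of the generators above descend from the backbone introduced by ZeroQ: optimize random noise so that the pretrained network's batch statistics match the running statistics stored in its BatchNorm layers, then calibrate or fine-tune on the synthetic batch. A condensed PyTorch sketch of that backbone (step count and learning rate are illustrative):

```python
import torch
import torch.nn as nn

def synthesize_calibration_data(model: nn.Module, shape=(32, 3, 224, 224),
                                steps=500, lr=0.1):
    """ZeroQ-style data synthesis: match per-layer BN running statistics."""
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)  # only the synthetic images are optimized
    stats, hooks = [], []

    def hook(module, inputs, _):
        x = inputs[0]
        mean = x.mean(dim=(0, 2, 3))
        var = x.var(dim=(0, 2, 3))
        # distance between batch statistics and the stored running statistics
        stats.append(((mean - module.running_mean) ** 2).sum()
                     + ((var - module.running_var) ** 2).sum())

    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(hook))

    data = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([data], lr=lr)
    for _ in range(steps):
        stats.clear()
        opt.zero_grad()
        model(data)
        loss = torch.stack(stats).sum()
        loss.backward()
        opt.step()

    for h in hooks:
        h.remove()
    return data.detach()
```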
Model Training (Zero-Shot Quantization)
Guo, Cong, et al. "Squant: On-the-fly data-free quantization via diagonal hessian approximation." arXiv preprint arXiv:2202.07471 (2022).
Choi, Kanghyun, et al. "It's all in the teacher: Zero-shot quantization brought closer to the teacher." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
Shang, Yuzhang, et al. "Enhancing Post-training Quantization Calibration through Contrastive Learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
Li, Yuhang, et al. "GenQ: Quantization in Low Data Regimes with Generative Synthetic Data." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024.
Hong, Inpyo, et al. "Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing." arXiv preprint arXiv:2412.19125 (2024).
Kim, Minjun, et al. "SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning." The Thirteenth International Conference on Learning Representations (ICLR), 2025.
Zhong, Yunshan, et al. "Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers." arXiv preprint arXiv:2412.16553 (2024). (ICCV 2025)
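On the training side, the common recipe in these papers is to fine-tune the quantized network as a student of its own full-precision copy over the synthetic data, usually via a KL term on logits (the feature-distillation variants add further terms on intermediate activations). A minimal training step under those assumptions (function and argument names are illustrative):

```python
import torch
import torch.nn.functional as F

def zsq_train_step(quantized_student, fp_teacher, synthetic_batch, optimizer, T=1.0):
    """One distillation step: the quantized model mimics its full-precision copy."""
    fp_teacher.eval()
    with torch.no_grad():
        teacher_logits = fp_teacher(synthetic_batch)
    student_logits = quantized_student(synthetic_batch)
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```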
LLM Quantization
Dettmers, Tim, et al. "GPT3.int8(): 8-bit matrix multiplication for transformers at scale." Advances in Neural Information Processing Systems 35 (2022): 30318-30332.
Yao, Zhewei, et al. "Zeroquant: Efficient and affordable post-training quantization for large-scale transformers." Advances in Neural Information Processing Systems 35 (2022): 27168-27183.
Wu, Xiaoxia, et al. "Understanding int4 quantization for language models: latency speedup, composability, and failure cases." International Conference on Machine Learning. PMLR, 2023.
Liu, Zirui, et al. "Kivi: A tuning-free asymmetric 2bit quantization for kv cache." arXiv preprint arXiv:2402.02750 (2024).
Zhang, Cheng, et al. "LQER: Low-Rank Quantization Error Reconstruction for LLMs." arXiv preprint arXiv:2402.02446 (2024).
Huang, Wei, et al. "Billm: Pushing the limit of post-training quantization for llms." arXiv preprint arXiv:2402.04291 (2024).
Guo, Jinyang, et al. "Compressing large language models by joint sparsification and quantization." Forty-first International Conference on Machine Learning. 2024.
Yao, Zhewei, et al. "Exploring post-training quantization in llms from comprehensive study to low rank compensation." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. No. 17. 2024.
Li, Liang, et al. "Norm tweaking: High-performance low-bit quantization of large language models." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. No. 17. 2024.
Heo, Jung Hwan, et al. "Rethinking channel dimensions to isolate outliers for low-bit weight quantization of large language models." arXiv preprint arXiv:2309.15531 (2023).
Liu, Jing, et al. "Qllm: Accurate and efficient low-bitwidth quantization for large language models." arXiv preprint arXiv:2310.08041 (2023).
Zhao, Weibo, et al. "ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization." arXiv preprint arXiv:2411.07762 (2024). (AAAI 2025)
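A useful reference point for this group is the round-to-nearest (RTN) weight-only baseline that methods like BiLLM and the ZeroQuant/LoRC line improve on: quantize each linear layer's weights per output channel and keep activations in higher precision. A sketch of that baseline (symmetric int8; names are illustrative):

```python
import torch

@torch.no_grad()
def rtn_quantize_linear(weight: torch.Tensor, num_bits: int = 8):
    """Per-output-channel symmetric round-to-nearest weight quantization.

    weight: (out_features, in_features), as in nn.Linear.weight.
    """
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 127 for int8
    scale = weight.abs().amax(dim=1, keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)                  # avoid division by zero
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale
```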
Generative Model Quantization
Stable Diffusion (UNet)
Shang, Yuzhang, et al. "Post-training quantization on diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
Li, Xiuyu, et al. "Q-diffusion: Quantizing diffusion models." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
Sui, Yang, et al. "Bitsfusion: 1.99 bits weight quantization of diffusion model." arXiv preprint arXiv:2406.04333 (2024). (NeurIPS 2024)
Tang, Siao, et al. "Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing." arXiv preprint arXiv:2311.06322 (2023). (ECCV 2024)
Li, Muyang, et al. "Svdquant: Absorbing outliers by low-rank components for 4-bit diffusion models." arXiv preprint arXiv:2411.05007 (2024). (ICLR 2025)
Ryu, Hyogon, NaHyeon Park, and Hyunjung Shim. "Dgq: Distribution-aware group quantization for text-to-image diffusion models." arXiv preprint arXiv:2501.04304 (2025). (ICLR 2025)
Lee, Dongyeun, et al. "DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization." arXiv preprint arXiv:2507.12933 (2025). (ICCV 2025)
Wang, Haoxuan, et al. "Quest: Low-bit diffusion model quantization via efficient selective finetuning." arXiv preprint arXiv:2402.03666 (2024). (ICCV 2025)
Memory-Efficient Generative Models via Product Quantization (ICCV 2025, TBD)
DiT (Diffusion Transformer)
Wu, Junyi, et al. "Ptq4dit: Post-training quantization for diffusion transformers." arXiv preprint arXiv:2405.16005 (2024). (NeurIPS 2024)
Deng, Juncan, et al. "Vq4dit: Efficient post-training vector quantization for diffusion transformers." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 39. No. 15. 2025.
Dong, Zhenyuan, and Sai Qian Zhang. "DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing." arXiv preprint arXiv:2409.07756 (2024). (WACV 2025)
Chen, Lei, et al. "Q-dit: Accurate post-training quantization for diffusion transformers." arXiv preprint arXiv:2406.17343 (2024). (CVPR 2025)
Zhao, Tianchen, et al. "Vidit-q: Efficient and accurate quantization of diffusion transformers for image and video generation." arXiv preprint arXiv:2406.02540 (2024). (ICLR 2025)
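A theme that recurs across both the UNet and DiT papers above (e.g., Q-diffusion, Ptq4dit, Vidit-q) is that activation distributions drift across denoising timesteps, so calibration statistics must be gathered over many timesteps rather than a single one. A sketch of that collection step; the denoiser interface here is a placeholder, not any specific library's API:

```python
import torch

@torch.no_grad()
def collect_activation_ranges(denoiser, latents, num_timesteps=1000, stride=50):
    """Record per-timestep min/max of the denoiser output for calibration.

    `denoiser(latents, t)` stands in for a UNet/DiT noise-prediction call.
    """
    ranges = {}
    for t in range(0, num_timesteps, stride):
        t_batch = torch.full((latents.shape[0],), t, dtype=torch.long)
        out = denoiser(latents, t_batch)
        ranges[t] = (out.min().item(), out.max().item())
    # a timestep-aware quantizer would fit scales per timestep (or per group
    # of timesteps) from these ranges instead of a single global range
    return ranges
```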
Vector Quantization
Kim, Youngeun, et al. "Task vector quantization for memory-efficient model merging." arXiv preprint arXiv:2503.06921 (2025). (ICCV 2025)
Li, Shuaiting, et al. "SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting." arXiv preprint arXiv:2503.08668 (2025). (ICCV 2025)
Deng, Juncan, et al. "ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba." arXiv preprint arXiv:2503.09509 (2025). (ICCV 2025)
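Unlike the scalar quantizers in the sections above, vector quantization replaces short groups of weights with indices into a learned codebook, classically fit with k-means (the shared-weight clustering of Deep Compression is the ancestor of the Vq4dit/ViM-VQ line). A minimal NumPy sketch (group length and codebook size are illustrative; written for clarity, not memory efficiency):

```python
import numpy as np

def fit_codebook(weights: np.ndarray, group: int = 4, k: int = 256, iters: int = 20):
    """k-means vector quantization: split weights into `group`-sized vectors and
    replace each with the index of its nearest codeword.

    Assumes weights.size is divisible by `group` and at least `k` vectors exist.
    """
    vecs = weights.reshape(-1, group)
    rng = np.random.default_rng(0)
    codebook = vecs[rng.choice(len(vecs), k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest codeword
        d = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        # update each codeword as the mean of its assigned vectors
        for j in range(k):
            members = vecs[assign == j]
            if len(members):
                codebook[j] = members.mean(0)
    return codebook, assign  # storage: one index per group instead of `group` floats
```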