A structured learning reference: image fundamentals → classical methods → deep learning → generative artificial intelligence in CV.
There are dozens of "awesome computer vision" repositories on GitHub. Most are encyclopedic with thousands of links arranged by topic, with no guidance on where to start, what order to read things in, or why one resource matters more than another. They are useful as archives. They are less useful as learning tools.
This list is built around a different idea: curation over comprehensiveness.
Every entry is here because it genuinely helps someone understand computer vision more deeply — not simply because it exists. Resources are organised to reflect how the field is actually learned: from image fundamentals and classical methods, through deep learning, to the transformer-era models that define current research.
| | This list | Most other CV lists |
|---|---|---|
| Paper context | ✅ Why each paper matters, in sequence | ❌ Flat citation lists |
| Evaluation metrics | ✅ Full breakdown per task | ❌ Rarely covered |
| Actively maintained | ✅ Updated with recent work | ❌ Often stale |
| Conference & journal tiers | ✅ CORE-ranked, explained | ❌ Usually just a list |
| Multi-language libraries | ✅ Python, Rust, MATLAB | ❌ Python only |
- Students starting a CV module or thesis who want a clear first step
- Engineers moving into CV who need to fill gaps systematically
- Researchers wanting a compact reference for venues, metrics, and landmark papers
- Educators looking for a syllabus scaffold they can point students to
💡 New to the field? Start at Courses or Reference Books.
🔬 Already in research? Jump to Popular Articles or Repos.
Status: ✅ active (updated within 2 years) · ⚠️ legacy (unmaintained but historically useful) · 🗄️ archived (officially abandoned)
- OpenCV: Open Source Computer Vision Library · ✅ active
- Pillow: The friendly PIL fork (Python Imaging Library) · ✅ active
- scikit-image: Collection of algorithms for image processing · ✅ active
- SciPy: Open-source software for mathematics, science, and engineering · ✅ active
- mmcv: OpenMMLab foundational library for computer vision research · ✅ active
- imutils: Convenience functions for basic image processing operations · ✅ active
- kornia: Open source differentiable computer vision library for PyTorch · ✅ active
- pgmagick: Python wrapper for GraphicsMagick/ImageMagick · ⚠️ legacy
- Mahotas: Fast computer vision algorithms in Python · ⚠️ legacy
- SimpleCV: Open Source Framework for Machine Vision · 🗄️ archived
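For a quick feel of how these libraries complement each other, here is a minimal sketch (not taken from any project above) that reads an image with OpenCV, applies Canny edge detection and Otsu thresholding via scikit-image, and writes the results with Pillow; the filename and threshold values are illustrative assumptions.

```python
# Minimal sketch combining OpenCV, scikit-image, and Pillow.
# Assumes `pip install opencv-python scikit-image pillow numpy` and a local photo.jpg.
import cv2
import numpy as np
from PIL import Image
from skimage import filters

img_bgr = cv2.imread("photo.jpg")                        # OpenCV loads images as BGR, HxWx3 uint8
gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)         # single-channel grayscale

edges = cv2.Canny(gray, threshold1=100, threshold2=200)  # classical Canny edge detection
binary = gray > filters.threshold_otsu(gray)             # Otsu thresholding via scikit-image

Image.fromarray(edges).save("edges.png")                 # Pillow for saving results
Image.fromarray((binary * 255).astype(np.uint8)).save("otsu.png")
```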
Status: ✅ active (updated within 2 years) · ⚠️ legacy (unmaintained but historically useful) · 🗄️ archived (officially abandoned)
- OpenCV-Rust: Rust bindings for OpenCV 3.4, 4.x, and 5.x · ✅ active
- Image: Encoding and decoding images in Rust · ✅ active
- ImageProc: Image processing operations built on the image crate · ✅ active
- Photon: Rust/WebAssembly image processing library · ⚠️ legacy
Status: ✅ active (updated within 2 years) · ⚠️ legacy (unmaintained but historically useful) · 🗄️ archived (officially abandoned)
- MLV: Mid-level Vision Toolbox, BWLab, University of Toronto · ✅ active
- PMT: Piotr's Computer Vision MATLAB Toolbox, P. Dollar · ⚠️ legacy
- matlabfns: MATLAB and Octave functions for computer vision and image processing, P. Kovesi, University of Western Australia · ⚠️ legacy
- VLFeat: Open source library of popular CV algorithms (SIFT, VLAD, Fisher Vectors, SLIC), A. Vedaldi and B. Fulkerson · ⚠️ legacy
- ElencoCode: Loris Nanni's CV functions, University of Padova · ⚠️ legacy
- Antonio Torralba, Phillip Isola, William T. Freeman. "Foundations of Computer Vision" MIT Press, (2024). · [Goodreads]
- Nixon, Mark, and Alberto Aguado. "Feature extraction and image processing for computer vision" Academic press, (2019). · [Goodreads]
- González, Rafael Corsino and Richard E. Woods. "Digital image processing, 4th Edition" (2018). · [Goodreads]
- E.R. Davies. "Computer Vision: Principles, Algorithms, Applications, Learning" Academic press, (2017). · [Goodreads]
- Prince, Simon. "Computer Vision: Models, Learning, and Inference" (2012). · [Goodreads]
- Forsyth, David Alexander and Jean Ponce. "Computer Vision - A Modern Approach, Second Edition" (2011). · [Goodreads]
- Szeliski, Richard. "Computer Vision - Algorithms and Applications" Texts in Computer Science (2010). · [Goodreads]
- Bishop, Christopher M. "Pattern recognition and machine learning" Information Science and Statistics (2007). · [Goodreads]
- Hartley, Richard and Andrew Zisserman. "Multiple view geometry in computer vision (2. ed.)" (2003). · [Goodreads]
- Stockman, George C. and Linda G. Shapiro. "Computer Vision" (2001). · [Goodreads]
- Introduction to Computer Vision · 2026 · James Tompkin · Brown
- Deep Learning for Computer Vision · 2025 · Fei-Fei Li · Stanford
- Advances in Computer Vision · 2023 · William T. Freeman · MIT
- OpenCV for Python Developers · 2023 · Patrick Crawford · LinkedIn Learning
- Computer Vision · 2021 · Andreas Geiger · University of Tübingen
- Computer Vision · 2021 · Yogesh S Rawat / Mubarak Shah · University of Central Florida
- Advanced Computer Vision · 2021 · Mubarak Shah · University of Central Florida
- Deep Learning for Computer Vision · 2020 · Justin Johnson · University of Michigan
- Advanced Deep Learning for Computer Vision · 2020 · Laura Leal-Taixé / Matthias Niessner · Technical University of Munich
- Introduction to Digital Image Processing · 2020 · Ahmadreza Baghaie · New York Institute of Technology
- Quantitative Imaging · 2019 · Kevin Mader · ETH Zurich
- Convolutional Neural Networks for Visual Recognition · 2017 · Fei-Fei Li · Stanford University
- Introduction to Digital Image Processing · 2015 · Rich Radke · Rensselaer Polytechnic Institute
- Machine Learning for Robotics and Computer Vision · 2014 · Rudolph Triebel · Technical University of Munich
- Multiple View Geometry · 2013 · Daniel Cremers · Technical University of Munich
- Variational Methods for Computer Vision · 2013 · Daniel Cremers · Technical University of Munich
- Computer Vision · 2012 · Mubarak Shah · University of Central Florida
- Image and video processing · Guillermo Sapiro · Duke University
- Introduction to Computer Vision · Aaron Bobick / Irfan Essa · Udacity
Ranks follow CORE Conference Ranking. Acceptance rates are approximate, based on recent editions. Note: in CV and ML, conference prestige often exceeds journal prestige, unlike in most other fields.
- CORE Rank A*
- CVPR: Conference on Computer Vision and Pattern Recognition (IEEE) · ~22% acceptance · the highest-volume top-tier CV venue [dblp]
- ICCV: International Conference on Computer Vision (IEEE) · ~26% acceptance · held in odd years only [dblp]
- NeurIPS: Conference on Neural Information Processing Systems · ~26% acceptance · primary venue for ML theory and deep learning [dblp]
- ICML: International Conference on Machine Learning · ~28% acceptance · top ML venue with growing CV presence [dblp]
- ICLR: International Conference on Learning Representations · ~32% acceptance · open-review format; major venue for deep learning and VLMs [dblp]
- ECCV: European Conference on Computer Vision (Springer) · ~28% acceptance · held in even years only [dblp]
- AAAI: AAAI Conference on Artificial Intelligence · ~20% acceptance · broad AI scope with strong CV track [dblp]
- ACMMM: ACM International Conference on Multimedia (ACM) [dblp]
- ICRA: International Conference on Robotics and Automation (IEEE) [dblp]
- CORE Rank A
- MICCAI: Conference on Medical Image Computing and Computer Assisted Intervention (Springer) · ~30% acceptance · premier venue for medical imaging [dblp]
- WACV: Winter Conference on Applications of Computer Vision (IEEE) · ~29% acceptance · practical and applied CV; growing rapidly [dblp]
- IROS: International Conference on Intelligent Robots and Systems (IEEE) · covers CV for robotics and perception [dblp]
- ISBI: IEEE International Symposium on Biomedical Imaging (IEEE) [dblp]
- BMVC: British Machine Vision Conference (BMVA) [dblp]
- CORE Rank B
- ICPR: International Conference on Pattern Recognition (IEEE) [dblp]
- ACCV: Asian Conference on Computer Vision (Springer) [dblp]
- ICASSP: International Conference on Acoustics, Speech, and Signal Processing (IEEE) [dblp]
- ICIP: International Conference on Image Processing (IEEE) [dblp]
- VISAPP: International Conference on Vision Theory and Applications (SCITEPRESS) [dblp]
- ACIVS: Conference on Advanced Concepts for Intelligent Vision Systems (Springer) [dblp]
- EUSIPCO: European Signal Processing Conference (EURASIP/IEEE) [dblp]
- CORE Rank C
- VCIP: International Conference on Visual Communications and Image Processing (IEEE) [dblp]
- CAIP: International Conference on Computer Analysis of Images and Patterns (Springer) [dblp]
- ICISP: International Conference on Image and Signal Processing (Springer) [dblp]
- ICIAR: International Conference on Image Analysis and Recognition (Springer) [dblp]
- ICVS: International Conference on Computer Vision Systems (Springer) [dblp]
- Unranked but notable
- MIUA: Medical Image Understanding and Analysis (BMVA) · UK-focused medical imaging [dblp]
- EUVIP: European Workshop on Visual Information Processing (IEEE/EURASIP) [dblp]
- CIC: Color and Imaging Conference (IS&T) [dblp]
- CVCS: Colour and Visual Computing Symposium [dblp]
- DSP: International Conference on Digital Signal Processing (IEEE) [dblp]
Rankings use the SCImago Journal Rank (SJR) indicator. SJR is a size-independent prestige metric: it weights citations by the influence of the citing journal, not just their count. Quartiles (Q1 to Q4) place each journal within its subject category; Q1 is the top 25%. In computer vision and machine learning, top conferences (CVPR, ICCV, ECCV) often carry more prestige than journals; many researchers publish conference papers first and submit extended versions to journals later.
- Core CV and ML Journals
- IEEE TPAMI: Transactions on Pattern Analysis and Machine Intelligence · Q1 · the highest-prestige journal in CV/ML; publishes foundational and survey work [dblp] [scimago]
- Elsevier MedIA: Medical Image Analysis · Q1 · leading venue in medical imaging [dblp] [scimago]
- IEEE TIP: Transactions on Image Processing · Q1 · image processing, analysis, and low-level vision [dblp] [scimago]
- IEEE TMI: Transactions on Medical Imaging · Q1 · premier journal for medical image analysis [dblp] [scimago]
- Elsevier PR: Pattern Recognition · Q1 · broad scope; high volume [dblp] [scimago]
- IJCV: International Journal of Computer Vision (Springer) · Q1 · primary venue for long-form CV research [dblp] [scimago]
- IEEE TCSVT: Transactions on Circuits and Systems for Video Technology · Q1 · video understanding, compression, and streaming [dblp] [scimago]
- IEEE TVCG: Transactions on Visualization and Computer Graphics · Q1 · covers rendering, visual analytics, and 3D vision [dblp] [scimago]
- Elsevier CVIU: Computer Vision and Image Understanding · Q1 [dblp] [scimago]
- Robotics and Automation
- Applied and Interdisciplinary
- Elsevier ESWA: Expert Systems with Applications · Q1 · broad applied scope; high volume [dblp] [scimago]
- Elsevier Neurocomputing · Q1 [dblp] [scimago]
- Springer NCA: Neural Computing and Applications · Q1 [dblp] [scimago]
- Elsevier CMIG: Computerized Medical Imaging and Graphics · Q1 [dblp] [scimago]
- Elsevier CMPB: Computer Methods and Programs in Biomedicine · Q1 [dblp] [scimago]
- Elsevier CBM: Computers in Biology and Medicine · Q1 [dblp] [scimago]
- Specialist and Lower-Tier
- Elsevier PRL: Pattern Recognition Letters · Q1 · shorter-format work [dblp] [scimago]
- Elsevier IVC: Image and Vision Computing · Q1 [dblp] [scimago]
- Elsevier JVCIR: Journal of Visual Communication and Image Representation · Q2 [dblp] [scimago]
- Springer JMIV: Journal of Mathematical Imaging and Vision · Q2 · mathematical foundations of imaging [dblp] [scimago]
- SPIE JEI: Journal of Electronic Imaging · Q3 [dblp] [scimago]
- IET Image Processing · Q2 [dblp] [scimago]
- Springer PAA: Pattern Analysis and Applications · Q2 [dblp] [scimago]
- Springer MVA: Machine Vision and Applications · Q2 [dblp] [scimago]
- IET Computer Vision · Q2 [dblp] [scimago]
- Open Access
- IEEE Access · Q1 · broad scope; fast publication; lower selectivity than the IEEE transactions [dblp] [scimago]
- MDPI Journal of Imaging · Q2 · fully open access; no subscription required [dblp] [scimago]
Summer schools are one of the best ways to get intensive, structured exposure to current CV research. Most run annually and accept applications from MSc students, PhD students, postdocs, and industry researchers.
Status: ✅ active (running regularly) · 🗄️ concluded (no longer running)
- ICVSS: International Computer Vision Summer School [2007-Present], Sicily, Italy · competitive application · winner of the IEEE PAMI Mark Everingham Prize (2017) · ✅ active
- BMVA CVSS: British Computer Vision Summer School [2013-Present], UK · Organized by BMVA · ✅ active
- VISUM: Machine Intelligence and Visual Computing Summer School [2013-2020], Porto, Portugal · 🗄️ concluded
- Foundational Must-Reads
Twelve papers every computer vision researcher should know. These defined the field's trajectory and are cited in virtually every modern CV paper.
- [Backprop, 1986] Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. "Learning representations by back-propagating errors." Nature 323 (1986): 533-536. [paper]
- [LeNet-5, 1998] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998). [paper] — the first CNN deployed at scale, for handwritten digit recognition; the blueprint for modern convolutional architectures
- [SIFT, 2004] Lowe, David G. "Distinctive image features from scale-invariant keypoints." IJCV 60.2 (2004): 91-110. [paper] — the dominant feature descriptor for a decade
- [BoVW, 2003/2004] Sivic, and Zisserman. "Video Google: A text retrieval approach to object matching in videos." Proceedings ninth IEEE international conference on computer vision. IEEE, 2003. Csurka, Gabriella, et al. "Visual categorization with bags of keypoints." Workshop on statistical learning in computer vision, ECCV. Vol. 1. No. 1-22. 2004. [paper] — introduced the bag-of-visual-words framework using visual vocabularies for image classification
- [HOG, 2005] Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection." CVPR (2005). [paper] — foundation of pedestrian and object detection
- [ImageNet, 2009] Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." CVPR (2009). [paper] — the benchmark that enabled the deep learning era
- [AlexNet, 2012] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." NeurIPS (2012). [paper] — the paper that started the deep learning era in CV
- [GAN, 2014] Goodfellow, Ian, et al. "Generative adversarial nets." NeurIPS (2014). [paper] — introduced the GAN framework that underpins generative CV
- [U-Net, 2015] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional networks for biomedical image segmentation." MICCAI (2015). [paper] — the default architecture for segmentation tasks
- [ResNet, 2016] He, Kaiming, et al. "Deep residual learning for image recognition." CVPR (2016). [paper] — residual connections solved the vanishing gradient problem; still the most-used backbone
- [Attention, 2017] Vaswani, Ashish, et al. "Attention is all you need." NeurIPS (2017). [paper] — the transformer architecture that ViT and every modern foundation model is built on
- [ViT, 2020] Dosovitskiy, Alexey, et al. "An image is worth 16x16 words." ICLR (2021). [paper] — brought transformers to vision and reshaped every sub-field
- Object Classification
- [LeNet-5, 1998] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
- [AlexNet, 2012] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems 25 (2012).
- [ZFNet, 2014] Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I 13. Springer International Publishing, 2014.
- [VGG, 2014] Simonyan, Karen, and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition." CoRR abs/1409.1556 (2014).
- [GoogLeNet, 2015] Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
- [ResNet, 2016] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
- [InceptionV3, 2016] Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
- [Xception, 2017] Chollet, François. "Xception: Deep learning with depthwise separable convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
- [EfficientNet, 2019] Tan, Mingxing, and Quoc Le. "Efficientnet: Rethinking model scaling for convolutional neural networks." International conference on machine learning. PMLR, 2019.
- [ViT, 2020] Dosovitskiy, Alexey, et al. "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." International Conference on Learning Representations. 2021.
- [ConvNeXt, 2022] Liu, Zhuang et al. “A ConvNet for the 2020s.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022): 11966-11976.
- Object Classification - Lightweight
- [SqueezeNet, 2016] Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size." arXiv preprint arXiv:1602.07360 (2016).
- [MobileNetV2, 2018] Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
- [ShuffleNetV2, 2018] Ma, Ningning, et al. "Shufflenet v2: Practical guidelines for efficient cnn architecture design." Proceedings of the European conference on computer vision (ECCV). 2018.
- [MobileNetV3, 2019] Howard, Andrew, et al. "Searching for mobilenetv3." Proceedings of the IEEE/CVF international conference on computer vision. 2019.
- [GhostNetV1, 2020] Han, Kai, et al. "Ghostnet: More features from cheap operations." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
- [MobileViT, 2021] Mehta, Sachin, and Mohammad Rastegari. "Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer." arXiv preprint arXiv:2110.02178 (2021).
- [GhostNetV2, 2022] Tang, Yehui, et al. "GhostNetv2: enhance cheap operation with long-range attention." Advances in Neural Information Processing Systems 35 (2022): 9969-9982.
- [ConvNeXt-Tiny, 2022] Liu, Zhuang et al. “A ConvNet for the 2020s.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022): 11966-11976.
- [MaxViT-Tiny, 2022] Tu, Zhengzhong, et al. "Maxvit: Multi-axis vision transformer." European conference on computer vision. Cham: Springer Nature Switzerland, 2022.
- [MobileFormer, 2022] Chen, Yinpeng, et al. "Mobile-former: Bridging mobilenet and transformer." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
- [ConvNeXtV2-Tiny, 2023] Woo, Sanghyun, et al. "Convnext v2: Co-designing and scaling convnets with masked autoencoders." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
- [Mobileone, 2023] Vasu, Pavan Kumar Anasosalu, et al. "Mobileone: An improved one millisecond mobile backbone." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023.
- [TinyViM, 2025] Ma, Xiaowen, Zhenliang Ni, and Xinghao Chen. "Tinyvim: Frequency decoupling for tiny hybrid vision mamba." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2025.
- [SeaFormer++, 2025] Wan, Qiang, et al. "SeaFormer++: Squeeze-enhanced axial transformer for mobile visual recognition." International Journal of Computer Vision 133.6 (2025): 3645-3666.
- Object Detection
- [Faster R-CNN, 2015] Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems 28 (2015).
- [SSD, 2016] Liu, Wei, et al. "Ssd: Single shot multibox detector." Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer International Publishing, 2016.
- [RetinaNet, 2017] Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE international conference on computer vision. 2017.
- [YOLOV3, 2018] Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
- [YOLOX, 2021] Ge, Zheng, et al. "Yolox: Exceeding yolo series in 2021." arXiv preprint arXiv:2107.08430 (2021).
- [YOLOR, 2021] Wang, Chien-Yao, I-Hau Yeh, and Hong-Yuan Mark Liao. "You only learn one representation: Unified network for multiple tasks." arXiv preprint arXiv:2105.04206 (2021).
- [YOLOV7, 2023] Wang, Chien-Yao, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
- Object Segmentation - Semantic / Instance / Panoptic
- Classical: Graph Cut / Normalized Cut, Fuzzy Clustering, Mean-shift / Quick-shift, SLIC, Active Contours (Snakes), Region Growing, K-means Clustering, Watershed, Level Set Methods, Markov Random Fields (MRF), Edge (1st / 2nd derivatives) + filling.
- [U-Net, 2015] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer International Publishing, 2015.
- [DeepLabV3, 2017] Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587 (2017).
- [PSPNet, 2017] Zhao, Hengshuang, et al. "Pyramid scene parsing network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
- [Mask R-CNN, 2017] He, Kaiming, et al. "Mask r-cnn." Proceedings of the IEEE international conference on computer vision. 2017.
- [U-Net++, 2018] Zhou, Zongwei, et al. "UNet++: A Nested U-Net Architecture for Medical Image Segmentation." Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (DLMIA/ML-CDS 2018, held in conjunction with MICCAI 2018, Granada, Spain), LNCS 11045 (2018): 3-11.
- [DeepLabV3+, 2018] Chen, Liang-Chieh et al. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.” European Conference on Computer Vision (2018).
- [MaskFormer, 2021] Cheng, Bowen, Alex Schwing, and Alexander Kirillov. "Per-pixel classification is not all you need for semantic segmentation." Advances in Neural Information Processing Systems 34 (2021): 17864-17875.
- [SegFormer, 2021] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and efficient design for semantic segmentation with transformers," Advances in Neural Information Processing Systems, vol. 34, pp. 12077-12090, 2021.
- [SAM, 2023] A. Kirillov, E. Mintun, N. Ravi, et al., “Segment anything,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4015–4026.
- [SEEM, 2023] Zou, Xueyan, et al. "Segment everything everywhere all at once." Advances in neural information processing systems 36 (2023): 19769-19782.
- Feature Matching
- {Local Features} [Superpoint, 2018] DeTone, Daniel, Tomasz Malisiewicz, and Andrew Rabinovich. "Superpoint: Self-supervised interest point detection and description." Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2018.
- {Local Features} [D2-Net, 2019] Dusmanu, Mihai, et al. "D2-net: A trainable cnn for joint detection and description of local features." arXiv preprint arXiv:1905.03561 (2019).
- [R2D2, 2019] Revaud, Jerome, et al. "R2D2: repeatable and reliable detector and descriptor." arXiv preprint arXiv:1906.06195 (2019).
- {Detector-Based Matcher} [SuperGlue, 2020] Sarlin, Paul-Edouard, et al. "Superglue: Learning feature matching with graph neural networks." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
- {Detector-Free Matcher} [DRC-Net, 2020] Li, Xinghui, et al. "Dual-resolution correspondence networks." Advances in Neural Information Processing Systems 33 (2020): 17346-17357.
- {Local Features} [DISK, 2020] Tyszkiewicz, Michał, Pascal Fua, and Eduard Trulls. "DISK: Learning local features with policy gradient." Advances in Neural Information Processing Systems 33 (2020): 14254-14265.
- {Detector-Free Matcher} [LoFTR, 2021] Sun, Jiaming, et al. "LoFTR: Detector-free local feature matching with transformers." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.
- {Detector-Free Matcher} [MatchFormer, 2022] Wang, Qing, et al. "Matchformer: Interleaving attention in transformers for feature matching." Proceedings of the Asian Conference on Computer Vision. 2022.
- {Detector-Based Matcher} [LightGlue, 2023] Lindenberger, Philipp, Paul-Edouard Sarlin, and Marc Pollefeys. "LightGlue: Local Feature Matching at Light Speed." arXiv preprint arXiv:2306.13643 (2023).
- {Detector-Based Matcher} [GlueStick, 2023] Pautrat, Rémi, et al. "Gluestick: Robust image matching by sticking points and lines together." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
- {Detector-Free Matcher} [OAMatcher, 2023] Dai, Kun, et al. "OAMatcher: An Overlapping Areas-based Network for Accurate Local Feature Matching." arXiv preprint arXiv:2302.05846 (2023).
- {Detector-Free Matcher} [RoMa, 2023] Edstedt, Johan, et al. "RoMa: Revisiting Robust Losses for Dense Feature Matching." arXiv preprint arXiv:2305.15404 (2023).
- [GIM, 2024] Shen, Xuelun, et al. "GIM: Learning Generalizable Image Matcher From Internet Videos." The Twelfth International Conference on Learning Representations. 2024.
- {Detector-Free Matcher} [DeepMatcher, 2024] Xie, Tao, et al. "Deepmatcher: a deep transformer-based network for robust and accurate local feature matching." Expert Systems with Applications 237 (2024): 121361.
- {Detector-Free Matcher} [XFeat, 2024] Potje, Guilherme, et al. "XFeat: Accelerated Features for Lightweight Image Matching." IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2024.
- Object Tracking
- [DeepSORT, 2017] Wojke, Nicolai, Alex Bewley, and Dietrich Paulus. "Simple online and realtime tracking with a deep association metric." 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017.
- [Tracktor, 2019] Bergmann, Philipp, Tim Meinhardt, and Laura Leal-Taixe. "Tracking without bells and whistles." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
- [FairMOT, 2021] Zhang, Yifu, et al. "Fairmot: On the fairness of detection and re-identification in multiple object tracking." International Journal of Computer Vision 129 (2021): 3069-3087.
- [STARK, 2021] Yan, Bin, et al. "Learning spatio-temporal transformer for visual tracking." Proceedings of the IEEE/CVF international conference on computer vision. 2021.
- [MixFormer, 2022] Cui, Yutao, et al. "Mixformer: End-to-end tracking with iterative mixed attention." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
- [ByteTrack, 2022] Zhang, Yifu, et al. "Bytetrack: Multi-object tracking by associating every detection box." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.
- Pose Estimation
- Classical: Active Shape Models (ASM), Active Appearance Models (AAM), Pictorial Structures, Deformable Part Models (DPM).
- [DeepPose, 2014] Toshev, Alexander, and Christian Szegedy. "DeepPose: Human pose estimation via deep neural networks." CVPR (2014).
- [Stacked Hourglass, 2016] Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose estimation." ECCV (2016).
- [OpenPose, 2019] Cao, Zhe, et al. "OpenPose: Realtime multi-person 2D pose estimation using part affinity fields." IEEE TPAMI (2019).
- [HRNet, 2019] Wang, Jingdong, et al. "Deep high-resolution representation learning for visual recognition." IEEE TPAMI (2019).
- [ViTPose, 2022] Xu, Yufei, et al. "ViTPose: Simple vision transformer baselines for human pose estimation." NeurIPS (2022).
- [DWPose, 2023] Yang, Zhendong, et al. "Effective whole-body pose estimation with two-stages distillation." ICCV Workshop (2023).
- [RTMPose, 2023] Jiang, Tao, et al. "RTMPose: Real-time multi-person pose estimation based on MMPose." arXiv (2023).
- [UniPose, 2024] Yang, Junjie, et al. "UniPose: Detecting any keypoints." CVPR (2024).
- Depth Estimation
- Classical: stereo matching, structured light, time-of-flight (ToF), SfM (Structure from Motion).
- [Make3D, 2009] Saxena, Ashutosh, Min Sun, and Andrew Y. Ng. "Make3D: Learning 3D scene structure from a single still image." IEEE TPAMI (2009).
- [Eigen et al., 2014] Eigen, David, Christian Puhrsch, and Rob Fergus. "Depth map prediction from a single image using a multi-scale deep network." NeurIPS (2014).
- [DenseDepth, 2018] Alhashim, Ibraheem, and Peter Wonka. "High quality monocular depth estimation via transfer learning." arXiv (2018).
- [MiDaS, 2020] Ranftl, René, et al. "Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer." IEEE TPAMI (2020).
- [AdaBins, 2021] Bhat, Shariq Farooq, et al. "AdaBins: Depth estimation using adaptive bins." CVPR (2021).
- [DPT, 2021] Ranftl, René, et al. "Vision transformers for dense prediction." ICCV (2021).
- [ZoeDepth, 2023] Bhat, Shariq Farooq, et al. "ZoeDepth: Zero-shot transfer by combining relative and metric depth." arXiv (2023).
- [Depth Anything, 2024] Yang, Lihe, et al. "Depth anything: Unleashing the power of large-scale unlabeled data." CVPR (2024).
- [Depth Anything V2, 2024] Yang, Lihe, et al. "Depth Anything V2." NeurIPS (2024).
- [Marigold, 2024] Ke, Bingxin, et al. "Repurposing diffusion-based image generators for monocular depth estimation." CVPR (2024).
- Media Generation
- [DCGAN, 2015] Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
- [BigGAN, 2018] Brock, Andrew, Jeff Donahue, and Karen Simonyan. "Large scale GAN training for high fidelity natural image synthesis." arXiv preprint arXiv:1809.11096 (2018).
- [StyleGANv3, 2021] Karras, Tero, et al. "Alias-free generative adversarial networks." Advances in Neural Information Processing Systems 34 (2021): 852-863.
- [DALL-E, 2021] Ramesh, Aditya, et al. "Zero-shot text-to-image generation." International conference on machine learning. Pmlr, 2021.
- [LAFITE, 2021] Zhou, Y., et al. "LAFITE: Towards language-free training for text-to-image generation." arXiv preprint arXiv:2111.13792 (2021).
- [CLIP, 2021] Radford, Alec, et al. "Learning transferable visual models from natural language supervision." International conference on machine learning. PMLR, 2021.
- [Imagen, 2022] Saharia, Chitwan, et al. "Photorealistic text-to-image diffusion models with deep language understanding." Advances in neural information processing systems 35 (2022): 36479-36494.
- [GLIDE, 2022] Nichol, Alexander Quinn, et al. "GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models." International Conference on Machine Learning. PMLR, 2022.
- [unCLIP / DALL-E 2, 2022] Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with CLIP latents." arXiv preprint arXiv:2204.06125 (2022).
- [LDM / Stable Diffusion (SD), 2022] Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
- [DALL-E 3, 2023] Betker, James, et al. "Improving image generation with better captions." OpenAI technical report (2023). https://cdn.openai.com/papers/dall-e-3.pdf
- [SDXL, 2023] Podell, Dustin, et al. "SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis." The Twelfth International Conference on Learning Representations. 2023.
- Vision-Language Models (VLMs)
- [CLIP, 2021] Radford, Alec, et al. "Learning transferable visual models from natural language supervision." ICML (2021).
- [ALIGN, 2021] Jia, Chao, et al. "Scaling up visual and vision-language representation learning with noisy text supervision." ICML (2021).
- [BLIP, 2022] Li, Junnan, et al. "BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation." ICML (2022).
- [Flamingo, 2022] Alayrac, Jean-Baptiste, et al. "Flamingo: a visual language model for few-shot learning." NeurIPS (2022).
- [BLIP-2, 2023] Li, Junnan, et al. "BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models." ICML (2023).
- [LLaVA, 2023] Liu, Haotian, et al. "Visual instruction tuning." NeurIPS (2023).
- [InstructBLIP, 2023] Dai, Wenliang, et al. "InstructBLIP: Towards general-purpose vision-language models with instruction tuning." NeurIPS (2023).
- [GPT-4V, 2023] OpenAI. "GPT-4 technical report." arXiv (2023).
- [LLaVA-1.5, 2023] Liu, Haotian, et al. "Improved baselines with visual instruction tuning." CVPR (2024).
- [Qwen-VL, 2023] Bai, Jinze, et al. "Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond." arXiv (2023).
- Image Retrieval
- [LSMH, 2016] Lu, Xiaoqiang, Xiangtao Zheng, and Xuelong Li. "Latent semantic minimal hashing for image retrieval." IEEE Transactions on Image Processing 26.1 (2016): 355-368.
- [R–GeM, 2018] Radenović, Filip, Giorgos Tolias, and Ondřej Chum. "Fine-tuning CNN image retrieval with no human annotation." IEEE transactions on pattern analysis and machine intelligence 41.7 (2018): 1655-1668.
- [HOW, 2020] Tolias, Giorgos, Tomas Jenicek, and Ondřej Chum. "Learning and aggregating deep local descriptors for instance-level recognition." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16. Springer International Publishing, 2020.
- [DELG, 2020] Cao, Bingyi, Andre Araujo, and Jack Sim. "Unifying deep local and global features for image search." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XX 16. Springer International Publishing, 2020.
- [SOLAR, 2020] Ng, Tony, et al. "SOLAR: second-order loss and attention for image retrieval." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16. Springer International Publishing, 2020.
- [FIRe, 2021] Weinzaepfel, Philippe, et al. "Learning Super-Features for Image Retrieval." International Conference on Learning Representations. 2021.
- [DOLG, 2021] Yang, Min, et al. "Dolg: Single-stage image retrieval with deep orthogonal fusion of local and global features." Proceedings of the IEEE/CVF International conference on Computer Vision. 2021.
- [Token, 2022] Wu, Hui, et al. "Learning token-based representation for image retrieval." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. No. 3. 2022.
- [CVNet, 2022] Lee, Seongwon, et al. "Correlation verification for image retrieval." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
- [GLAM, 2022] Song, Chull Hwan, Hye Joo Han, and Yannis Avrithis. "All the attention you need: Global-local, spatial-channel attention for image retrieval." Proceedings of the IEEE/CVF winter conference on applications of computer vision. 2022.
- [SuperGlobal, 2023] Shao, Shihao, et al. "Global features are all you need for image retrieval and reranking." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
- [CFCD, 2023] Zhu, Yunquan, et al. "Coarse-to-fine: Learning compact discriminative representation for single-stage image retrieval." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
- [SENet, 2023] Lee, Seongwon, et al. "Revisiting self-similarity: Structural embedding for image retrieval." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
- [CiDeR, 2024] Song, Chull Hwan, et al. "On train-test class overlap and detection for image retrieval." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
- Work in progress (planned sections):
- Image Super-Resolution / Image Restoration
- Saliency Detection
- Vanishing Point Detection
- Image Colorization
- Image Captioning
- Video Summarization and Captioning
- Explainable AI (XAI)
- Text Recognition
- Data Compression
- Affective Computing
- Virtual reality (VR)
- Augmented reality (AR)
- Visual Question Answering (VQA)
- DeepFake Detection
- 3D Reconstruction
- Biometric Analysis
- Meta Learning
- Semi-Supervised Learning - Zero/One/Few shot
- Performance - Classification
- Confusion Matrix: TP, FP, TN, and FN for each class
- For class-balanced datasets:
- Accuracy: (TP+TN) / (TP+FP+TN+FN)
- ROC curve: TPR vs FPR · summarised by AUROC (higher is better)
- For class-imbalanced datasets:
- Precision (P): TP / (TP+FP)
- Recall (R): TP / (TP+FN)
- F1-Score: 2·P·R / (P+R)
- Balanced Accuracy: (TPR+TNR) / 2
- Weighted-Averaged Precision, Recall, and F1-Score
- PR curve: Precision vs Recall · summarised by AUPRC (higher is better, more informative than AUROC on imbalanced data)
- For multi-label classification:
- Macro / Micro / Weighted averaging of above metrics
- Hamming Loss: fraction of labels incorrectly predicted
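As a worked example of the definitions above, the sketch below computes the main binary-classification metrics directly from confusion-matrix counts; the counts themselves are made up for illustration.

```python
# Hypothetical binary confusion-matrix counts for an imbalanced dataset.
tp, fp, tn, fn = 90, 30, 850, 30

accuracy  = (tp + tn) / (tp + fp + tn + fn)   # misleading when classes are imbalanced
precision = tp / (tp + fp)                    # P = TP / (TP + FP)
recall    = tp / (tp + fn)                    # R = TP / (TP + FN), a.k.a. TPR
tnr       = tn / (tn + fp)                    # specificity (TNR)
f1        = 2 * precision * recall / (precision + recall)
balanced_accuracy = (recall + tnr) / 2

print(f"acc={accuracy:.3f}  P={precision:.3f}  R={recall:.3f}  "
      f"F1={f1:.3f}  balanced_acc={balanced_accuracy:.3f}")
```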
- Performance - Detection
- Intersection over Union (IoU): area of overlap / area of union between predicted and ground-truth box
- Average Precision (AP): area under the Precision-Recall curve for a single class
- mAP: mean AP averaged over all classes
- mAP@0.5: IoU threshold of 0.5 (PASCAL VOC standard)
- mAP@0.5:0.95: mean over IoU thresholds 0.5 to 0.95 in steps of 0.05 (COCO standard, harder and preferred)
- AR@k: Average Recall at k proposals per image
- False Positives Per Image (FPPI): used in pedestrian detection benchmarks (e.g. Caltech)
- Log-Average Miss Rate (LAMR): standard metric for pedestrian detection, computed on FPPI vs Miss Rate curve
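IoU is the building block behind AP and mAP: a predicted box counts as a true positive only if its IoU with a ground-truth box exceeds the chosen threshold. The sketch below computes box IoU with NumPy; the box coordinates are made up for illustration.

```python
import numpy as np

def box_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred = np.array([50, 50, 150, 150], dtype=float)   # hypothetical predicted box
gt   = np.array([60, 60, 160, 160], dtype=float)   # hypothetical ground-truth box
iou = box_iou(pred, gt)
# COCO-style mAP repeats the TP/FP decision for thresholds 0.50, 0.55, ..., 0.95 and averages.
print(f"IoU = {iou:.3f}, TP@0.5: {iou >= 0.5}, TP@0.75: {iou >= 0.75}")
```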
- Performance - Segmentation
- Intersection over Union (IoU) / Jaccard Index: TP / (TP+FP+FN) per class
- mean IoU (mIoU): IoU averaged over all classes · primary metric for semantic segmentation benchmarks (Cityscapes, ADE20K)
- Dice Coefficient / F1-Score: 2·TP / (2·TP+FP+FN) · standard for medical image segmentation
- Mean Pixel Accuracy (mPA): fraction of pixels correctly classified per class, then averaged
- Panoptic Quality (PQ): PQ = SQ · RQ · unified metric for panoptic segmentation (COCO Panoptic)
- Boundary IoU (BIoU): IoU computed only near object boundaries · penalises coarse masks
- Hausdorff Distance (HD): maximum surface distance between predicted and ground-truth masks · common in medical imaging
- HD95: 95th-percentile Hausdorff Distance · more robust to outliers than HD
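A minimal sketch of per-class IoU and Dice computed from boolean masks, using toy 4x4 arrays purely for illustration; mIoU is simply this IoU averaged over all classes.

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7):
    """Per-class IoU and Dice for boolean masks of identical shape."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou  = inter / (union + eps)                       # TP / (TP + FP + FN)
    dice = 2 * inter / (pred.sum() + gt.sum() + eps)   # 2·TP / (2·TP + FP + FN)
    return iou, dice

# Toy masks purely for illustration.
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]], dtype=bool)
gt   = np.array([[1, 1, 1, 0],
                 [1, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]], dtype=bool)

iou, dice = iou_and_dice(pred, gt)
print(f"IoU={iou:.3f}  Dice={dice:.3f}")
```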
- Performance - Tracking
- Multiple Object Tracking Accuracy (MOTA): combines false positives, false negatives, and identity switches
- Multiple Object Tracking Precision (MOTP): average localisation precision of matched detections
- ID F1-Score (IDF1): ratio of correctly identified detections over average of ground-truth and computed detections · better reflects long-term identity consistency than MOTA
- HOTA (Higher Order Tracking Accuracy): geometric mean of detection and association accuracy · increasingly preferred over MOTA/MOTP as a single summary metric
- Identity Switches (IDSW): number of times a tracked object changes its assigned ID
- Mostly Tracked (MT) / Mostly Lost (ML): fraction of ground-truth trajectories tracked for more than 80% / less than 20% of their lifespan
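MOTA reduces to a single formula over error counts accumulated across all frames; the sketch below evaluates it with made-up totals.

```python
# Minimal MOTA sketch from accumulated counts (numbers are made up).
# MOTA = 1 - (FN + FP + IDSW) / total ground-truth objects, summed over all frames.
false_negatives = 120
false_positives = 80
id_switches = 15
num_gt = 2000   # total ground-truth object instances across all frames

mota = 1.0 - (false_negatives + false_positives + id_switches) / num_gt
print(f"MOTA = {mota:.3f}")   # can be negative if errors exceed the number of GT objects
```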
- Performance - Perceptual Quality (Super-resolution, Denoising, Enhancement)
- Reference-based (require a clean ground-truth image):
- Peak Signal-to-Noise Ratio (PSNR): 10·log10(MAX² / MSE) · in dB, higher is better · fast to compute but weakly correlated with human perception
- Structural Similarity Index (SSIM): measures luminance, contrast, and structure jointly · range [0,1], higher is better
- Multi-Scale SSIM (MS-SSIM): SSIM computed at multiple resolutions · more robust to viewing distance
- Learned Perceptual Image Patch Similarity (LPIPS): deep feature distance · strongly correlated with human judgement · lower is better
- Visual Information Fidelity (VIF): mutual information between reference and distorted image features
- No-reference (blind, no ground-truth required):
- Natural Image Quality Evaluator (NIQE): lower is better · measures deviation from natural scene statistics
- BRISQUE: lower is better · spatial natural scene statistics
- Gradient Magnitude Similarity Deviation (GMSD): fast, gradient-based · lower is better
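For the reference-based metrics above, scikit-image ships ready-made implementations of PSNR and SSIM; the sketch below applies them to a synthetic image pair that stands in for a ground-truth/restored pair (the noise level is an arbitrary choice).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)      # stand-in ground truth
noisy = np.clip(clean.astype(np.int16) + rng.normal(0, 10, clean.shape),
                0, 255).astype(np.uint8)                           # stand-in restored/degraded image

psnr = peak_signal_noise_ratio(clean, noisy, data_range=255)   # 10·log10(MAX² / MSE), in dB
ssim = structural_similarity(clean, noisy, data_range=255)     # higher is better, up to 1.0
print(f"PSNR = {psnr:.2f} dB   SSIM = {ssim:.3f}")
```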
- Performance - Generation (GANs, Diffusion Models)
- Fréchet Inception Distance (FID): distance between Inception feature distributions of real and generated images · lower is better · primary benchmark metric
- Inception Score (IS): measures quality and diversity jointly using classifier confidence and entropy · higher is better · less reliable than FID on its own
- Kernel Inception Distance (KID): like FID but uses MMD instead of Gaussian assumption · unbiased with small sample sizes · lower is better
- Perceptual Path Length (PPL): smoothness of the latent space · used for GANs · lower is better
- CLIP Score: cosine similarity between CLIP embeddings of generated image and text prompt · used for text-to-image evaluation · higher is better
- Human Evaluation: side-by-side preference studies remain the gold standard for generative quality
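FID boils down to a Fréchet distance between two Gaussians fitted to feature sets. The sketch below implements that distance with NumPy and SciPy, using random vectors as stand-ins for the InceptionV3 pool features a real FID computation would extract from real and generated images.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (N, D) feature sets."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)          # matrix square root of the covariance product
    if np.iscomplexobj(covmean):
        covmean = covmean.real                     # drop tiny imaginary numerical noise
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

# Random vectors as stand-ins for InceptionV3 pool features (shapes are illustrative).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 64))
fake = rng.normal(0.3, 1.1, size=(500, 64))
print(f"FID-style distance: {frechet_distance(real, fake):.3f}")
```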
- Performance - Depth Estimation
- Absolute Relative Error (AbsRel): mean( |d - d*| / d* ) · lower is better
- Squared Relative Error (SqRel): mean( |d - d*|² / d* )
- Root Mean Squared Error (RMSE) and RMSE log
- Threshold Accuracy (δ < 1.25, 1.25², 1.25³): fraction of pixels where max(d/d*, d*/d) < threshold · higher is better
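A minimal sketch of the depth metrics above over a synthetic prediction/ground-truth pair; the depth range and noise level are illustrative assumptions.

```python
import numpy as np

def depth_metrics(pred: np.ndarray, gt: np.ndarray):
    """Standard monocular depth metrics over valid (gt > 0) pixels."""
    mask = gt > 0
    d, g = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(d - g) / g)          # AbsRel
    sq_rel  = np.mean((d - g) ** 2 / g)           # SqRel
    rmse    = np.sqrt(np.mean((d - g) ** 2))      # RMSE
    ratio   = np.maximum(d / g, g / d)
    delta1  = np.mean(ratio < 1.25)               # threshold accuracy, δ < 1.25
    return abs_rel, sq_rel, rmse, delta1

rng = np.random.default_rng(0)
gt   = rng.uniform(1.0, 10.0, size=(240, 320))            # synthetic ground-truth depth (metres)
pred = gt * rng.normal(1.0, 0.05, size=gt.shape)          # prediction with ~5% multiplicative noise
print("AbsRel={:.3f}  SqRel={:.3f}  RMSE={:.3f}  d1={:.3f}".format(*depth_metrics(pred, gt)))
```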
- Performance - Pose Estimation
- Percentage of Correct Keypoints (PCK): keypoint within α · torso diameter of ground truth · PCK@0.2 is standard
- Object Keypoint Similarity (OKS): analogous to IoU for keypoints · accounts for keypoint visibility and scale · used by COCO
- Mean Per Joint Position Error (MPJPE): average Euclidean distance between predicted and ground-truth 3D joints · in mm
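A minimal sketch of MPJPE and PCK on synthetic keypoints; the joint count, noise levels, and reference length (e.g. torso diameter) are illustrative assumptions.

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error: mean Euclidean distance, pred/gt shaped (J, 3)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pck(pred: np.ndarray, gt: np.ndarray, ref_length: float, alpha: float = 0.2) -> float:
    """Fraction of 2D keypoints within alpha * reference length (e.g. torso diameter)."""
    dists = np.linalg.norm(pred - gt, axis=-1)
    return float((dists < alpha * ref_length).mean())

rng = np.random.default_rng(0)
gt_3d   = rng.uniform(-500, 500, size=(17, 3))             # 17 joints, millimetres
pred_3d = gt_3d + rng.normal(0, 20, size=gt_3d.shape)      # ~20 mm localisation noise
print(f"MPJPE = {mpjpe(pred_3d, gt_3d):.1f} mm")

gt_2d   = rng.uniform(0, 256, size=(17, 2))                # 2D keypoints in pixels
pred_2d = gt_2d + rng.normal(0, 5, size=gt_2d.shape)
print(f"PCK@0.2 = {pck(pred_2d, gt_2d, ref_length=100.0):.3f}")
```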
- Computation
- Latency: end-to-end inference time per image (ms) · report hardware, batch size, and input resolution
- Throughput: Frames Per Second (FPS) · report the same context as latency
- Parameters (M): total trainable parameter count · proxy for memory footprint
- FLOPs / MACs: floating-point operations or multiply-accumulate operations per forward pass · hardware-independent complexity measure
- Model Size (MB): weight file size on disk
- GPU Memory (VRAM, GB): peak memory during inference · critical for deployment constraints
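A common way to report these numbers in PyTorch is sketched below, assuming a recent torchvision is installed; ResNet-50, the batch size, input resolution, and iteration counts are arbitrary choices for illustration, not a recommended benchmarking protocol.

```python
import time
import torch
import torchvision

# Hypothetical setup: ResNet-50, batch size 1, 224x224 input, CPU or single GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.resnet50(weights=None).eval().to(device)
x = torch.randn(1, 3, 224, 224, device=device)

n_params = sum(p.numel() for p in model.parameters())     # Parameters (M)
print(f"params: {n_params / 1e6:.1f} M")

with torch.no_grad():
    for _ in range(10):                                    # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()                           # wait for queued GPU work
    t0 = time.perf_counter()
    for _ in range(50):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    latency_ms = (time.perf_counter() - t0) / 50 * 1000

print(f"latency: {latency_ms:.1f} ms/image   throughput: {1000 / latency_ms:.1f} FPS")
```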
Tags: Object Classification [ObjCls], Object Detection [ObjDet], Object Segmentation [ObjSeg], General Library [GenLib], Text Reading / Optical Character Recognition [OCR], Action Recognition [ActRec], Object Tracking [ObjTrk], Data Augmentation [DatAug], Simultaneous Localization and Mapping [SLAM], Outlier/Anomaly/Novelty Detection [NvlDet], Content-based Image Retrieval [CBIR], Image Enhancement [ImgEnh], Aesthetic Assessment [AesAss], Explainable Artificial Intelligence [XAI], Text-to-Image Generation [TexImg], Pose Estimation [PosEst], Video Matting [VidMat], Eye Tracking [EyeTrk]
- computervision-recipes [GenLib]: Microsoft's best practices, code samples, and documentation for computer vision.
- FastAI [GenLib]: Library over PyTorch used for learning and practicing machine learning and deep learning.
- pytorch-lightning [GenLib]: Lightweight PyTorch wrapper for high-performance AI research.
- ignite [GenLib]: PyTorch's high-level library to help with training and evaluating neural networks flexibly and transparently.
- pytorch_geometric [GenLib]: Graph neural network library for PyTorch.
- kornia [GenLib]: Open source differentiable computer vision library.
- ncnn [GenLib]: Tencent's high-performance neural network inference framework optimized for mobile platforms.
- ITK [GenLib]: Open-source, cross-platform toolkit for N-dimensional scientific image processing, segmentation, and registration.
- VTK [GenLib]: Open-source software system for image processing, 3D graphics, volume rendering and visualization.
- MONAI [GenLib]: PyTorch-based, open-source framework for deep learning in healthcare imaging.
- keras-cv [GenLib]: Library of modular computer vision oriented Keras components.
- MediaPipe [ObjDet][ObjSeg][ObjTrk][GenLib]: Google's cross-platform framework supporting face detection, hand/pose tracking, object detection, hair segmentation, and more.
- PyTorch image models [ObjCls]: A wide collection of PyTorch image classification models, scripts, and pretrained weights.
- mmclassification [ObjCls]: OpenMMLab's image classification toolbox and benchmark.
- vit-pytorch [ObjCls]: SOTA implementations of vision transformers in PyTorch.
- face_classification [ObjCls][ObjDet]: Real-time face detection and emotion/gender classification.
- mmdetection [ObjDet]: OpenMMLab's image detection toolbox and benchmark.
- detectron2 [ObjDet][ObjSeg]: Facebook FAIR's next-generation platform for object detection, segmentation, and other visual recognition tasks.
- detr [ObjDet]: Facebook's end-to-end object detection with transformers.
- libfacedetection [ObjDet]: Open source library for face detection in images, achieving ~1000 FPS.
- FaceDetection-DSFD [ObjDet]: Tencent's state-of-the-art face detector.
- Object-Detection-Metrics [ObjDet]: The most popular metrics used to evaluate object detection algorithms.
- SAHI [ObjDet][ObjSeg]: Lightweight vision library for large-scale object detection and instance segmentation.
- yolov5 [ObjDet]: Ultralytics' YOLOv5 object detection framework.
- darknet [ObjDet]: YOLOv4 / Scaled-YOLOv4 / YOLOv3 / YOLOv2 implementations.
- U-2-Net [ObjDet]: U²-Net, a nested U-structure architecture for salient object detection.
- segmentation_models.pytorch [ObjSeg]: PyTorch segmentation models with pretrained backbones.
- mmsegmentation [ObjSeg]: OpenMMLab's semantic segmentation toolbox and benchmark.
- PaddleSeg [ObjSeg]: Easy-to-use image segmentation library supporting semantic, interactive, panoptic, and 3D segmentation among others.
- mmocr [OCR]: OpenMMLab's text detection, recognition and understanding toolbox.
- pytesseract [OCR]: A Python wrapper for Google's Tesseract OCR engine.
- EasyOCR [OCR]: Ready-to-use OCR supporting 80+ languages and all popular writing scripts.
- PaddleOCR [OCR]: Practical ultra-lightweight OCR system supporting 80+ languages with tools for training and deployment across server, mobile, and IoT devices.
- mmtracking [ObjTrk]: OpenMMLab's video perception toolbox for object detection and tracking.
- mmaction [ActRec]: OpenMMLab's open-source toolbox for action understanding based on PyTorch.
- albumentations [DatAug]: Fast image augmentation library with an easy-to-use wrapper around other libraries.
- Random-Erasing [DatAug]: Random erasing data augmentation implemented in PyTorch.
- CutMix-PyTorch [DatAug]: Official PyTorch implementation of the CutMix regularizer.
- ORB_SLAM2 [SLAM]: Real-time SLAM for monocular, stereo and RGB-D cameras with loop detection and relocalization.
- pyod [NvlDet]: Python toolbox for scalable outlier and anomaly detection.
- alibi-detect [NvlDet]: Algorithms for outlier, adversarial, and drift detection.
- fastdup [NvlDet][CBIR]: Unsupervised and free tool for image and video dataset analysis.
- imagededup [CBIR]: Simple tool to find and remove duplicate images from datasets.
- image-match [CBIR]: Fast image retrieval system capable of searching over billions of images.
- Bringing-Old-Photos-Back-to-Life [ImgEnh]: Microsoft's CVPR 2020 oral paper implementation for restoring old and damaged photos.
- image-quality-assessment [AesAss]: Idealo's NIMA model to predict the aesthetic and technical quality of images.
- aesthetics [AesAss]: Image aesthetics toolkit using Fisher Vectors.
- openpose [PosEst]: Real-time multi-person keypoint detection for body, face, hands, and feet.
- RobustVideoMatting [VidMat]: Robust video matting supporting PyTorch, TensorFlow, ONNX, and CoreML.
- PsychoPy [EyeTrk]: Library for running psychology and neuroscience experiments.
- pytorch-cnn-visualizations [XAI]: PyTorch implementations of convolutional neural network visualization techniques.
- Captum [XAI]: PyTorch team's library for model interpretability and understanding.
- Alibi [XAI]: Algorithms for explaining machine learning models.
- iNNvestigate [XAI]: TensorFlow toolbox for investigating neural network predictions.
- keras-vis [XAI]: Neural network visualization toolkit for Keras.
- Keract [XAI]: Keras tool for extracting layer outputs and gradients.
- pytorch-grad-cam [XAI]: Advanced AI explainability for computer vision in PyTorch.
- SHAP [XAI]: Game-theoretic approach to explain the output of any machine learning model.
- TensorWatch [XAI]: Microsoft's debugging, monitoring, and visualization tool for Python ML and data science.
- WeightWatcher [XAI]: Open-source diagnostic tool for analyzing deep neural networks without needing training or test data.
- DALLE2-pytorch [TexImg]: PyTorch implementation of OpenAI's DALL-E 2 text-to-image synthesis network.
- imagen-pytorch [TexImg]: PyTorch implementation of Google's Imagen text-to-image neural network.
- PyTorch - CV Datasets, Meta
- Tensorflow - CV Datasets, Google
- CVonline: Image Databases, Edinburgh University, Thanks to Robert Fisher!
- Kaggle
- PaperWithCode, Meta
- RoboFlow
- VisualData
- CUHK Computer Vision
- VGG - University of Oxford
Tags: Popular individuals [Individual], Conferences and events [Conferences], University research groups [University], Interactive talks and podcasts [Talks], Research paper explanations [Papers]
- @AurelienGeron [Individual]: Aurélien Géron, former lead of YouTube's video classification team and author of the O'Reilly book Hands-On Machine Learning with Scikit-Learn and TensorFlow.
- @howardjeremyp [Individual]: Jeremy Howard, former president and chief scientist of Kaggle, and co-founder of fast.ai.
- @PieterAbbeel [Individual]: Pieter Abbeel, professor of electrical engineering and computer sciences, University of California, Berkeley.
- @pascalpoupart3507 [Individual]: Pascal Poupart, professor in the David R. Cheriton School of Computer Science at the University of Waterloo.
- @MatthiasNiessner [Individual]: Matthias Niessner, professor at the Technical University of Munich and head of the Visual Computing Lab.
- @MichaelBronsteinGDL [Individual]: Michael Bronstein, DeepMind Professor of AI, University of Oxford / Head of Graph Learning Research, Twitter.
- @DeepFindr [Individual]: Videos about all kinds of machine learning and data science topics.
- @deeplizard [Individual]: Videos about building collective intelligence.
- @YannicKilcher [Individual]: Yannic Kilcher, videos about machine learning research papers, programming, issues of the AI community, and the broader impact of AI in society.
- @sentdex [Individual]: sentdex, Python programming tutorials in machine learning, finance, data analysis, robotics, web development, game development, and more.
- @AAmini [Individual]: Alexander Amini, research affiliate at MIT, videos about deep learning and data science.
- @WhatsAI [Individual]: Louis-François Bouchard, PhD at MILA, videos about AI.
- mrdbourke [Individual]: Daniel Bourke, ML engineer in healthcare, videos about AI.
- marksaroufim [Individual]: Mark Saroufim, PyTorch engineer at Meta (Facebook), videos about AI.
- NicholasRenotte [Individual]: Nicholas Renotte, videos about computer vision, natural language processing, and reinforcement learning applications.
- abhishekkrthakur [Individual]: Abhishek Thakur, world's first quadruple Grand Master on Kaggle, videos about applied machine learning, deep learning, and data science.
- @AladdinPersson [Individual]: Aladdin Persson, clear implementations of ML and CV papers from scratch in PyTorch and TensorFlow.
- @CodeEmporium [Individual]: The Code Emporium, intuitive explanations of ML concepts and architectures.
- @AICoffeeBreak [Individual]: AI Coffee Break with Letitia, short, accessible walkthroughs of recent AI and CV research.
- @mildlyoverfitted [Individual]: Mildly Overfitted, hands-on CV and ML tutorials with clean code.
- @SmithaKolan [Individual]: Smitha Kolan, computer vision tutorials focused on practical applications.
- @KapilSachdeva [Individual]: Kapil Sachdeva, in-depth explanations of ML research and engineering.
- @alfcnz [Individual]: Alfredo Canziani, assistant professor at NYU, deep learning theory and practice.
- @arp_ai [Individual]: Jay Alammar, applied ML and computer vision projects.
- @bmvabritishmachinevisionas8529 [Conferences]: BMVA, the British Machine Vision Association.
- @ComputerVisionFoundation [Conferences]: Computer Vision Foundation (CVF), co-sponsor of major computer vision conferences (e.g. CVPR and ICCV).
- @cvprtum [University]: Computer Vision Group at the Technical University of Munich.
- @UCFCRCV [University]: Center for Research in Computer Vision at the University of Central Florida.
- @dynamicvisionandlearninggr1022 [University]: Dynamic Vision and Learning research group, Technical University of Munich.
- @TubingenML [University]: Machine learning groups at the University of Tübingen.
- @computervisiontalks4659 [Talks]: Computer Vision Talks.
- @freecodecamp [Talks]: Videos to learn how to code.
- @LondonMachineLearningMeetup [Talks]: Largest machine learning community in Europe.
- @LesHouches-iu6nv [Talks]: Summer school on the Statistical Physics of Machine Learning held in Les Houches, July 4-29, 2022.
- @MachineLearningStreetTalk [Talks]: Top AI podcast on Spotify.
- @WeightsBiases [Talks]: Weights & Biases team's conversations with industry experts and researchers.
- @PreserveKnowledge [Talks]: Canadian higher-education media organization focused on advances in mathematics, computer science, and artificial intelligence.
- @TwoMinutePapers [Papers]: Two Minute Papers, AI papers explained in a few minutes.
- @TheAIEpiphany [Papers]: Aleksa Gordić, ex-Google DeepMind and ex-Microsoft engineer explaining AI papers.
- @bycloudAI [Papers]: bycloud, covers the latest AI tech and research papers for fun.
- Vision Science, announcements about industry/academic jobs in computer vision around the world (in English).
- bull-i3, posts about job opportunities in computer vision in France (in French).
Entries in this list are included because they are:
- Genuinely educational — they help you understand something, not just use it
- Well-maintained (or historically significant if archived)
- Accessible — free or widely available where possible
Entries marked ⚠️ legacy or 🗄️ archived in the libraries section are included for historical or educational value despite no longer being actively developed.
This list is maintained by a computer vision researcher and university academic. Suggestions and pull requests are welcome. Please check CONTRIBUTING.md.
- Frida de Sigley
- CORE Conference Ranking
- Scimago Journal Ranking
- benthecoder/yt-channels-DS-AI-ML-CS
- anomaly-detection-resources, Anomaly detection related books, papers, videos, and toolboxes
- awesome-satellite-imagery-datasets, List of satellite image training datasets with annotations for computer vision and deep learning
- awesome-Face_Recognition, Computer vision papers about faces.
- the-incredible-pytorch, Curated list of tutorials, papers, projects, communities and more relating to PyTorch