YOLOv11 + CLIP: Natural Language Visual Search

This project combines YOLOv11 (for spatial localization) with CLIP (for semantic understanding) so that you can search for specific objects in an image using free-form natural language.

🚀 Features

Modern Tech Stack: Uses YOLOv11 for object detection and OpenAI's CLIP for image-text matching.
Natural Language Querying: Search for specific objects using free-form text (e.g., "red sports car with a spoiler").
Custom Image Support: Process any local image file by simply providing its path.
Smart Path Handling: Automatically strips the quotes that dragging a file into a Windows terminal can add around its path.
Real-time Feature Extraction: Automatically crops detected objects and generates semantic embeddings.
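
The path cleanup described above could look roughly like the following. This is a minimal sketch, not the code in main.py: the function name `clean_image_path` is hypothetical, and it assumes the only cleanup needed is trimming whitespace and one layer of matching quotes.

```python
from pathlib import Path


def clean_image_path(raw: str) -> Path:
    """Normalize a path typed or dragged into the terminal prompt.

    Hypothetical helper: the name and exact rules are assumptions.
    """
    # Drop surrounding whitespace first, then one layer of matching quotes.
    text = raw.strip()
    if len(text) >= 2 and text[0] == text[-1] and text[0] in ('"', "'"):
        text = text[1:-1].strip()
    # expanduser lets "~/pics/test.jpg" work as well.
    return Path(text).expanduser()
```

For example, `clean_image_path('"photos/test.jpg"')` yields the same path as typing `photos/test.jpg` without quotes.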

🧠 How it Works

Detection: YOLOv11 identifies all objects in the scene.
Feature Extraction: Each detected object is cropped and passed through a CLIP Image Encoder to create a high-dimensional feature vector.
Semantic Matching: When you enter a text query, it is encoded using CLIP's Text Encoder.
Ranking: The system calculates the cosine similarity between the text query and all detected objects, selecting the best match.
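
The ranking step can be illustrated without loading any model. The sketch below assumes the embeddings are already available as plain lists of floats; the names `cosine_similarity` and `best_match` are illustrative and not taken from main.py, and the toy 3-dimensional vectors stand in for real CLIP embeddings, which are much longer.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Convention chosen here: a zero vector matches nothing.
    return dot / (norm_a * norm_b)


def best_match(
    text_vec: Sequence[float], crop_vecs: List[Sequence[float]]
) -> Tuple[int, float]:
    """Return (index, score) of the crop most similar to the text query."""
    if not crop_vecs:
        raise ValueError("no detections to rank")
    scores = [cosine_similarity(text_vec, v) for v in crop_vecs]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]


# Toy vectors: one text embedding and one embedding per detected crop.
query = [1.0, 0.0, 1.0]
crops = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
index, score = best_match(query, crops)
print(f"best crop: {index}, similarity: {score:.3f}")
```

Because cosine similarity ignores vector length, the second crop wins here even though its embedding is twice as long as the query: only the direction matters.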

🛠️ Setup

Install dependencies:
```
pip install -r requirements.txt
```
Run the system:
```
python main.py
```

📸 Example Usage

If you have an image with a bus and several people:

Query: "a person wearing a suit"
Result: The system will find the specific person in a suit, even if there are 10 other people in the image.

📂 Project Structure

main.py: Core logic for detection and semantic search.
requirements.txt: List of required Python packages.
test.jpg: Input image (auto-downloaded if missing).
search_result.jpg: The output image highlighting your search result.
Screenshot.png: Screenshot of the terminal session and the search result.

🎯 Future Upgrades

Video Support: Apply this to video streams for real-time tracking of specific descriptions.
Vector DB Integration: Store embeddings in ChromaDB so that large archives of footage can be searched without re-running detection.
Web UI: Add a Gradio or Streamlit interface so searches can be run from a browser instead of the terminal.
