A full-stack Retrieval-Augmented Generation (RAG) application that lets you chat with YouTube videos in real time. It integrates a Python backend (for transcript extraction, local embeddings, and LLM inference) with a Chrome Extension frontend embedded directly in the YouTube interface.
- Python 3.9+
- Ollama installed and running on your system.
- Google Chrome browser.
git clone https://github.com/Vinit-007/Youtube-Rag-Chrome-Extension
cd Youtube-Rag-Chrome-Extension

Create and activate a virtual environment:
python -m venv venv
# On Windows:
.\venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

pip install -r requirements.txt

Ensure Ollama is running, then download the model required for this project:
ollama pull llama3.2:1b

The Python backend handles transcript downloads, chunking, embedding generation, and answering questions. Keep this running in the background.
# Ensure your virtual environment is active
python -m app.server

The server will start listening on http://127.0.0.1:8765.
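
To confirm the backend is up before loading the extension, a quick port check can help. This is a minimal sketch using only the Python standard library; the helper name `is_listening` is hypothetical, and the host and port are taken from the address above. It only verifies that something accepts TCP connections on that port, not that it is this project's server.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 8765, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    status = "reachable" if is_listening() else "not reachable"
    print(f"Backend at http://127.0.0.1:8765 is {status}")
```

If the check reports the backend as not reachable, make sure the virtual environment is active and the server command is still running.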
To interact with the RAG pipeline directly from your browser:
- Open Google Chrome.
- Go to chrome://extensions/ in the address bar.
- Toggle Developer mode to ON (top right corner).
- Click the Load unpacked button (top left).
- Select the chrome-extension folder located inside this project directory.
- Go to YouTube and open any video that contains an English transcript or closed captions.
- Look for the newly added AI Assistant UI on the page.
- Type your question about the video's content into the chatbox and press Ask.
- The extension will send the video URL and your question to the local backend, retrieve relevant chunks from the video transcript, and stream an AI-generated answer back to your screen.
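
The retrieval step in the last bullet can be pictured as a nearest-neighbour search over chunk embeddings. The sketch below is an illustration only, not the project's code: it uses tiny hand-written vectors and a plain-Python cosine similarity in place of a real embedding model and a FAISS index, and the names `cosine` and `top_k_chunks` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (assumed values); real models use far more dimensions.
chunks = [
    ("The speaker introduces the topic.", [1.0, 0.0, 0.0]),
    ("Installation steps are shown.", [0.0, 1.0, 0.0]),
    ("A summary of the installation.", [0.1, 0.9, 0.2]),
]
query = [0.0, 1.0, 0.1]

for score, text in top_k_chunks(query, chunks, k=2):
    print(f"{score:.3f}  {text}")
```

In a pipeline like the one described, the highest-scoring chunks would then be passed to the LLM as context for the answer.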
If you want to understand how the RAG pipeline operates without the server or UI, open the Main.ipynb Jupyter Notebook. It provides a step-by-step walkthrough of transcript loading, text splitting, embedding storage, and query retrieval.
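
As a rough illustration of the text-splitting step, the sketch below cuts a transcript into overlapping character windows. It is a stand-in for whatever splitter the notebook actually uses: the function name `split_text` and the `chunk_size` and `overlap` values are assumptions chosen for the example.

```python
from typing import List


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of up to chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


transcript = "word " * 100  # 500 characters of placeholder text
pieces = split_text(transcript, chunk_size=200, overlap=50)
print(len(pieces), [len(p) for p in pieces])
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.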
Note: Since embeddings are processed locally on the CPU (via FAISS and Sentence Transformers), the initial load and processing time for a new video depends on the video's length and your system's hardware.