This repository implements a conversational Retrieval-Augmented Generation (RAG) system that uses Gemini 2.0 Flash, Google Generative AI Embeddings, and ChromaDB to answer questions about books scraped from books.toscrape.com. It features persistent storage, conversational context awareness, and error handling.
- Conversational Context Awareness: Maintains a chat history to understand the flow of the conversation.
- Web Scraping: Scrapes book data from books.toscrape.com, including title, price, description, category, and other details.
- Persistent Vector Store: Uses ChromaDB to create and persist a vector store of book information, allowing for efficient retrieval.
- Gemini Integration: Leverages Gemini 2.0 Flash for question answering and text generation.
- Google Generative AI Embeddings: Uses Google Generative AI Embeddings to create vector embeddings for book data.
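As a rough illustration of what embedding-based retrieval does, the sketch below ranks stored items by cosine similarity to a query vector. It is a toy stand-in, not the ChromaDB or Google embeddings API: the function names, the placeholder book labels, and the three-dimensional vectors are all made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the labels of the k stored vectors most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]


# Hypothetical labels and vectors; real embeddings have far more dimensions.
toy_store = [
    ("book-a", [0.9, 0.1, 0.0]),
    ("book-b", [0.0, 0.2, 0.9]),
    ("book-c", [0.8, 0.3, 0.1]),
]

nearest = top_k([1.0, 0.0, 0.0], toy_store, k=2)
```

A real vector store adds persistence and indexing on top of this idea, so lookups do not require comparing the query against every stored vector.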
graph LR
A[User Query] --> B(Conversational RAG System);
B --> C{ChromaDB Exists?};
C -- Yes --> D[Load ChromaDB];
C -- No --> E[Scrape Books];
E --> F[Create Documents];
F --> G[Create Embeddings];
G --> H[Store in ChromaDB];
D --> I[Retrieve Relevant Docs];
H --> I;
I --> J[Gemini 2.0 Flash];
J --> K[Generate Response];
K --> L[Display Response];
L --> M[Update Conversation Memory];
M --> A;
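The "ChromaDB Exists?" branch in the diagram above could be sketched as follows. This is a minimal outline under assumptions: `load_or_build`, its parameters, and the rule "a non-empty directory means the store exists" are hypothetical and are not taken from `rag_utils.py`. The loading, scraping, and building steps are passed in as callables so the control flow stands on its own.

```python
from pathlib import Path


def load_or_build(persist_dir, load_store, scrape_books, build_store):
    """Load a persisted store if one exists; otherwise scrape and build it.

    Assumption: a non-empty directory at persist_dir means a store
    was persisted there on an earlier run.
    """
    path = Path(persist_dir)
    if path.is_dir() and any(path.iterdir()):
        return load_store(path)        # "Yes" branch: reuse the stored data
    books = scrape_books()             # "No" branch: scrape first ...
    return build_store(books, path)    # ... then embed and persist
```

Separating the decision from the expensive steps keeps repeat runs fast: scraping and embedding only happen when nothing has been persisted yet.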
- Python 3.6+
- Google Cloud Project with Gemini API enabled
- Google Cloud API key
- .env file with GEMINI_API_KEY set
- Clone the repository:
    git clone [repository_url]
    cd [repository_directory]
- Create a virtual environment (recommended):
    python3 -m venv venv
    source venv/bin/activate  # On macOS and Linux
    venv\Scripts\activate     # On Windows
- Install dependencies:
    pip install -r requirements.txt
  (See requirements.txt for the list of dependencies.)
- Create a .env file:
    GEMINI_API_KEY=YOUR_GEMINI_API_KEY
  Replace YOUR_GEMINI_API_KEY with your actual Gemini API key.
- Place the following files in your directory:
  - bookquery.py (Main script)
  - webscraper.py (Web scraping functions)
  - rag_utils.py (RAG system utilities)
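To show how the key in the .env file might reach the script, here is a standard-library sketch that reads KEY=VALUE lines and checks that the placeholder was replaced. The project may well use a dedicated library for this instead. `load_env_file` and `require_api_key` are hypothetical helper names, and the parsing handles only simple, unquoted or singly quoted values.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines and export them to os.environ.

    Variables already set in the environment are left untouched.
    Returns a dict of the pairs found in the file.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


def require_api_key(name="GEMINI_API_KEY"):
    """Return the key, failing early if it is missing or still the placeholder."""
    key = os.environ.get(name)
    if not key or key == "YOUR_GEMINI_API_KEY":
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

Failing at startup with a clear message tends to be easier to diagnose than an authentication error surfacing later from an API call.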
- Run the script:
    python bookquery.py
- Enter your queries:
  The script will prompt you to enter queries about books. You can ask questions like:
  - "Tell me about 'A Light in the Attic'."
- Exit the application:
  Type "exit" and press Enter.
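The prompt, answer, and exit cycle described above might look roughly like the loop below. It is a sketch only: `chat_loop` and the `answer` callable are hypothetical stand-ins for whatever `bookquery.py` actually does, and the history is kept as a plain list of (query, reply) pairs rather than the project's conversation memory.

```python
def chat_loop(answer, read_input=input, write_output=print):
    """Prompt for queries until the user types 'exit'.

    answer(query, history) is a placeholder for the RAG call; it receives
    the earlier (query, reply) pairs so it can use conversational context.
    Returns the accumulated history.
    """
    history = []
    while True:
        query = read_input("Ask about a book (or type 'exit'): ").strip()
        if query.lower() == "exit":
            break
        if not query:
            continue  # ignore empty input instead of querying the model
        reply = answer(query, history)
        write_output(reply)
        history.append((query, reply))
    return history
```

Taking the input and output functions as parameters is a design choice made for this sketch so the loop can be exercised without a terminal.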
- bookquery.py: Main script that orchestrates the RAG system and handles user interaction.
- webscraper.py: Contains functions for scraping book details (extract_book_details) and scraping all books (scrape_all_books).
- rag_utils.py: Holds the build_rag_system function that sets up the conversational RAG system, including ChromaDB, embeddings, Gemini integration, and conversation memory.
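Between scraping and embedding, each book record has to become a piece of text plus metadata. The sketch below shows one way that step could work, assuming the scraper returns a dictionary per book with keys such as `title`, `price`, `category`, and `description`. That shape, the `book_to_document` name, and the sample values are assumptions for illustration, not the actual output of `extract_book_details`.

```python
def book_to_document(book):
    """Turn one scraped book record into embeddable text plus metadata.

    Missing fields fall back to placeholders so a partially scraped
    record still yields a usable document.
    """
    title = book.get("title", "Unknown title")
    category = book.get("category", "Unknown category")
    price = book.get("price", "Unknown price")
    description = book.get("description", "No description available.")
    text = (
        f"Title: {title}\n"
        f"Category: {category}\n"
        f"Price: {price}\n"
        f"Description: {description}"
    )
    # Keep short, filterable fields as metadata; the long text gets embedded.
    metadata = {key: book[key] for key in ("title", "category", "price") if key in book}
    return {"text": text, "metadata": metadata}


# Hypothetical record; the values are placeholders, not scraped data.
sample = {"title": "Example Book", "category": "Example Category", "price": "0.00"}
doc = book_to_document(sample)
```

Keeping the title and category in metadata as well as in the text leaves room for filtering retrieved results by those fields.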
Contributions are welcome! Please submit a pull request or open an issue for any bugs or feature requests.