This project is an intelligent, conversational search agent that lets users query their own document knowledge base in natural language. It uses Google's Gemini generative AI and semantic vector search to provide accurate, context-aware answers based solely on the uploaded documents.
- Multi-Format Support: Seamlessly processes and extracts text from a wide variety of file formats:
  - Documents: PDF, DOCX, TXT, CSV, Excel (XLSX/XLS), PowerPoint (PPTX).
  - Images: JPG, JPEG, PNG, WEBP (uses Gemini Vision for OCR).
  - Audio: MP3, WAV (uses Gemini for transcription and speaker separation).
- Semantic Search: Uses sentence-transformers (all-MiniLM-L6-v2) to generate vector embeddings for documents, enabling semantic retrieval that goes beyond simple keyword matching.
- Generative AI Powered: Uses Google Gemini 2.5 Flash Lite for:
  - Intelligent keyword extraction from user queries.
  - Advanced image OCR and audio transcription.
  - Synthesizing natural language answers based strictly on document context.
- Conversational & Search Modes:
  - Search Query Mode: Returns structured, citation-rich results similar to Amazon Kendra (Filename, Excerpt, Explanation).
  - Answer Mode: Provides a direct, medium-length conversational answer that cites the relevant files.
- Automated Pipeline: Simply upload files, and the system automatically handles text extraction, cleaning, and vector embedding generation.
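The first stage of the pipeline described above, deciding how a file should be handled and tidying the extracted text, can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `classify_file` and `clean_text` and the handler labels are hypothetical, and only the extension groups come from the feature list.

```python
from pathlib import Path

# Extension groups taken from the feature list; the labels are hypothetical.
DOCUMENT_EXTS = {".pdf", ".docx", ".txt", ".csv", ".xlsx", ".xls", ".pptx"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
AUDIO_EXTS = {".mp3", ".wav"}


def classify_file(filename: str) -> str:
    """Return which kind of extractor a file would need, based on its extension."""
    ext = Path(filename).suffix.lower()
    if ext in DOCUMENT_EXTS:
        return "document"
    if ext in IMAGE_EXTS:
        return "image"  # would go to OCR
    if ext in AUDIO_EXTS:
        return "audio"  # would go to transcription
    raise ValueError(f"Unsupported file type: {filename}")


def clean_text(raw: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(raw.split())
```

Matching on the lowercased suffix keeps the check case-insensitive, so `Report.PDF` and `report.pdf` are treated the same way.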
- Backend Framework: Python, Flask, Flask-RESTful
- AI & LLM: Google GenAI SDK (Gemini), Sentence Transformers
- Vector Search: Cosine Similarity (via Scikit-Learn/NumPy)
- File Processing: PyMuPDF (PDFs), python-docx (Word), pandas (Excel/CSV), python-pptx (Slides), Pillow (Images)
- Frontend: HTML (Simple template provided)
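To illustrate the cosine-similarity retrieval named in the stack above, here is a small NumPy-only sketch. The function name `cosine_top_k` and the toy two-dimensional vectors are assumptions for illustration; in the project the vectors would come from the all-MiniLM-L6-v2 model (which produces 384-dimensional embeddings), and the real ranking code may differ.

```python
import numpy as np


def cosine_top_k(query_vec, doc_matrix, k=3):
    """Rank document vectors by cosine similarity to the query vector.

    Returns a list of (row_index, score) pairs, best match first.
    """
    query = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_matrix, dtype=float)

    # Cosine similarity = dot product divided by the product of the norms.
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(query)
    scores = np.divide(
        docs @ query,
        denom,
        out=np.zeros(len(docs)),  # zero-length vectors score 0 instead of NaN
        where=denom > 0,
    )

    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Toy example: three "documents" in a 2-D embedding space.
results = cosine_top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2)
```

Because cosine similarity depends only on direction, documents of very different lengths can still be compared fairly once they are embedded.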
- Python 3.9+
- A Google Cloud API Key with access to Gemini models.
- Clone the Repository
    git clone <repository-url>
    cd Conversational-Document-Search-Agent
- Create a Virtual Environment
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Install Dependencies
    pip install -r requirements.txt
- Configure Environment Variables: Create a .env file in the root directory and add your Google API key:
    GOOGLE_API_KEY=your_actual_api_key_here
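A quick way to confirm the key is visible to Python before starting the server is sketched below. The helper `get_api_key` is hypothetical and not part of the project. Note that `os.environ` does not read `.env` files on its own, so this assumes the application (or a loader such as python-dotenv, which may or may not be what the project uses) has already exported the variable.

```python
import os


def get_api_key(env=None) -> str:
    """Return GOOGLE_API_KEY, or raise if it is missing or still the placeholder."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key or key == "your_actual_api_key_here":
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; add it to the .env file in the project root."
        )
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the Gemini client.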
- Start the Server
    python app.py
  The application will run on http://0.0.0.0:6209.
- Access the Interface: Open your browser and navigate to http://localhost:6209.
- Upload Documents
  - Use the upload interface (or the /upload-files API endpoint) to upload your knowledge base documents.
  - The system will automatically extract text and generate embeddings. This may take a moment depending on file size and count.
- Query Your Data
  - Use the search bar to ask questions like:
    - "What is the summary of the education policy?"
    - "Find invoices related to automobile repairs."
    - "What did the speakers discuss in the audio file?"
├── app.py # Main Flask application entry point
├── routes.py # API route definitions
├── views.py # API logic (Controllers) for Upload, Search, and Answer
├── utils.py # Utility functions (File processing, OCR, Embeddings)
├── prompts.py # System prompts for Gemini (Keyword extraction, RAG)
├── templates/ # HTML templates for the frontend
├── files/ # Directory where uploaded raw files are stored
├── text_data/ # Directory for extracted text files
├── embedding_contents/ # Stores vector embeddings (.npy) and file mappings (.json)
└── requirements.txt # Project dependencies
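The embedding_contents/ directory is described as holding `.npy` vectors alongside `.json` file mappings. A minimal sketch of that storage pattern follows. The file names `embeddings.npy` and `file_mapping.json`, the helper names, and the row-index-to-filename layout are all assumptions made for illustration; the project's actual on-disk layout may differ.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_index(directory, embeddings, filenames):
    """Write the vectors to a .npy file and a row-index -> filename map to JSON."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    np.save(directory / "embeddings.npy", np.asarray(embeddings, dtype=np.float32))
    mapping = {str(i): name for i, name in enumerate(filenames)}
    (directory / "file_mapping.json").write_text(
        json.dumps(mapping, indent=2), encoding="utf-8"
    )


def load_index(directory):
    """Load the vectors and the mapping written by save_index."""
    directory = Path(directory)
    embeddings = np.load(directory / "embeddings.npy")
    mapping = json.loads((directory / "file_mapping.json").read_text(encoding="utf-8"))
    return embeddings, mapping


# Round trip in a temporary directory, so nothing in the real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    save_index(tmp, [[0.1, 0.2], [0.3, 0.4]], ["a.pdf", "b.docx"])
    loaded_vectors, loaded_mapping = load_index(tmp)
```

Keeping the mapping in a separate JSON file means a search hit on row `i` of the matrix can be traced back to its source file without loading any document text.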