Conversational Document Search Agent

This project is an intelligent, conversational search agent that lets users query their own document knowledge base in natural language. It uses Google's Gemini generative AI models and semantic vector search to provide accurate, context-aware answers based solely on the uploaded documents.

Key Features

Multi-Format Support: Seamlessly processes and extracts text from a wide variety of file formats:
- Documents: PDF, DOCX, TXT, CSV, Excel (XLSX/XLS), PowerPoint (PPTX).
- Images: JPG, JPEG, PNG, WEBP (uses Gemini Vision for OCR).
- Audio: MP3, WAV (uses Gemini for transcription and speaker separation).
Semantic Search: Utilizes sentence-transformers (all-MiniLM-L6-v2) to generate vector embeddings for documents, enabling high-quality semantic retrieval that goes beyond simple keyword matching.
Generative AI: Google Gemini 2.5 Flash Lite is used for:
- Intelligent keyword extraction from user queries.
- Advanced image OCR and audio transcription.
- Synthesizing natural language answers based strictly on document context.
Conversational & Search Modes:
- Search Query Mode: Returns structured, citation-rich results similar to AWS Kendra (Filename, Excerpt, Explanation).
- Answer Mode: Provides a direct, medium-length conversational answer citing relevant files.
Automated Pipeline: Simply upload files, and the system automatically handles text extraction, cleaning, and vector embedding generation.
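The retrieval step behind Semantic Search amounts to ranking stored document vectors by cosine similarity to the query vector. The sketch below is a minimal, standard-library illustration under stated assumptions, not the project's code: in the real pipeline the vectors would come from the sentence-transformers model (for example `SentenceTransformer("all-MiniLM-L6-v2").encode(texts)`) and the similarity from NumPy or scikit-learn. The helper name `top_k` and the toy vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: dict[str, Sequence[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Hypothetical helper: return the k filenames most similar to the query, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real all-MiniLM-L6-v2 embeddings have 384 dimensions.
    docs = {
        "policy.txt": [0.9, 0.1, 0.0],
        "invoice.pdf": [0.1, 0.9, 0.2],
        "meeting.mp3": [0.0, 0.2, 0.9],
    }
    # policy.txt ranks first, then invoice.pdf
    print(top_k([1.0, 0.0, 0.0], docs, k=2))
```

Because cosine similarity ignores vector length, a short note and a long report on the same topic can score similarly, which is why it is the usual choice for comparing sentence embeddings.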

Tech Stack

Backend Framework: Python, Flask, Flask-RESTful
AI & LLM: Google GenAI SDK (Gemini), Sentence Transformers
Vector Search: Cosine Similarity (via Scikit-Learn/NumPy)
File Processing: PyMuPDF (PDFs), python-docx (Word), pandas (Excel/CSV), python-pptx (Slides), Pillow (Images)
Frontend: HTML (Simple template provided)
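The supported formats map onto the file-processing libraries listed above. As a hypothetical sketch (the names `EXTRACTORS` and `route_file` are illustrative and are not taken from `utils.py`), routing an uploaded file to the right extractor by its extension could look like this:

```python
from pathlib import Path

# Hypothetical routing table: extension -> (category, library or service that handles it).
EXTRACTORS: dict[str, tuple[str, str]] = {
    ".pdf": ("document", "PyMuPDF"),
    ".docx": ("document", "python-docx"),
    ".txt": ("document", "plain text read"),
    ".csv": ("document", "pandas"),
    ".xlsx": ("document", "pandas"),
    ".xls": ("document", "pandas"),
    ".pptx": ("document", "python-pptx"),
    ".jpg": ("image", "Pillow + Gemini Vision OCR"),
    ".jpeg": ("image", "Pillow + Gemini Vision OCR"),
    ".png": ("image", "Pillow + Gemini Vision OCR"),
    ".webp": ("image", "Pillow + Gemini Vision OCR"),
    ".mp3": ("audio", "Gemini transcription"),
    ".wav": ("audio", "Gemini transcription"),
}


def route_file(filename: str) -> tuple[str, str]:
    """Return (category, handler) for a filename, matching the extension case-insensitively."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}") from None
```

Keeping the mapping in one table makes it easy to see which formats are supported and to reject anything else before extraction starts.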

Prerequisites

Python 3.9+
A Google Cloud API Key with access to Gemini models.

Installation

Clone the Repository

```
git clone <repository-url>
cd Conversational-Document-Search-Agent
```

Create a Virtual Environment

```
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Install Dependencies
```
pip install -r requirements.txt
```
Configure Environment Variables

Create a .env file in the root directory and add your Google API key:
```
GOOGLE_API_KEY=your_actual_api_key_here
```
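This README does not say how `app.py` reads the key; Flask projects commonly use the python-dotenv package (`load_dotenv()`) for this. Purely to illustrate what such a loader does, here is a minimal standard-library sketch for simple `KEY=value` lines. The names `parse_env` and `load_env_file` are hypothetical, and the parser deliberately skips features python-dotenv supports (escapes, multi-line values, variable expansion).

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines, comments, and lines without '=' are skipped."""
    pairs: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and one layer of matching-style quotes.
        pairs[key.strip()] = value.strip().strip('"').strip("'")
    return pairs


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Copy pairs from a .env file into os.environ; existing variables win unless override=True."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    pairs = parse_env(env_path.read_text(encoding="utf-8"))
    for key, value in pairs.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return pairs
```

After loading, the application would read the key with `os.environ.get("GOOGLE_API_KEY")` and should fail early with a clear message if it is missing.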

Usage

Start the Server
```
python app.py
```
The server listens on all network interfaces (0.0.0.0) on port 6209.
Access the Interface
- Open your browser and navigate to http://localhost:6209.
Upload Documents
- Use the upload interface (or the /upload-files API endpoint) to upload your knowledge base documents.
- The system will automatically extract text and generate embeddings. This may take a moment depending on file size and count.
Query Your Data
- Use the search bar to ask questions like:
  - "What is the summary of the education policy?"
  - "Find invoices related to automobile repairs."
  - "What did the speakers discuss in the audio file?"
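Files can also be sent to the `/upload-files` endpoint from a script. The README names only the endpoint, so this standard-library sketch assumes a `POST` with a `multipart/form-data` body and a form field called `files`; check `routes.py` and `views.py` for the actual method and field name. `build_upload_request` is a hypothetical helper that builds the request without sending it.

```python
import urllib.request
import uuid


def build_upload_request(
    files: list[tuple[str, bytes]],
    url: str = "http://localhost:6209/upload-files",
    field_name: str = "files",  # assumption: the form field name is not documented
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying (filename, content) pairs."""
    boundary = uuid.uuid4().hex
    body = bytearray()
    for filename, content in files:
        # Each part: boundary line, headers, blank line, raw bytes, CRLF.
        body += f"--{boundary}\r\n".encode()
        body += (
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{filename}"\r\n'
        ).encode()
        body += b"Content-Type: application/octet-stream\r\n\r\n"
        body += content
        body += b"\r\n"
    # Closing boundary marks the end of the multipart body.
    body += f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=bytes(body),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

With the server running, the request would be sent with `urllib.request.urlopen(build_upload_request([("policy.pdf", pdf_bytes)]))`, where `pdf_bytes` holds the file's contents.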

Project Structure

```
├── app.py              # Main Flask application entry point
├── routes.py           # API route definitions
├── views.py            # API logic (Controllers) for Upload, Search, and Answer
├── utils.py            # Utility functions (File processing, OCR, Embeddings)
├── prompts.py          # System prompts for Gemini (Keyword extraction, RAG)
├── templates/          # HTML templates for the frontend
├── files/              # Directory where uploaded raw files are stored
├── text_data/          # Directory for extracted text files
├── embedding_contents/ # Stores vector embeddings (.npy) and file mappings (.json)
└── requirements.txt    # Project dependencies
```
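`embedding_contents/` pairs an embedding matrix (`.npy`) with a JSON mapping so that each row of the matrix can be traced back to its source file. The JSON schema is not documented here; the sketch below assumes a simple `{"files": [...]}` list in row order and uses hypothetical helper names. With NumPy, the matrix itself would be written and read with `np.save(path, matrix)` and `np.load(path)`.

```python
import json
from pathlib import Path

# Assumed schema (not documented in this README): {"files": ["a.pdf", "b.docx", ...]},
# where position i names the source file of row i in the .npy embedding matrix.


def mapping_to_json(filenames: list[str]) -> str:
    return json.dumps({"files": filenames}, indent=2)


def mapping_from_json(text: str) -> list[str]:
    return list(json.loads(text)["files"])


def save_mapping(filenames: list[str], path: Path) -> None:
    """Write the mapping, creating embedding_contents/ (or any parent) if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(mapping_to_json(filenames), encoding="utf-8")


def load_mapping(path: Path) -> list[str]:
    return mapping_from_json(path.read_text(encoding="utf-8"))


def filename_for_row(filenames: list[str], row: int, n_rows: int) -> str:
    """Map a similarity-search row index back to its file, refusing a stale mapping."""
    if len(filenames) != n_rows:
        raise ValueError(f"mapping has {len(filenames)} entries but matrix has {n_rows} rows")
    return filenames[row]
```

The length check guards against the matrix and the mapping drifting apart, for example when a new upload updates one file but not the other.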
