
ZaidK07/Conversational-Document-Search-Assistant


Conversational Document Search Agent

This project is a conversational search agent that lets users query their own document knowledge base in natural language. It combines Google's Gemini generative AI with semantic vector search to produce accurate, context-aware answers based solely on the uploaded documents.

Key Features

  • Multi-Format Support: Extracts text from a wide variety of file formats:
    • Documents: PDF, DOCX, TXT, CSV, Excel (XLSX/XLS), PowerPoint (PPTX).
    • Images: JPG, JPEG, PNG, WEBP (uses Gemini Vision for OCR).
    • Audio: MP3, WAV (uses Gemini for transcription and speaker separation).
  • Semantic Search: Utilizes sentence-transformers (all-MiniLM-L6-v2) to generate vector embeddings for documents, enabling high-quality semantic retrieval that goes beyond simple keyword matching.
  • Generative AI: Uses Google Gemini 2.5 Flash Lite for:
    • Intelligent keyword extraction from user queries.
    • Advanced image OCR and audio transcription.
    • Synthesizing natural language answers based strictly on document context.
  • Conversational & Search Modes:
    • Search Query Mode: Returns structured, citation-rich results similar to AWS Kendra (Filename, Excerpt, Explanation).
    • Answer Mode: Provides a direct, medium-length conversational answer citing relevant files.
  • Automated Pipeline: Simply upload files, and the system automatically handles text extraction, cleaning, and vector embedding generation.
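
The semantic search step comes down to ranking stored document vectors by their cosine similarity to the query vector. The sketch below shows that ranking in plain Python so it runs without extra packages; the project itself lists Scikit-Learn/NumPy for this. The function names `cosine_similarity` and `top_k` and the toy 3-dimensional vectors are illustrative assumptions (all-MiniLM-L6-v2 actually produces 384-dimensional embeddings).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, filenames, k=3):
    """Hypothetical helper: rank documents by cosine similarity to the query, highest first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in zip(filenames, doc_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors for illustration only; real sentence embeddings are 384-dimensional.
    docs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
    names = ["policy.pdf", "invoice.xlsx", "meeting.mp3"]
    print(top_k([1.0, 0.1, 0.0], docs, names, k=2))
```

Because cosine similarity ignores vector length, a long document and a short one with similar meaning score alike, which is why it is the usual choice for sentence-transformer embeddings.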

Tech Stack

  • Backend Framework: Python, Flask, Flask-RESTful
  • AI & LLM: Google GenAI SDK (Gemini), Sentence Transformers
  • Vector Search: Cosine Similarity (via Scikit-Learn/NumPy)
  • File Processing: PyMuPDF (PDFs), python-docx (Word), pandas (Excel/CSV), python-pptx (Slides), Pillow (Images)
  • Frontend: HTML (a simple template is provided)
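
To send each upload to the right library, a processor typically dispatches on the file extension. This is a hypothetical sketch: the `EXTRACTORS` table and `pick_extractor` are illustrative names, and the strategy labels simply mirror the formats and libraries listed above rather than the actual code in utils.py.

```python
from pathlib import Path

# Hypothetical routing table: file extension -> extraction strategy.
# Labels follow the Tech Stack list; the grouping itself is an assumption.
EXTRACTORS = {
    ".pdf": "pymupdf",
    ".docx": "python-docx",
    ".txt": "plain-text",
    ".csv": "pandas",
    ".xlsx": "pandas",
    ".xls": "pandas",
    ".pptx": "python-pptx",
    ".jpg": "gemini-vision",
    ".jpeg": "gemini-vision",
    ".png": "gemini-vision",
    ".webp": "gemini-vision",
    ".mp3": "gemini-audio",
    ".wav": "gemini-audio",
}


def pick_extractor(filename: str) -> str:
    """Return the extraction strategy for a file, matching extensions case-insensitively."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```

Raising on unknown extensions, rather than silently skipping them, makes it obvious at upload time when a file will not enter the knowledge base.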

Prerequisites

  • Python 3.9+
  • A Google Cloud API Key with access to Gemini models.

Installation

  1. Clone the Repository

    git clone <repository-url>
    cd Conversational-Document-Search-Assistant
  2. Create a Virtual Environment

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install Dependencies

    pip install -r requirements.txt
  4. Configure Environment Variables

    Create a .env file in the root directory and add your Google API key:

    GOOGLE_API_KEY=your_actual_api_key_here
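
For reference, loading the .env file amounts to reading KEY=VALUE lines into the process environment. Below is a minimal standard-library sketch that assumes plain KEY=VALUE lines; `load_env_file` is a hypothetical helper, and the project may instead rely on python-dotenv's load_dotenv(), which is an assumption not confirmed by this README.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical helper: read KEY=VALUE lines from a .env file into os.environ.

    Blank lines and '#' comments are skipped, one pair of matching surrounding
    quotes is removed, and variables already set in the environment are kept.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Strip one pair of matching quotes, e.g. "abc" or 'abc'.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


if __name__ == "__main__":
    load_env_file()
    print("GOOGLE_API_KEY set:", "GOOGLE_API_KEY" in os.environ)
```

Using `setdefault` means a key exported in the shell takes precedence over the file, which is convenient when switching keys without editing .env.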

Usage

  1. Start the Server

    python app.py

    The server listens on all network interfaces (0.0.0.0) on port 6209.

  2. Access the Interface

    Open your browser and navigate to http://localhost:6209.

  3. Upload Documents

    • Use the upload interface (or the /upload-files API endpoint) to upload your knowledge base documents.
    • The system will automatically extract text and generate embeddings. This may take a moment depending on file size and count.
  4. Query Your Data

    • Use the search bar to ask questions like:
      • "What is the summary of the education policy?"
      • "Find invoices related to automobile repairs."
      • "What did the speakers discuss in the audio file?"
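
Files can also be sent to the /upload-files endpoint without the web interface. The sketch below builds a multipart/form-data POST using only the standard library. The form-field name `files` and the helper name `build_upload_request` are assumptions; check the upload handler in views.py for the field it actually reads.

```python
import uuid
import urllib.request
from pathlib import Path


def build_upload_request(
    base_url: str, paths: list[str], field: str = "files"
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /upload-files.

    `field` is the form-field name; "files" is an assumption.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for path in paths:
        p = Path(path)
        # Each file becomes one part under the same field name.
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{p.name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + p.read_bytes() + b"\r\n")
    body = b"".join(parts) + f"--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/upload-files",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Sending it is then a single call, for example `urllib.request.urlopen(build_upload_request("http://localhost:6209", ["policy.pdf"]))`. Repeating the same field name for every file is the usual way to upload several files in one request.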

Project Structure

├── app.py              # Main Flask application entry point
├── routes.py           # API route definitions
├── views.py            # API logic (Controllers) for Upload, Search, and Answer
├── utils.py            # Utility functions (File processing, OCR, Embeddings)
├── prompts.py          # System prompts for Gemini (Keyword extraction, RAG)
├── templates/          # HTML templates for the frontend
├── files/              # Directory where uploaded raw files are stored
├── text_data/          # Directory for extracted text files
├── embedding_contents/ # Stores vector embeddings (.npy) and file mappings (.json)
└── requirements.txt    # Project dependencies
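
The move from files/ to text_data/ implies a cleaning pass before embedding. This sketch shows one plausible version; the cleaning rules, the helper names, and the `<original name>.txt` naming convention are assumptions, not necessarily what utils.py does.

```python
import re
import unicodedata
from pathlib import Path


def clean_text(text: str) -> str:
    """Normalize extracted text before embedding (hypothetical cleaning rules)."""
    text = unicodedata.normalize("NFKC", text)
    # Drop control characters (including '\r'), keeping newlines and tabs.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    )
    text = re.sub(r"[ \t]+", " ", text)    # collapse runs of spaces/tabs
    text = re.sub(r" *\n *", "\n", text)   # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line
    return text.strip()


def save_extracted_text(source_name: str, text: str, out_dir: str = "text_data") -> Path:
    """Write cleaned text to <out_dir>/<original filename>.txt and return the path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / f"{Path(source_name).name}.txt"
    target.write_text(clean_text(text), encoding="utf-8")
    return target
```

Keeping the original extension in the output name (report.pdf.txt) avoids collisions when report.pdf and report.docx are both uploaded.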

About

This project implements a Conversational Document Search Agent using the Retrieval-Augmented Generation (RAG) pattern. It turns a custom document collection into a chat-enabled knowledge base, so users can ask natural language questions and receive accurate, context-aware answers grounded in their own documents.
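
In a RAG setup, the retrieved excerpts are stitched into the prompt so the model answers from them and can cite filenames. The real prompts live in prompts.py; the wording, the `Hit` record, `build_rag_prompt`, and the character budget below are hypothetical, shown only to illustrate the assembly step before the prompt is sent to Gemini.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical retrieval result: source file, matched text, similarity score."""
    filename: str
    excerpt: str
    score: float


def build_rag_prompt(question: str, hits: list[Hit], max_chars: int = 4000) -> str:
    """Assemble numbered, filename-tagged context blocks (best score first),
    stopping once the next block would exceed max_chars, then append the question."""
    blocks = []
    used = 0
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    for i, hit in enumerate(ranked, start=1):
        block = f"[{i}] Source: {hit.filename}\n{hit.excerpt.strip()}"
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        used += len(block)
    context = "\n\n".join(blocks) if blocks else "(no matching documents)"
    return (
        "Answer the question using only the context below. "
        "Cite the source filenames you rely on. If the context does not contain "
        "the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Putting the highest-scoring excerpts first means that, when the budget runs out, it is the least relevant context that gets dropped.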
