- Project Overview
- Features
- System Architecture
- Technical Stack
- Getting Started
- User Guide
- Performance Optimization
- Troubleshooting
- Development Notes
- Future Roadmap
- Appendix
DocuMind is an AI-powered knowledge base assistant that allows users to upload PDF documents and ask natural language questions about their content. The system uses state-of-the-art language models and document retrieval techniques to provide accurate, contextual answers with source citations. DocuMind is designed to run entirely locally, ensuring privacy and data security.
The system is built as a containerized application with Docker, making it easy to deploy across different operating systems and environments. It features two interfaces: a Streamlit-based UI and a more traditional HTML/CSS/JavaScript web interface.
- 📄 Multi-format PDF Processing: Robust text extraction with fallback mechanisms and OCR support
- 📁 Automatic Document Loading: Ability to auto-load PDFs from the documents directory
- 🔍 Hybrid Retrieval System: Combines semantic similarity with keyword matching
- 🤖 Local AI Integration: Uses Ollama (Llama 3.2 3B) for privacy-preserving responses
- 💬 Conversation Memory: Maintains context across multiple questions
- 📊 Source Attribution: Always shows which documents informed each answer
- ⚡ Real-time Evaluation: Built-in quality metrics inspired by RAGAS
- 🎯 Adaptive Query Processing: Routes different query types to specialized chains
- 📈 Analytics Dashboard: Performance metrics and user feedback analysis
- 🔒 Privacy-First: All processing happens locally - no external APIs
DocuMind follows a containerized architecture with two main components:
- DocuMind Container: Handles document processing, embedding generation, vector storage, and hosts both the API and web interface.
- Ollama Container: Provides the LLM (Large Language Model) capabilities.
The system uses a hybrid vector + keyword retrieval system to find relevant document chunks, which are then fed to the LLM to generate accurate responses.
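As a minimal illustration of how the DocuMind container talks to the Ollama container, the sketch below posts a prompt to Ollama's public `/api/generate` REST endpoint. The host name `ollama` is the Compose service name; the payload shape follows Ollama's documented API, not DocuMind's internal code, and the function names are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama:11434/api/generate"  # Compose service name + default port

def build_generate_payload(prompt: str, model: str = "llama3.2:3b") -> dict:
    """Build the JSON body Ollama's /api/generate endpoint expects."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt: str) -> str:
    """Send a prompt to the Ollama container and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_generate_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["response"]
```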
1. Document Ingestion:
   - PDFs are processed using multiple extraction methods (PyPDF2, PyMuPDF, pdfplumber)
   - OCR fallback for problematic PDFs (using Tesseract)
   - Text is chunked semantically for optimal retrieval
2. Vector Storage:
   - Document chunks are embedded using Sentence Transformers
   - Embeddings are stored in a local ChromaDB vector database
3. Query Processing:
   - User questions are embedded using the same model
   - Hybrid retrieval combines semantic similarity and keyword matching
   - Retrieved chunks are ranked and filtered
4. Answer Generation:
   - Top document chunks are formatted into a prompt
   - The Ollama LLM generates a response with source citations
   - The response is evaluated for quality metrics
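The retrieval and prompt-assembly steps above can be sketched in pure Python as follows. This is a toy: real embeddings come from Sentence-Transformers and storage from ChromaDB, and the 0.7/0.3 score weighting and function names are illustrative assumptions, not DocuMind's actual code:

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query: str, chunk: str) -> float:
    """Fraction of query terms that appear in the chunk text."""
    terms = set(query.lower().split())
    return sum(t in chunk.lower() for t in terms) / len(terms) if terms else 0.0

def hybrid_rank(query, query_vec, chunks, top_k=3, alpha=0.7):
    """Blend semantic similarity with keyword overlap and keep the top_k chunks.
    chunks is a list of (text, embedding) pairs."""
    scored = [
        (alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in chunks
    ]
    return [text for _, text in sorted(scored, reverse=True)[:top_k]]

def format_prompt(question: str, top_chunks: list) -> str:
    """Assemble retrieved chunks into a citation-friendly prompt."""
    context = "\n\n".join(f"[Source {i + 1}] {c}" for i, c in enumerate(top_chunks))
    return (
        "Answer using only the sources below, citing them.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```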
- Document Processing: PyPDF2, PyMuPDF, pdfplumber, Tesseract OCR
- Embeddings: Sentence-Transformers (all-MiniLM-L6-v2)
- Vector Database: ChromaDB
- LLM: Ollama (Llama 3.2 3B)
- Frontend: HTML/CSS/JavaScript, Streamlit
- Orchestration: Docker, Docker Compose
- Evaluation: RAGAS-inspired framework
- Backend: Python FastAPI
The easiest way to get started with DocuMind is through Docker. This approach works on any operating system and handles all dependencies.
- Docker installed on your system
- Docker Compose installed on your system
- At least 4GB of free RAM (8GB+ recommended)
- At least 10GB of free disk space
1. Clone or download the project
2. Run the setup script:
   - `chmod +x run_docker.sh`
   - `./run_docker.sh`
3. Select option 1 from the menu to start DocuMind
4. Access the interfaces:
   - Web UI: http://localhost:8080
   - API: http://localhost:8000/api
For systems with NVIDIA GPUs, DocuMind can leverage GPU acceleration:
- The `run_docker.sh` script will automatically detect compatible NVIDIA GPUs
- Ensure you have the NVIDIA Container Toolkit installed
If you prefer not to use Docker, you can set up DocuMind manually:
1. Install Python dependencies:
   `pip install -r requirements.txt`
2. Install OCR dependencies (optional but recommended):
   - See OCR Setup for platform-specific instructions
3. Start the application:
   - For the Streamlit interface: `streamlit run app.py`
   - For the web interface: `python api.py`
There are two ways to add documents to DocuMind:
- Place PDF files in the `data/documents` directory
- Start or restart DocuMind
- The system will automatically detect and process new documents
- Navigate to the web interface (http://localhost:8080)
- Click the "Upload" button in the sidebar
- Select one or more PDF files from your computer
- Wait for processing to complete (progress will be displayed)
Once you have documents loaded, you can ask questions in natural language:
- Type your question in the input box
- Click "Ask" or press Enter
- The system will retrieve relevant information and generate an answer
- Sources will be cited alongside the answer
Example Questions:
- "What is the main focus of the project described in the technical report?"
- "Summarize the key findings from the quarterly report."
- "Compare the investment strategies mentioned in documents A and B."
DocuMind responses include:
- Answer Text: The main response to your query
- Source Citations: References to specific documents where information was found
- Confidence Score: An indicator of the system's confidence in the answer
- Reasoning Path: (Advanced view) How the system arrived at its conclusion
If you're having issues with specific PDFs, use the diagnostic tool:
`python tests/check_pdf.py path/to/your/document.pdf`

This will analyze the PDF and recommend the best extraction approach.
To use a different Ollama model:
- Run `./run_docker.sh`
- Select option 5 to switch models
- Choose from the available options or specify a custom model
For documents requiring OCR processing:
1. Install Tesseract OCR and Poppler:
   - macOS: `brew install tesseract poppler`
   - Ubuntu/Debian: `sudo apt-get install tesseract-ocr poppler-utils`
   - Windows: Install from the official repositories (see OCR_SETUP.md)
2. Install Python packages:
   `pip install pytesseract pdf2image pillow`
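With those packages installed, the OCR fallback might look roughly like the sketch below. The function names are illustrative rather than DocuMind's actual API; `convert_from_path` needs Poppler on the system and `pytesseract` needs the Tesseract binary:

```python
def clean_ocr_text(raw: str) -> str:
    """Drop blank lines and trim whitespace from raw OCR output."""
    return "\n".join(line.strip() for line in raw.splitlines() if line.strip())

def ocr_pdf(path: str, dpi: int = 300) -> str:
    """Render each PDF page to an image and run Tesseract on it."""
    from pdf2image import convert_from_path  # requires Poppler
    import pytesseract                       # requires the Tesseract binary
    pages = convert_from_path(path, dpi=dpi)
    return clean_ocr_text("\n".join(pytesseract.image_to_string(p) for p in pages))
```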
DocuMind pre-downloads and caches embedding models to improve startup and query time:
- Models are stored in `./data/models_cache/`
- ONNX-optimized versions are kept in `./data/chroma_cache/onnx_models/`
Choose the right LLM based on your hardware:
- High-end systems: Use larger models like `llama3.2:3b` (default)
- Low-resource systems: Switch to `phi3:mini` for faster responses
Adjust Docker resource limits based on your system:
- Minimum: 4GB RAM
- Recommended: 8GB RAM
- For GPU systems: Enable GPU acceleration
Symptom: Requests timeout with error: "Error generating response: HTTPConnectionPool(host='ollama', port=11434): Read timed out. (read timeout=60)"
Solution:
- Switch to a smaller LLM model through option 5 in the run_docker.sh script
- Restart the containers to apply changes
Symptom: Documents fail to load or extract properly
Solution:
- Check the format of your PDF
- Run the diagnostic tool: `python tests/check_pdf.py path/to/document.pdf`
- Enable OCR for problematic documents
Symptom: Cannot access the web interface at http://localhost:8080
Solution:
- Verify containers are running: `docker compose ps`
- Check logs: `docker compose logs documind`
- Ensure ports aren't in use by other applications
DocuMind/
├── app.py # Main Streamlit application
├── api.py # Alternative web interface API
├── docker-entrypoint.sh # Docker container startup script
├── Dockerfile # Main container definition
├── docker-compose.yml # Container orchestration
├── docker-compose.gpu.yml # GPU support configuration
├── run_docker.sh # Helper script for Docker management
├── src/
│ ├── document_processor.py # PDF processing and extraction with OCR
│ ├── chunking.py # Semantic text chunking
│ ├── retriever.py # Hybrid retrieval system
│ ├── llm_handler.py # LLM integration and prompts
│ ├── evaluator.py # Evaluation framework
│ ├── preload_models.py # Model preloading script
│ └── utils.py # Utility functions
├── data/
│ ├── documents/ # PDF document storage
│ ├── vectorstore/ # Chroma vector database
│ ├── models_cache/ # Hugging Face model cache
│ └── chroma_cache/ # ChromaDB ONNX model cache
├── config/
│ └── settings.py # Configuration settings
├── web/ # Web UI files (HTML, CSS, JS)
└── tests/ # Testing and diagnostic tools
Planned enhancements for future versions:
- Multilingual Support: Processing documents in multiple languages
- Document Update Detection: Automatically detecting and processing updated documents
- Enhanced Visualization: Adding charts and diagrams for data-heavy responses
- Multi-User Support: Account-based access with personalized collections
Minimum requirements:
- 4GB RAM
- Dual-core CPU
- 10GB free disk space

Recommended:
- 8GB RAM
- Quad-core CPU
- 20GB free disk space
- NVIDIA GPU with 4GB+ VRAM (for GPU acceleration)
Edit `config/settings.py` to customize:
- `EMBEDDING_MODEL`: The model used for document embeddings
- `OLLAMA_MODEL`: The LLM model used for responses
- `MAX_CHUNK_SIZE`: Maximum token size for document chunks
- `TOP_K_DOCUMENTS`: Number of document chunks to retrieve
- `OCR_ENABLED`: Enable/disable OCR processing
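A customized `settings.py` might look like the fragment below. Only the `EMBEDDING_MODEL` and `OLLAMA_MODEL` defaults are documented above; the remaining values are examples, not the project's actual defaults:

```python
# config/settings.py -- example values
EMBEDDING_MODEL = "all-MiniLM-L6-v2"  # Sentence-Transformers model (documented default)
OLLAMA_MODEL = "llama3.2:3b"          # swap to "phi3:mini" on low-resource machines
MAX_CHUNK_SIZE = 512                  # max tokens per document chunk (example value)
TOP_K_DOCUMENTS = 5                   # chunks retrieved per query (example value)
OCR_ENABLED = True                    # enable the Tesseract OCR fallback
```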