Recruiters: Test this system in 5 minutes! See RECRUITER_SETUP.md for quick setup instructions.
# Windows - Quick Start

```bat
demo_setup.bat   # One-time setup
start_demo.bat   # Launch demo (opens browser automatically)
```

Sample Documents: Ready-to-use test documents are included in the `demo_samples/` folder.
RAG Document Q&A System is a production-grade Retrieval-Augmented Generation (RAG) platform for semantic document search and question answering. It ingests unstructured documents, generates vector embeddings, retrieves the most relevant context, and produces grounded answers with source attribution. The system emphasizes reliability, observability, and deployment flexibility.
- Install dependencies: `pip install -r requirements.txt`
- Configure API key: set `GEMINI_API_KEY` in the `.env` file
- Start server: `uvicorn main:app --reload`
- Open chat UI: http://localhost:8000
- View API docs: http://localhost:8000/docs
- Multi-format document ingestion: PDF, TXT, DOCX, Markdown
- Vector storage and semantic search using ChromaDB
- Contextual and accurate answer generation via Google Gemini (embeddings + LLM)
- Similarity score thresholding and multi-document filtering
- Source citation with confidence scoring
- Structured logging (trace IDs) and error handling via custom exceptions and retries
- Containerized (Docker) for reproducible deployments + AWS EC2 cloud deployment
- Automated testing (unit + integration) with coverage reporting
- CI/CD pipelines (GitHub Actions) for build, test, and deploy automation
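The structured logging with trace IDs mentioned above can be sketched as a custom formatter; the field names (`trace_id`, `level`, `message`) are illustrative assumptions, not the project's actual log schema:

```python
import json
import logging
import uuid

class JsonTraceFormatter(logging.Formatter):
    """Emit each log record as a single JSON line carrying a trace ID."""
    def format(self, record):
        return json.dumps({
            # trace_id is attached per-request via the `extra` kwarg below
            "trace_id": getattr(record, "trace_id", None),
            "level": record.levelname,
            "message": record.getMessage(),
        })

logger = logging.getLogger("rag")
handler = logging.StreamHandler()
handler.setFormatter(JsonTraceFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# One ID per incoming request, attached to every log line it produces,
# so all lines for a request can be correlated.
trace_id = str(uuid.uuid4())
logger.info("document ingested", extra={"trace_id": trace_id})
```

Attaching the ID through `extra` (rather than interpolating it into the message) keeps log lines machine-parseable.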
FastAPI application providing REST endpoints:
- Ingestion: File upload, validation, chunking
- Embedding: Gemini embedding generation
- Indexing: Vectors stored in ChromaDB (persistent directory)
- Query: Retrieve top-k relevant chunks (threshold + filters)
- Synthesis: Assemble context and generate answer via LLM
- Response: Return answer, cited sources, and metadata
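The query step's thresholding and multi-document filtering can be sketched as follows; the function name and the hit tuple shape are assumptions, not the project's actual retrieval API:

```python
def filter_hits(hits, top_k=5, min_score=0.3, doc_ids=None):
    """Keep the best-scoring chunks above the similarity threshold.

    hits: list of (chunk_text, doc_id, similarity_score) tuples,
    already scored by the vector store.
    """
    # Optional multi-document filter: restrict to the requested doc IDs.
    if doc_ids is not None:
        hits = [h for h in hits if h[1] in doc_ids]
    # Drop weak matches, then keep the top-k by descending score.
    hits = [h for h in hits if h[2] >= min_score]
    hits.sort(key=lambda h: h[2], reverse=True)
    return hits[:top_k]
```

The defaults mirror the `TOP_K_RESULTS` and `MIN_SIMILARITY_SCORE` settings described later in the configuration table.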
Key Modules:
- `api/`: Routing, request/response schemas, middleware
- `rag/`: Ingestion, pipeline orchestration, retrieval logic
- `core/`: Configuration management, LLM + embedding clients
- `database/`: Vector store abstraction (ChromaDB)
- `utils/`: Logging, retry strategy, custom exceptions
Upload → Validation → Chunking → Embedding → Vector Store → Query → Retrieval → Answer Generation → Response
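The chunking stage of this flow can be sketched as a sliding window over tokens; here a word list stands in for tokens, which is an approximation of how `CHUNK_SIZE` and `CHUNK_OVERLAP` are actually applied:

```python
def chunk_text(words, chunk_size=500, chunk_overlap=50):
    """Split a token (here: word) list into overlapping chunks.

    Each chunk repeats the last `chunk_overlap` tokens of the previous
    one, so context spanning a chunk boundary is never lost entirely.
    """
    step = chunk_size - chunk_overlap  # advance by size minus overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the final window already covers the tail
    return chunks
```

With the defaults, a 1000-word document yields three chunks whose boundaries share 50 words.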
Prerequisites: Python 3.11+, Google Gemini API key
```powershell
python -m venv venv
./venv/Scripts/Activate.ps1
pip install -r requirements.txt
copy .env.example .env   # Set GEMINI_API_KEY in .env
```

Environment variables (in `.env`):
| Variable | Purpose | Default |
|---|---|---|
| GEMINI_API_KEY | LLM + embedding access | Required |
| CHUNK_SIZE | Text chunk token size | 500 |
| CHUNK_OVERLAP | Overlap between chunks | 50 |
| TOP_K_RESULTS | Max retrieved chunks | 5 |
| MIN_SIMILARITY_SCORE | Similarity filter threshold | 0.3 |
| MAX_UPLOAD_SIZE | Max upload file size in bytes (10 MB) | 10000000 |
| LOG_LEVEL | Log verbosity | INFO |
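Loading the table above from the environment (after the `.env` file has been loaded) might look like this; a minimal sketch with a hypothetical `Settings` class, the real `core/` module may use a different configuration mechanism:

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    """Typed view of the environment variables in the table above."""
    gemini_api_key: str
    chunk_size: int = 500
    chunk_overlap: int = 50
    top_k_results: int = 5
    min_similarity_score: float = 0.3
    max_upload_size: int = 10_000_000
    log_level: str = "INFO"

def load_settings(env=os.environ):
    # GEMINI_API_KEY has no default: fail fast with a KeyError if missing.
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        chunk_size=int(env.get("CHUNK_SIZE", 500)),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", 50)),
        top_k_results=int(env.get("TOP_K_RESULTS", 5)),
        min_similarity_score=float(env.get("MIN_SIMILARITY_SCORE", 0.3)),
        max_upload_size=int(env.get("MAX_UPLOAD_SIZE", 10_000_000)),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Reading everything through one typed object keeps the rest of the code free of string lookups and makes the defaults visible in one place.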
```bash
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/documents/upload | Upload and ingest a document |
| GET | /api/v1/documents | List ingested documents |
| DELETE | /api/v1/documents/{doc_id} | Delete a document |
| POST | /api/v1/query | Submit a question and get an answer |
| GET | /api/v1/health | Health check endpoint |
```bash
# Run all tests
pytest tests -v

# Run with coverage report
pytest tests -v --cov=src --cov-report=html

# View report
start htmlcov/index.html
```

Test the structured JSON logging, trace IDs, custom exceptions, and retry mechanisms:

```bash
# Automated test suite (recommended)
python test_logging_errors.py
```

For a comprehensive testing guide, see TESTING_LOGGING_ERRORS.md.
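The retry mechanism those tests exercise can be sketched as a decorator with exponential backoff; the parameter names are illustrative, not the actual `utils/` API:

```python
import time
import functools

def retry(max_attempts=3, base_delay=0.5, retry_on=(Exception,)):
    """Retry a flaky call with exponential backoff (0.5s, 1s, 2s, ...)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the error
                    # Double the delay after each failed attempt.
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator
```

Restricting `retry_on` to transient error types (e.g. network timeouts) avoids uselessly retrying permanent failures such as validation errors.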
Run a comprehensive evaluation to benchmark system performance:

```bash
python evaluate.py
```

This generates `evaluation_report.json` with metrics:
- NDCG@5: Retrieval ranking quality (Normalized Discounted Cumulative Gain)
- Similarity scores: Average relevance of retrieved documents
- Response time: End-to-end latency analysis
- Topic coverage: How well answers address expected topics
- SLA compliance: Percentage of queries meeting the 15-second threshold
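NDCG@5 can be computed from graded relevance labels as follows; this is the standard formula, not necessarily the exact code in `evaluate.py`:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: rank i is discounted by log2(i + 2)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(relevances, k=5):
    """Normalize DCG@k by the DCG of the ideal (score-sorted) ranking.

    relevances: graded relevance of each retrieved chunk, in the order
    the retriever returned them. Result is in [0, 1]; 1.0 means the
    ranking is already ideal.
    """
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0  # nothing relevant was available to rank
    return dcg(relevances[:k]) / ideal_dcg
```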
```bash
docker-compose -f deployment/docker/docker-compose.yml up -d
docker build -f deployment/docker/Dockerfile -t rag-qa:latest .
docker run -p 8000:8000 --env-file .env rag-qa:latest
```

- Launch an Ubuntu 22.04 EC2 instance
- Clone the repository and run `deployment/aws/ec2-setup.sh`
- CI/CD: Use GitHub Actions for build/test/deploy
```
src/
  api/          # Routes, schemas, middleware
  core/         # Settings, embeddings, LLM client
  rag/          # Pipeline, ingestion, retrieval
  database/     # Vector store wrapper (ChromaDB)
  utils/        # Logging, exceptions, retry
tests/          # Unit and integration tests
deployment/     # Docker and AWS assets
static/         # Front-end assets (chat UI)
main.py         # FastAPI entry point
requirements.txt
```