A modular Retrieval-Augmented Generation (RAG) system that allows users to upload TXT or PDF documents and query them using natural language questions. The API is built with FastAPI and uses local vector storage (FAISS or Qdrant), sentence-transformers embeddings, optional cross-encoder reranking, and Groq for fast LLM inference.
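The pipeline above can be illustrated end to end with a toy sketch. This is not the project's code: the embedder, similarity measure, and "LLM" below are deliberate stand-ins (word-bag embeddings, Jaccard overlap, and a template string) used only to show how ingest, retrieve, and generate fit together.

```python
# Toy RAG flow: ingest -> retrieve -> generate, with stand-ins for the
# real embedding model and LLM (illustrative only).

def embed(text: str) -> set[str]:
    """Stand-in embedder: a bag of lowercase words instead of a dense vector."""
    return set(text.lower().split())

def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap as a stand-in for cosine similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Rank stored chunks by similarity to the query and keep the top_k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)[:top_k]

def generate(query: str, context: list[str]) -> str:
    """Stand-in for the Groq call: stitch the retrieved context into a prompt."""
    return f"Q: {query}\nContext: {' '.join(context)}"

chunks = ["embeddings map text to vectors", "FAISS stores dense vectors"]
answer = generate("what are embeddings", retrieve("what are embeddings", chunks))
```

In the real system, `embed` is a sentence-transformers model, `retrieve` is a FAISS/Qdrant similarity search (optionally reranked), and `generate` calls Groq.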
- Document ingestion endpoint (`/upload`) for single TXT or PDF files
- Question-answering endpoint (`/query`) with optional source filtering
- Switchable vector database: FAISS (in-memory/index file) or Qdrant (persistent)
- Configurable chunking strategies: fixed-size, sentence-based, or basic recursive
- Cosine similarity search with optional BGE-style reranking
- Groq API integration for low-latency generation (Llama-3.3-70B or similar models)
- Thread-safe FAISS writes using a global lock
- Health check endpoint
- Automatic interactive Swagger UI at `/docs`
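Of the chunking strategies listed above, fixed-size is the simplest. A minimal sketch of what fixed-size chunking might look like; the function name and the `overlap` parameter are illustrative, not the project's actual API:

```python
def chunk_fixed(text: str, size: int = 512, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `size` characters.

    `overlap` carries the tail of each chunk into the next one, which
    helps queries that straddle a chunk boundary.
    """
    if size <= 0 or overlap >= size:
        raise ValueError("size must be positive and overlap must be smaller")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Sentence-based and recursive strategies refine this by splitting on sentence or structural boundaries instead of raw character offsets.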
```
.
├── src/
│   ├── api.py          # FastAPI application and endpoints
│   ├── config.py       # YAML configuration loader
│   ├── embedding.py    # SentenceTransformer wrapper
│   ├── chunking.py     # Chunking logic (fixed, sentence, recursive)
│   ├── vector_db.py    # Abstract VectorDB + FAISS/Qdrant implementations
│   ├── retrieval.py    # Retrieval + optional reranking
│   └── generation.py   # Groq LLM generation
├── config.yaml         # Configuration file (models, DB type, chunk size, etc.)
├── requirements.txt    # Dependencies
├── .env                # Environment variables (GROQ_API_KEY)
└── README.md
```
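`src/vector_db.py` exposes an abstract `VectorDB` backed by either FAISS or Qdrant. A rough sketch of what such an abstraction could look like; the class and method names here are assumptions, not the repository's actual interface, and a brute-force in-memory list stands in for a real index:

```python
import math
from abc import ABC, abstractmethod

class VectorDB(ABC):
    """Minimal interface both backends would implement."""

    @abstractmethod
    def add(self, vector: list[float], payload: dict) -> None: ...

    @abstractmethod
    def search(self, query: list[float], top_k: int = 3) -> list[dict]: ...

class InMemoryDB(VectorDB):
    """Stand-in backend: brute-force cosine similarity over a list."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], dict]] = []

    def add(self, vector, payload):
        self._items.append((vector, payload))

    def search(self, query, top_k=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0
        ranked = sorted(self._items, key=lambda it: cosine(query, it[0]), reverse=True)
        return [payload for _, payload in ranked[:top_k]]
```

Keeping the interface abstract is what makes the `vector_db.type` setting in `config.yaml` a one-line switch between backends.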
- Python 3.10+
- Groq API key (set as the environment variable `GROQ_API_KEY`)
- (Optional) Docker if you plan to run Qdrant in container mode later
1. Clone the repository:

   ```bash
   git clone https://github.com/bijay-odyssey/Personal-Knowledge-Base-RAG-API.git
   cd Personal-Knowledge-Base-RAG-API
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate     # Linux/macOS
   # or: venv\Scripts\activate  # Windows
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set your Groq API key. Create a `.env` file in the root:

   ```
   GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
   ```

   Or export it directly:

   ```bash
   export GROQ_API_KEY="gsk_..."
   ```
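At startup the app has to read this key from the environment. A minimal sketch of how that lookup might be done; the helper name and error message are illustrative, not necessarily what `src/generation.py` actually does:

```python
import os

def get_groq_api_key() -> str:
    """Read GROQ_API_KEY from the environment, failing loudly if it is absent."""
    key = os.environ.get("GROQ_API_KEY", "")
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Add it to .env or export it in your shell."
        )
    return key
```

Failing at startup with a clear message beats a cryptic authentication error on the first `/query` call.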
Customize models, database backend, and chunking in `config.yaml`:

```yaml
embedding:
  model: "all-MiniLM-L6-v2"

vector_db:
  type: "faiss"       # or "qdrant"
  path: "db/index"    # for FAISS; ignored for in-memory Qdrant

chunking:
  strategy: "fixed"   # "fixed", "sentence", "recursive"
  size: 512

reranker:
  model: "cross-encoder/ms-marco-MiniLM-L-6-v2"

llm:
  provider: "groq"
  model: "llama-3.3-70b-versatile"
```

Start the server with hot-reload (development):

```bash
uvicorn src.api:app --reload --port 8000
```

Or in production mode:

```bash
uvicorn src.api:app --host 0.0.0.0 --port 8000
```

Open http://localhost:8000/docs in your browser to access the interactive Swagger UI.
Upload and index a document.

Form-data:
- `file`: TXT or PDF file
- `source_name` (optional): custom label for filtering

Example (curl):

```bash
curl -X POST "http://localhost:8000/upload" \
  -F "file=@/path/to/notes.pdf" \
  -F "source_name=week1-notes"
```

Ask a question against indexed documents.
JSON body (`filter_source` is optional):

```json
{
  "query": "What are embeddings?",
  "filter_source": "week1-notes"
}
```

Example (curl):
```bash
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "Explain embeddings", "filter_source": "notes.pdf"}'
```

Check API status.

```bash
curl http://localhost:8000/health
```

- FAISS uses a global lock for thread-safe writes in concurrent uploads.
- The Qdrant collection is recreated on startup (convenient for development); in production, load the existing collection instead.
- PDF text extraction uses PyPDF2 (simple; no layout preservation).
- Reranking is optional; set `reranker.model` to an empty string to disable it.
- No persistent storage cleanup; delete `db/index` or the Qdrant data directory manually when resetting.
- Multi-file / folder ingestion
- Background task processing for large uploads
- Advanced chunking (overlap, semantic)
- Evaluation endpoints
- Authentication & rate limiting
- Qdrant Docker integration
- Gradio / Streamlit frontend