# Endee-Powered RAG System

Semantic Search & Grounded Question Answering using the Endee Vector Database
## 🚀 Project Overview
This project implements a Retrieval-Augmented Generation (RAG) system powered by the Endee Vector Database.
The system enables:
- Document ingestion from a text file
- Embedding generation using SentenceTransformers
- Storage of embeddings in Endee
- Semantic similarity search using cosine distance
- Grounded answer generation using FLAN-T5
- Hallucination prevention using similarity thresholding
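As an illustration of the ingestion step, a minimal word-window chunker might look like the following sketch. The chunk size and overlap values are illustrative assumptions; the project's actual `ingest.py` may chunk differently.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows.

    chunk_size/overlap are illustrative defaults, not necessarily
    the values used by this project's ingest.py.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already covers the tail of the text
    return chunks
```

Overlapping windows keep sentences that straddle a chunk boundary retrievable from at least one chunk.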
This project demonstrates a practical AI/ML application where vector search is the core component of the architecture.
## 🎯 Problem Statement
Traditional keyword-based search systems fail to understand semantic meaning.
Modern AI systems require:

- Embedding-based similarity search
- Context retrieval
- Grounded answer generation
This project builds a minimal yet production-structured RAG system where:

- Endee performs high-performance vector indexing and retrieval
- A language model generates answers only from retrieved context
- Irrelevant questions are safely rejected
## 🏗 System Architecture

```
User Question
      ↓
SentenceTransformer (Embedding Model)
      ↓
Endee Vector Database (Cosine Similarity Search)
      ↓
Top-K Relevant Chunks
      ↓
Similarity Threshold Filtering (0.55)
      ↓
FLAN-T5 (LLM)
      ↓
Grounded Answer OR "I don't know"
```

## 🧠 Why Endee?
Endee is used as the primary vector database because it provides:
- High-performance similarity search
- Cosine distance indexing
- Scalable architecture
- Efficient C++ backend
- Python SDK integration
- Docker deployment support
In this project, Endee handles:
- Index creation (dimension: 384)
- Vector storage
- Top-k similarity retrieval
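As a hypothetical sketch of how these requests could be prepared, the helpers below build the index-creation and upsert payloads. The field names and schema are assumptions for illustration, not the documented Endee API; consult the Endee SDK docs for the real surface.

```python
# Hypothetical sketch: the payload field names below are assumptions
# for illustration, NOT the documented Endee API.

def build_create_index_payload(name: str, dimension: int, space_type: str) -> dict:
    """Request body for creating an index (hypothetical schema)."""
    return {"name": name, "dimension": dimension, "space_type": space_type}

def build_upsert_payload(ids: list[str], vectors: list[list[float]]) -> dict:
    """Pair chunk IDs with their embedding vectors (hypothetical schema)."""
    if len(ids) != len(vectors):
        raise ValueError("each id needs exactly one vector")
    return {"vectors": [{"id": i, "values": v} for i, v in zip(ids, vectors)]}

index_payload = build_create_index_payload("docs", 384, "cosine")
upsert_payload = build_upsert_payload(["chunk-0"], [[0.0] * 384])
# These payloads would then be sent to the running Endee server
# at http://localhost:8080 via the SDK or an HTTP client.
```

The dimension (384) must match the embedding model's output size, which is why `all-MiniLM-L6-v2` pairs with a 384-dimensional index.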
## 📂 Project Structure

```
endee-rag-system/
│
├── app/
│   ├── ingest.py      # Load and chunk text file
│   ├── embed.py       # Generate embeddings and upsert into Endee
│   ├── retrieve.py    # Semantic retrieval using Endee SDK
│   └── rag.py         # Full RAG pipeline (LLM + threshold)
│
├── data/
│   └── sample.txt     # Knowledge base file
│
├── requirements.txt
└── README.md
```

## 🛡 Hallucination Prevention Strategy
To ensure reliability, a similarity threshold of 0.55 is enforced. If no retrieved vector crosses the threshold, the system returns:

> I don't know

This guarantees:

- No fabricated responses
- No answers outside the dataset
- Strictly grounded generation
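The thresholding step can be sketched in plain Python. This is a minimal illustration; the function and variable names are mine, not the project's.

```python
import math

THRESHOLD = 0.55  # the same cutoff the pipeline enforces

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def filter_chunks(query_vec, retrieved, threshold=THRESHOLD):
    """Keep only chunks whose similarity clears the threshold.

    retrieved: list of (chunk_text, vector) pairs. Returns [] when
    nothing qualifies, signalling the caller to answer "I don't know"
    without ever invoking the LLM.
    """
    return [text for text, vec in retrieved
            if cosine_similarity(query_vec, vec) >= threshold]
```

Because the LLM is only called when `filter_chunks` returns at least one chunk, fabrication is blocked at the retrieval stage rather than relying on prompt instructions alone.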
## ⚙️ Setup Instructions

### 1️⃣ Start Endee (Docker)

Build the Endee container:

```bash
docker build --build-arg BUILD_ARCH=avx2 -t endee-oss -f infra/Dockerfile .
```

Run the container:

```bash
docker run -d -p 8080:8080 --name endee-server endee-oss
```

Verify health:

```bash
curl http://localhost:8080/api/v1/health
```

### 2️⃣ Create Index
Open:

Create an index with:

- Name: docs
- Dimension: 384
- Space Type: Cosine Similarity

### 3️⃣ Install Dependencies

```bash
pip install -r requirements.txt
```

### 4️⃣ Embed & Store Documents

```bash
python app/embed.py
```
This script:

- Loads the sample text file
- Generates embeddings using all-MiniLM-L6-v2
- Upserts the vectors into Endee

### 5️⃣ Run RAG System

```bash
python app/rag.py
```
Example:

```
Ask a question: What is Endee?
```

For unrelated questions:

```
Ask a question: What is psychology?
```

Expected output:

```
I don't know
```
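The gating behavior shown above can be illustrated with a fully stubbed pipeline. The retriever scores and strings below are invented for demonstration; the real system queries Endee and generates with FLAN-T5.

```python
THRESHOLD = 0.55  # same cutoff as the real pipeline

def fake_retrieve(question: str) -> list[tuple[float, str]]:
    """Stand-in for Endee retrieval: returns (similarity, chunk) pairs.
    The scores here are invented for illustration."""
    corpus = {
        "What is Endee?": [(0.91, "Endee is a high-performance vector database.")],
        "What is psychology?": [(0.12, "Endee is a high-performance vector database.")],
    }
    return corpus.get(question, [])

def rag_answer(question: str) -> str:
    hits = [chunk for score, chunk in fake_retrieve(question) if score >= THRESHOLD]
    if not hits:
        return "I don't know"  # the LLM is never invoked
    # Stand-in for FLAN-T5 generation over the retrieved context.
    return f"Grounded answer based on: {hits[0]}"

print(rag_answer("What is Endee?"))
print(rag_answer("What is psychology?"))  # -> I don't know
```

Note that the off-topic question still retrieves *something* (its nearest neighbor), but the low similarity score keeps it below the threshold, which is exactly why thresholding, not retrieval alone, prevents hallucination.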
## 🧪 Technologies Used

- Endee Vector Database
- Endee Python SDK
- SentenceTransformers (all-MiniLM-L6-v2)
- HuggingFace Transformers
- FLAN-T5 (google/flan-t5-small)
- Docker

## 🔍 Endee Use Case in This Project
In this system, Endee serves as the core semantic retrieval engine powering the RAG pipeline. All document chunks are converted into 384-dimensional embeddings using SentenceTransformers and stored in an Endee index configured for cosine similarity. When a user submits a question, the query is embedded and searched against the stored vectors using Endee's high-performance similarity search. The top-k most relevant chunks are retrieved and passed to the language model for grounded response generation. By handling vector indexing, storage, and efficient nearest-neighbor retrieval, Endee enables fast, scalable semantic search and ensures that generated answers are contextually accurate and strictly based on indexed knowledge.

## 🔍 Core Features
- ✔ Semantic similarity search
- ✔ Vector-based retrieval
- ✔ Grounded answer generation
- ✔ Similarity threshold enforcement
- ✔ Dockerized vector database
- ✔ Modular Python architecture
## 📌 Example Use Cases
This system can be extended for:
- Domain-specific chatbots
- Enterprise knowledge base search
- Technical documentation assistants
- Research assistants
- Internal AI knowledge systems
## 🚧 Future Improvements
- Hybrid search (dense + sparse)
- Metadata-based filtering
- PDF ingestion pipeline
- REST API wrapper (FastAPI)
- Web UI frontend
- Model upgrade for stronger generation
## 🎤 Interview Explanation (If Asked)
**Q: What happens if a user asks something outside the dataset?**

**A:** The system retrieves the top-k vectors from Endee using cosine similarity. If no result exceeds the similarity threshold (0.55), the LLM is not invoked and the system returns "I don't know", preventing hallucination.
## 📄 License
This project is developed as part of an AI/ML evaluation using Endee Vector Database.
Endee itself is licensed under Apache 2.0.