# Endee-Powered RAG System

Semantic Search & Grounded Question Answering using the Endee Vector Database
## 🚀 Project Overview
This project implements a Retrieval-Augmented Generation (RAG) system powered by the Endee Vector Database.
The system enables:
- Document ingestion from a text file
- Embedding generation using SentenceTransformers
- Storage of embeddings in Endee
- Semantic similarity search using cosine distance
- Grounded answer generation using FLAN-T5
- Hallucination prevention using similarity thresholding
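As an illustration of the ingestion step, a minimal word-window chunker might look like the following sketch. The chunk size and overlap values are illustrative assumptions; the project's actual `ingest.py` may chunk differently.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows.

    chunk_size/overlap are illustrative defaults, not necessarily
    the values used by this project's ingest.py.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already covers the tail of the text
    return chunks
```

Overlapping windows keep sentences that straddle a chunk boundary retrievable from at least one chunk.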
This project demonstrates a practical AI/ML application where vector search is the core component of the architecture.
## 🎯 Problem Statement
Traditional keyword-based search systems fail to understand semantic meaning.
Modern AI systems require:

- Embedding-based similarity search
- Context retrieval
- Grounded answer generation
This project builds a minimal yet production-structured RAG system where:

- Endee performs high-performance vector indexing and retrieval
- A language model generates answers only from retrieved context
- Irrelevant questions are safely rejected
## 🏗 System Architecture

```
User Question
      ↓
SentenceTransformer (Embedding Model)
      ↓
Endee Vector Database (Cosine Similarity Search)
      ↓
Top-K Relevant Chunks
      ↓
Similarity Threshold Filtering (0.55)
      ↓
FLAN-T5 (LLM)
      ↓
Grounded Answer OR "I don't know"
```

## 🧠 Why Endee?
Endee is used as the primary vector database because it provides:
- High-performance similarity search
- Cosine distance indexing
- Scalable architecture
- Efficient C++ backend
- Python SDK integration
- Docker deployment support
In this project, Endee handles:
- Index creation (dimension: 384)
- Vector storage
- Top-k similarity retrieval
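As a hypothetical sketch of how these requests could be prepared, the helpers below build the index-creation and upsert payloads. The field names and schema are assumptions for illustration, not the documented Endee API; consult the Endee SDK docs for the real surface.

```python
# Hypothetical sketch: the payload field names below are assumptions
# for illustration, NOT the documented Endee API.

def build_create_index_payload(name: str, dimension: int, space_type: str) -> dict:
    """Request body for creating an index (hypothetical schema)."""
    return {"name": name, "dimension": dimension, "space_type": space_type}

def build_upsert_payload(ids: list[str], vectors: list[list[float]]) -> dict:
    """Pair chunk IDs with their embedding vectors (hypothetical schema)."""
    if len(ids) != len(vectors):
        raise ValueError("each id needs exactly one vector")
    return {"vectors": [{"id": i, "values": v} for i, v in zip(ids, vectors)]}

index_payload = build_create_index_payload("docs", 384, "cosine")
upsert_payload = build_upsert_payload(["chunk-0"], [[0.0] * 384])
# These payloads would then be sent to the running Endee server
# at http://localhost:8080 via the SDK or an HTTP client.
```

The dimension (384) must match the embedding model's output size, which is why `all-MiniLM-L6-v2` pairs with a 384-dimensional index.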
## 📂 Project Structure

```
endee-rag-system/
│
├── app/
│   ├── ingest.py      # Load and chunk text file
│   ├── embed.py       # Generate embeddings and upsert into Endee
│   ├── retrieve.py    # Semantic retrieval using Endee SDK
│   └── rag.py         # Full RAG pipeline (LLM + threshold)
│
├── data/
│   └── sample.txt     # Knowledge base file
│
├── requirements.txt
└── README.md
```

## 🛡 Hallucination Prevention Strategy
To ensure reliability, a similarity threshold of 0.55 is enforced. If no retrieved vector crosses the threshold, the system returns:

> I don't know

This guarantees:

- No fabricated responses
- No answers outside the dataset
- Strictly grounded generation
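The thresholding step can be sketched in plain Python. This is a minimal illustration; the function and variable names are mine, not the project's.

```python
import math

THRESHOLD = 0.55  # the same cutoff the pipeline enforces

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def filter_chunks(query_vec, retrieved, threshold=THRESHOLD):
    """Keep only chunks whose similarity clears the threshold.

    retrieved: list of (chunk_text, vector) pairs. Returns [] when
    nothing qualifies, signalling the caller to answer "I don't know"
    without ever invoking the LLM.
    """
    return [text for text, vec in retrieved
            if cosine_similarity(query_vec, vec) >= threshold]
```

Because the LLM is only called when `filter_chunks` returns at least one chunk, fabrication is blocked at the retrieval stage rather than relying on prompt instructions alone.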
## ⚙️ Setup Instructions

### 1️⃣ Start Endee (Docker)

Build the Endee container:

```bash
docker build --build-arg BUILD_ARCH=avx2 -t endee-oss -f infra/Dockerfile .
```

Run the container:

```bash
docker run -d -p 8080:8080 --name endee-server endee-oss
```

Verify health:

```bash
curl http://localhost:8080/api/v1/health
```

### 2️⃣ Create Index
Open:

Create an index with:

- Name: docs
- Dimension: 384
- Space Type: Cosine Similarity

### 3️⃣ Install Dependencies

```bash
pip install -r requirements.txt
```

### 4️⃣ Embed & Store Documents

```bash
python app/embed.py
```
This script:

- Loads the sample text file
- Generates embeddings using all-MiniLM-L6-v2
- Upserts the vectors into Endee

### 5️⃣ Run RAG System

```bash
python app/rag.py
```
Example:

```
Ask a question: What is Endee?
```

For unrelated questions:

```
Ask a question: What is psychology?
```

Expected output:

```
I don't know
```
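The gating behavior shown above can be illustrated with a fully stubbed pipeline. The retriever scores and strings below are invented for demonstration; the real system queries Endee and generates with FLAN-T5.

```python
THRESHOLD = 0.55  # same cutoff as the real pipeline

def fake_retrieve(question: str) -> list[tuple[float, str]]:
    """Stand-in for Endee retrieval: returns (similarity, chunk) pairs.
    The scores here are invented for illustration."""
    corpus = {
        "What is Endee?": [(0.91, "Endee is a high-performance vector database.")],
        "What is psychology?": [(0.12, "Endee is a high-performance vector database.")],
    }
    return corpus.get(question, [])

def rag_answer(question: str) -> str:
    hits = [chunk for score, chunk in fake_retrieve(question) if score >= THRESHOLD]
    if not hits:
        return "I don't know"  # the LLM is never invoked
    # Stand-in for FLAN-T5 generation over the retrieved context.
    return f"Grounded answer based on: {hits[0]}"

print(rag_answer("What is Endee?"))
print(rag_answer("What is psychology?"))  # -> I don't know
```

Note that the off-topic question still retrieves *something* (its nearest neighbor), but the low similarity score keeps it below the threshold, which is exactly why thresholding, not retrieval alone, prevents hallucination.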
## 🧪 Technologies Used

- Endee Vector Database
- Endee Python SDK
- SentenceTransformers (all-MiniLM-L6-v2)
- HuggingFace Transformers
- FLAN-T5 (google/flan-t5-small)
- Docker

## 🔍 Endee Use Case in This Project
In this system, Endee serves as the core semantic retrieval engine powering the RAG pipeline. All document chunks are converted into 384-dimensional embeddings using SentenceTransformers and stored in an Endee index configured for cosine similarity. When a user submits a question, the query is embedded and searched against the stored vectors using Endee's high-performance similarity search. The top-k most relevant chunks are retrieved and passed to the language model for grounded response generation. By handling vector indexing, storage, and efficient nearest-neighbor retrieval, Endee enables fast, scalable semantic search and ensures that generated answers are contextually accurate and strictly based on indexed knowledge.

## 🔍 Core Features
- ✔ Semantic similarity search
- ✔ Vector-based retrieval
- ✔ Grounded answer generation
- ✔ Similarity threshold enforcement
- ✔ Dockerized vector database
- ✔ Modular Python architecture
## 📌 Example Use Cases
This system can be extended for:
- Domain-specific chatbots
- Enterprise knowledge base search
- Technical documentation assistants
- Research assistants
- Internal AI knowledge systems
## 🚧 Future Improvements
- Hybrid search (dense + sparse)
- Metadata-based filtering
- PDF ingestion pipeline
- REST API wrapper (FastAPI)
- Web UI frontend
- Model upgrade for stronger generation
## 🎤 Interview Explanation (If Asked)
**Q: What happens if a user asks something outside the dataset?**

**A:** The system retrieves the top-k vectors from Endee using cosine similarity. If no result exceeds the similarity threshold (0.55), the LLM is not invoked and the system returns "I don't know", preventing hallucination.
## 📄 License
This project is developed as part of an AI/ML evaluation using Endee Vector Database.
Endee itself is licensed under Apache 2.0.