
# Endee-Powered RAG System

Semantic Search & Grounded Question Answering using the Endee Vector Database

## 🚀 Project Overview

This project implements a Retrieval-Augmented Generation (RAG) system powered by the Endee Vector Database.

The system enables:

- Document ingestion from a text file
- Embedding generation using SentenceTransformers
- Storage of embeddings in Endee
- Semantic similarity search using cosine distance
- Grounded answer generation using FLAN-T5
- Hallucination prevention via similarity thresholding
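The ingestion step above can be sketched as a simple overlapping word-window chunker. This is an illustrative minimal version, not the project's actual `ingest.py`; the function name and parameters are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size words.

    Overlap keeps sentences that straddle a boundary retrievable from
    either neighboring chunk.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

# A 500-word document with 200-word chunks and 50-word overlap
# yields chunks starting at words 0, 150, 300, and 450.
doc = " ".join(str(i) for i in range(500))
chunks = chunk_text(doc, chunk_size=200, overlap=50)
print(len(chunks))  # -> 4
```

Each chunk then becomes one embedding (and one vector in Endee), so chunk size directly trades retrieval granularity against context completeness.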

This project demonstrates a practical AI/ML application where vector search is the core component of the architecture.

## 🎯 Problem Statement

Traditional keyword-based search systems fail to understand semantic meaning.

Modern AI systems require:

- Embedding-based similarity search
- Context retrieval
- Grounded answer generation

This project builds a minimal yet production-structured RAG system where:

- Endee performs high-performance vector indexing and retrieval
- A language model generates answers only from retrieved context
- Irrelevant questions are safely rejected

## 🏗 System Architecture

1. User Question
2. SentenceTransformer (Embedding Model)
3. Endee Vector Database (Cosine Similarity Search)
4. Top-K Relevant Chunks
5. Similarity Threshold Filtering (0.55)
6. FLAN-T5 (LLM)
7. Grounded Answer OR "I don't know"

## 🧠 Why Endee?

Endee is used as the primary vector database because it provides:

- High-performance similarity search
- Cosine distance indexing
- Scalable architecture
- Efficient C++ backend
- Python SDK integration
- Docker deployment support

In this project, Endee handles:

- Index creation (dimension: 384)
- Vector storage
- Top-k similarity retrieval
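Conceptually, the top-k cosine retrieval that Endee performs looks like the following NumPy reference sketch. This is what the operation computes, not Endee's implementation (which runs in its C++ backend with proper indexing):

```python
import numpy as np

def top_k_cosine(query: np.ndarray, vectors: np.ndarray, k: int = 3):
    """Return (index, similarity) pairs for the k rows most similar to query."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q                    # cosine similarity per stored vector
    order = np.argsort(-sims)[:k]   # highest similarity first
    return [(int(i), float(sims[i])) for i in order]

# Toy 2-d store (the real index holds 384-d embeddings).
store = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print(top_k_cosine(np.array([1.0, 0.1]), store, k=2))
```

A brute-force scan like this is O(n) per query; a vector database exists precisely to make this retrieval fast at scale.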

## 📂 Project Structure

    endee-rag-system/
    │
    ├── app/
    │   ├── ingest.py      # Load and chunk text file
    │   ├── embed.py       # Generate embeddings and upsert into Endee
    │   ├── retrieve.py    # Semantic retrieval using Endee SDK
    │   ├── rag.py         # Full RAG pipeline (LLM + threshold)
    │
    ├── data/
    │   └── sample.txt     # Knowledge base file
    │
    ├── requirements.txt
    ├── README.md

## 🛡 Hallucination Prevention Strategy

To ensure reliability:

- A similarity threshold of 0.55 is enforced.
- If no retrieved vector crosses the threshold, the system returns: "I don't know"

This guarantees:

- No fabricated responses
- No answers outside the dataset
- Strictly grounded generation
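The threshold guard can be sketched end-to-end with stubbed components. The function names and stub behavior here are illustrative assumptions, not the project's actual `rag.py` API; only the 0.55 threshold and the "I don't know" fallback come from the design above.

```python
SIM_THRESHOLD = 0.55  # from the hallucination-prevention rule above

def answer(question, search, generate):
    """search: question -> list of (chunk, cosine_similarity); generate: LLM callable."""
    hits = search(question)
    context = [chunk for chunk, sim in hits if sim >= SIM_THRESHOLD]
    if not context:
        return "I don't know"           # nothing crossed the threshold: the LLM is never invoked
    return generate(question, context)  # grounded generation over retrieved chunks only

# Stub retrieval: the on-topic question finds a high-similarity chunk, the off-topic one does not.
def fake_search(question):
    sim = 0.82 if "Endee" in question else 0.12
    return [("Endee is a vector database.", sim)]

fake_llm = lambda q, ctx: "Endee is a vector database."

print(answer("What is Endee?", fake_search, fake_llm))      # grounded answer
print(answer("What is psychology?", fake_search, fake_llm))  # -> I don't know
```

The key design choice is that the guard runs *before* generation, so an off-topic query never reaches the language model at all.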

## ⚙️ Setup Instructions

### 1️⃣ Start Endee (Docker)

Build the Endee container:

    docker build --build-arg BUILD_ARCH=avx2 -t endee-oss -f infra/Dockerfile .

Run the container:

    docker run -d -p 8080:8080 --name endee-server endee-oss

Verify health:

    curl http://localhost:8080/api/v1/health

### 2️⃣ Create Index

Open:

    http://localhost:8080

Create an index with:

- Name: docs
- Dimension: 384
- Space Type: Cosine Similarity

### 3️⃣ Install Dependencies

    pip install -r requirements.txt

### 4️⃣ Embed & Store Documents

    python app/embed.py

This script:

- Loads the sample text file
- Generates embeddings using all-MiniLM-L6-v2
- Upserts the vectors into Endee
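The records that the embed step produces look roughly like the following. The `fake_embed` stub and the record shape (`id`, `vector`, `metadata`) are assumptions for illustration, not Endee's actual SDK payload format:

```python
EMBED_DIM = 384  # all-MiniLM-L6-v2 output dimension, matching the index

def fake_embed(texts):
    # Stand-in for SentenceTransformer("all-MiniLM-L6-v2").encode(texts),
    # which returns one 384-dimensional vector per input text.
    return [[float(len(t))] * EMBED_DIM for t in texts]

chunks = ["Endee is a vector database.", "It supports cosine similarity search."]
vectors = fake_embed(chunks)

# Hypothetical upsert records: id + vector, with the source text kept as
# metadata so retrieved hits can be mapped back to readable context.
records = [
    {"id": str(i), "vector": vec, "metadata": {"text": chunk}}
    for i, (chunk, vec) in enumerate(zip(chunks, vectors))
]
print(len(records), len(records[0]["vector"]))  # -> 2 384
```

Storing the original text alongside each vector is what lets the retrieval step return human-readable chunks rather than bare embeddings.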

### 5️⃣ Run the RAG System

    python app/rag.py

Example:

    Ask a question: What is Endee?

For unrelated questions:

    Ask a question: What is psychology?

Expected output:

    I don't know

## 🧪 Technologies Used

- Endee Vector Database
- Endee Python SDK
- SentenceTransformers (all-MiniLM-L6-v2)
- HuggingFace Transformers
- FLAN-T5 (google/flan-t5-small)
- Docker

## 🔍 Endee Use Case in This Project

In this system, Endee serves as the core semantic retrieval engine powering the RAG pipeline. All document chunks are converted into 384-dimensional embeddings using SentenceTransformers and stored in an Endee index configured for cosine similarity. When a user submits a question, the query is embedded and searched against the stored vectors using Endee's high-performance similarity search. The top-k most relevant chunks are retrieved and passed to the language model for grounded response generation. By handling vector indexing, storage, and efficient nearest-neighbor retrieval, Endee enables fast, scalable semantic search and ensures that generated answers are contextually accurate and strictly based on indexed knowledge.

## 🔍 Core Features

- ✔ Semantic similarity search
- ✔ Vector-based retrieval
- ✔ Grounded answer generation
- ✔ Similarity threshold enforcement
- ✔ Dockerized vector database
- ✔ Modular Python architecture

## 📌 Example Use Cases

This system can be extended for:

- Domain-specific chatbots
- Enterprise knowledge base search
- Technical documentation assistants
- Research assistants
- Internal AI knowledge systems

## 🚧 Future Improvements

- Hybrid search (dense + sparse)
- Metadata-based filtering
- PDF ingestion pipeline
- REST API wrapper (FastAPI)
- Web UI frontend
- Model upgrade for stronger generation

## 🎤 Interview Explanation (If Asked)

**Q: What happens if a user asks something outside the dataset?**

Answer:

The system retrieves top-k vectors from Endee using cosine similarity. If no result exceeds the similarity threshold (0.55), the LLM is not invoked and the system returns "I don't know", preventing hallucination.

## 📄 License

This project is developed as part of an AI/ML evaluation using Endee Vector Database.

Endee itself is licensed under Apache 2.0.
