hal-29/semantic-search-engine-app

Semantic Search Engine (SSE)

A FastAPI-powered semantic search application with React frontend, featuring document uploads, vector embeddings, and LLM-powered question answering.

✨ Features

  • Document Processing: Upload PDFs, Word docs, and text files
  • Semantic Search: Find relevant content using vector embeddings
  • RAG Integration: Generate AI-powered answers from your documents
  • Session Management: Isolate document sets per session
  • Modern Stack: FastAPI + React + FAISS + HuggingFace models
| Feature              | Description                     |
|----------------------|---------------------------------|
| Multi-format Support | PDF, Word, and text files       |
| Hybrid Search        | Semantic + keyword search       |
| RAG Integration      | Context-aware answers           |
| Performance          | Optimized chunking & indexing   |

🛠️ Installation

Run with Docker

   docker compose up --build
   # Access the app at http://localhost:8000

Development Setup

   # Linux / Mac
   python -m venv venv
   source venv/bin/activate 

   # Windows
   venv\Scripts\activate

   # run
   pip install -r requirements.txt
   uvicorn main:app --reload

Frontend

   cd frontend
   pnpm install
   pnpm run dev

API Endpoints

| Endpoint       | Method | Description                  |
|----------------|--------|------------------------------|
| `/api/session` | GET    | Create a new session         |
| `/api/upload`  | POST   | Upload files for processing  |
| `/api/search`  | GET    | Perform a semantic search    |
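A hypothetical client sketch for the endpoints above. The paths come from the table; the query-parameter names (`session_id`, `q`) and the JSON response shape are assumptions, not taken from the source.

```python
# Minimal client sketch, assuming the server from the Docker/dev setup
# is running on localhost:8000. Parameter names are illustrative.
import json
import urllib.request
from urllib.parse import urlencode

BASE = "http://localhost:8000"

def search_url(session_id: str, query: str) -> str:
    """Build the GET /api/search URL for a given session and query."""
    return f"{BASE}/api/search?" + urlencode({"session_id": session_id, "q": query})

def run_search(session_id: str, query: str) -> dict:
    """Perform the request and decode the JSON body (assumed shape)."""
    with urllib.request.urlopen(search_url(session_id, query)) as resp:
        return json.loads(resp.read())
```

`search_url` can be inspected without a running server; `run_search` needs one.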

How the App Operates

  1. You Open the App

    The app creates a unique session ID for you

    Behind the scenes:
    → Generates a random ID
    → Creates an empty folder named after that ID to store your files.

  2. You Upload Files

    You drag/drop PDFs, Word docs, or text files into the app.

    Behind the scenes:
    → Saves files to your session folder.
    → Breaks each file into small text chunks (e.g., 1-2 sentences each).
    → Stores these chunks in a list with metadata (file name, page number, etc.).

  3. The App "Understands" Your Files

    The app converts every text chunk into number sequences (vectors) using AI.

    Behind the scenes:
    → Uses a pre-trained model (all-MiniLM-L6-v2) to generate vectors.
    → Builds a searchable index using FAISS (Facebook’s similarity-search library).

  4. You Search for Something

    Behind the scenes:
    → Converts your query into a vector (a numeric “barcode”) using the same AI model.
    → Compares it against all document vectors to find the closest matches.
    → Reranks results using a second AI (cross-encoder) to prioritize relevance.

  5. You Get Results

    The app shows you:
    → Direct excerpts from your documents (sorted by relevance).
    → An AI-generated summary (if RAG is enabled), combining the top matches into a natural answer.
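The chunk → embed → index → search flow in steps 2–5 can be sketched with a toy, dependency-free example. The real app uses `all-MiniLM-L6-v2` embeddings and a FAISS index; here a simple bag-of-words vector and brute-force cosine similarity stand in for both, just to make the flow concrete.

```python
# Toy sketch of the pipeline above. Word-count vectors replace the
# neural embedding model, and a sorted list replaces the FAISS index.
import math
import re
from collections import Counter

def chunk(text: str) -> list[str]:
    # Step 2: break a document into small chunks (here: sentences).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def embed(chunk_text: str) -> Counter:
    # Step 3: turn a chunk into a vector (here: lowercase word counts).
    return Counter(re.findall(r"\w+", chunk_text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Similarity between two sparse vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    # Step 4: embed the query and rank chunks by similarity to it.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

doc = "FAISS builds a vector index. React renders the frontend. Uploads are chunked."
print(search("how is the vector index built?", chunk(doc), top_k=1))
# → ['FAISS builds a vector index.']
```

The real system also reranks the top hits with a cross-encoder (step 4) before handing them to the LLM (step 5); this sketch stops at the initial ranking.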

Simple Analogy

I. Session ID = Your private locker.

II. File Upload = Putting books into the locker.

III. Chunking = Tearing out pages and highlighting paragraphs.

IV. Vectors = Giving each paragraph a numeric fingerprint that captures its meaning.

V. Search = Finding the paragraphs whose fingerprints look most like your question’s.

VI. AI Answer = A friend (the LLM) reads those paragraphs and explains the answer to you.
