NPMAI-RAG-API-Pipeline

A powerful FastAPI-based multi-modal ingestion system that processes PDFs, scanned documents, images, videos, YouTube links, and text files — then optionally performs semantic retrieval using FAISS + HuggingFace embeddings and refines answers using an LLM (Ollama via NPMAI).

🚀 Features

📄 Extract text from searchable PDFs
🖨️ OCR for scanned PDFs
🖼️ Image OCR (Tesseract + OpenCV preprocessing)
🎥 Local video speech-to-text (Whisper)
📺 YouTube video transcription (yt-dlp + Whisper)
📃 Plain text processing
🧠 FAISS vector database creation & loading
🔎 Semantic similarity search
♻️ Iterative refinement using LLM (Ollama)
🗂 Automatic ingestion routing based on file type
☁️ Persistent storage via Supabase Storage

To explore this project with AI and read the full documentation, visit here:

Workflow:

IMPORTANT UPDATE IN THE NPMAI RAG API PIPELINE:

🚀 NPMAI Update: Advanced RAG & Refine Architecture

We have upgraded the NPMAI ecosystem to a more intelligent, cost-efficient, production-ready pipeline. These updates go beyond basic RAG: retrieval depth now adapts to the size of the knowledge base, and refinement runs in batches.

🔍 1. Dynamic K-Context Retrieval (70% Coverage)

The Problem: Standard RAG systems use a fixed k value (e.g., k=4). This is inefficient: it provides too little context for large documents (missing facts) and too much noise for tiny documents (wasting tokens).

The Solution: I have engineered a proportional scaling rule that sets the number of chunks to retrieve as a fixed fraction (70%) of the total number of chunks stored in your vector database.

Logic: dynamic_k = max(1, int(total_chunks * 0.70))
How it works:
- Short Documents: If your database has only 2 chunks, the system retrieves 1 of them (int(2 × 0.70) = 1); the max(1, ...) floor guarantees that at least one chunk is always retrieved.
- Large PDFs: If your PDF generates 100 chunks, the system automatically scales up to retrieve 70 relevant chunks ($k=70$).
The Impact: The amount of retrieved context grows in proportion to the knowledge base, so both small and large documents are handled without hand-tuning k.
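The scaling rule above can be sketched as a small helper. This is a minimal illustration of the stated formula, not the project's code; the function name dynamic_k and the coverage parameter are assumptions. With LangChain's FAISS wrapper, the stored chunk count is available as db.index.ntotal, so a call might look like db.similarity_search(query, k=dynamic_k(db.index.ntotal)).

```python
def dynamic_k(total_chunks: int, coverage: float = 0.70) -> int:
    """Number of chunks to retrieve: 70% of the index, never fewer than 1."""
    # int() truncates toward zero, so 2 chunks -> int(1.4) -> 1 and 1 chunk -> 0,
    # which the max(1, ...) floor lifts back to 1.
    return max(1, int(total_chunks * coverage))


if __name__ == "__main__":
    for n in (1, 2, 100):
        print(n, "->", dynamic_k(n))
```

Note that truncation means small indexes retrieve slightly less than 70%: a 2-chunk database yields k = 1, while a 100-chunk database yields k = 70.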

🔄 2. Sliding Window Batch-Refinement (3-Chunk Window)

The Problem: Traditional "Refine" strategies process one chunk at a time, making $N$ sequential LLM calls for $N$ chunks. For a 30-chunk document that means 30 round trips, and the user waits too long.

The Solution: I have implemented a Sliding Window Batch-Refine system that processes chunks in consecutive, non-overlapping groups of 3 instead of 1.

Logic: for i in range(0, total_chunks, 3):
How it works:
- Instead of making one LLM call per chunk (about 1,000 characters), the system sends a batch of 3 consecutive chunks (up to about 3,000 characters) in a single call.
- It uses the previous answer as a "Running Memory" and merges in new information from the current 3-chunk batch.
The Impact:
- 3x Fewer LLM Calls: A 30-chunk document needs 10 calls instead of 30, cutting per-call API overhead by roughly 66%.
- Improved Coherence: The AI sees a broader context (about 3,000 characters instead of 1,000), allowing it to spot connections between facts that are split across neighboring chunks.
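The batch-refine loop described above can be sketched as follows. This is a minimal sketch, not the project's implementation: llm is a hypothetical callable (prompt in, text out) standing in for the Ollama call made through npmai, and the prompt wording is an assumption.

```python
from typing import Callable, List


def batch_refine(
    question: str,
    chunks: List[str],
    llm: Callable[[str], str],  # hypothetical stand-in for the Ollama/npmai call
    batch_size: int = 3,
) -> str:
    """Refine an answer over consecutive, non-overlapping batches of chunks."""
    answer = ""  # running memory carried from one LLM call to the next
    for i in range(0, len(chunks), batch_size):
        context = "\n\n".join(chunks[i : i + batch_size])
        if i == 0:
            # First batch: answer from scratch.
            prompt = (
                "Answer the question using the context.\n\n"
                f"Context:\n{context}\n\nQuestion: {question}"
            )
        else:
            # Later batches: merge new facts into the previous answer.
            prompt = (
                f"Existing answer:\n{answer}\n\n"
                "Refine the existing answer with this new context. "
                "Keep facts that are still correct.\n\n"
                f"Context:\n{context}\n\nQuestion: {question}"
            )
        answer = llm(prompt)
    return answer
```

With 30 chunks and batch_size=3, the loop makes exactly 10 LLM calls, and each call after the first sees the previous answer.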

☁️ 3. Infrastructure: Persistent Supabase Integration (v0.1.8)

We have successfully integrated Supabase Object Storage to move from temporary memory to Persistent Knowledge Bases.

Vector Persistence: All .faiss and .pkl index files are now automatically uploaded to a secure Supabase bucket.
Multi-Platform Access: This allows NPM-Rag-AI, NPM-AutoCode-AI, and the npmai SDK to share and load the same vectorized data from anywhere in the world.
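A minimal sketch of pushing a saved FAISS index to object storage is shown below, assuming the index was written with LangChain's FAISS.save_local(), which creates index.faiss and index.pkl by default. The bucket is injected as any object with an upload(path, file) method; with supabase-py that would be create_client(url, key).storage.from_(bucket_name). The function name, bucket name, and remote layout are hypothetical, not the project's actual scheme.

```python
import os
from typing import List, Protocol


class StorageBucket(Protocol):
    """Anything with an upload(path, file) method, e.g. a Supabase storage bucket."""

    def upload(self, path: str, file: bytes) -> object: ...


# File names written by LangChain's FAISS.save_local() with the default index_name.
INDEX_FILES = ("index.faiss", "index.pkl")


def upload_index(bucket: StorageBucket, local_dir: str, remote_prefix: str) -> List[str]:
    """Upload both FAISS index files and return their remote paths."""
    uploaded = []
    for name in INDEX_FILES:
        with open(os.path.join(local_dir, name), "rb") as fh:
            data = fh.read()
        remote_path = f"{remote_prefix.rstrip('/')}/{name}"
        bucket.upload(remote_path, data)
        uploaded.append(remote_path)
    return uploaded


# Hypothetical real-world usage (requires the supabase package and credentials):
# from supabase import create_client
# bucket = create_client(SUPABASE_URL, SUPABASE_KEY).storage.from_("vector-dbs")
# upload_index(bucket, "vector_db", "vector_db")
```

Injecting the bucket keeps the upload logic testable without network access.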

Summary: These architectural changes aim to give developers both speed and accuracy in an open-source RAG framework, without the high cost of standard one-by-one refinement.

🏗 Architecture Overview

Client Request
      ↓
/ingestion Endpoint
      ↓
File Type Detection
      ↓
Text Extraction (PDF/OCR/Video/etc.)
      ↓
Optional Vector DB Retrieval (FAISS)
      ↓
Refinement via LLM
      ↓
Final Response

📌 API Endpoints

Health Check

GET /

Returns:

{ "ok": true }

Main Ingestion Endpoint

POST /ingestion

Supported Inputs:

file → Upload file (pdf, txt, mp4, jpg, png, etc.)
query → Optional semantic query
DB_PATH → Path to vector database
link → YouTube link
output_path → Download location for video
temperature → LLM temperature
model → Ollama model name

📂 Supported File Types

Type	Processing Method
PDF (text-based)	PyMuPDF
PDF (scanned)	pdf2image + Tesseract
Image	OpenCV + Tesseract
TXT	Direct read
MP4	Whisper transcription
YouTube	yt-dlp + Whisper
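Automatic routing by file type, as in the table above, might look like the sketch below. The route labels, the accepted image extensions beyond jpg/png, and the scanned-PDF heuristic (fall back to OCR when PyMuPDF finds almost no embedded text, with a hypothetical 50-character threshold) are assumptions for illustration, not the project's confirmed logic.

```python
import os
from typing import Optional
from urllib.parse import urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}

# Extension -> processing route (labels are hypothetical).
EXTENSION_ROUTES = {
    ".pdf": "pdf",
    ".txt": "text",
    ".mp4": "video",
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
}


def route_input(filename: Optional[str] = None, link: Optional[str] = None) -> str:
    """Pick a processing route from an uploaded filename or a YouTube link."""
    if link:
        host = (urlparse(link).hostname or "").lower()
        if host in YOUTUBE_HOSTS:
            return "youtube"
        raise ValueError(f"unsupported link: {link}")
    if filename:
        ext = os.path.splitext(filename)[1].lower()
        if ext in EXTENSION_ROUTES:
            return EXTENSION_ROUTES[ext]
        raise ValueError(f"unsupported file type: {ext or filename}")
    raise ValueError("provide a file or a link")


def pdf_method(extracted_text: str, min_chars: int = 50) -> str:
    """Hypothetical heuristic: little or no embedded text suggests a scanned PDF."""
    return "pymupdf" if len(extracted_text.strip()) >= min_chars else "ocr"
```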

🔍 Retrieval Pipeline

If query and DB_PATH are provided:

Check if FAISS DB exists
If yes → Load and perform similarity search
If no → Create embeddings & save DB
Retrieve dynamic_k chunks (70% of the stored chunks, at least 1)
Send to LLM refine loop
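The load-or-create step can be sketched with the FAISS operations injected as callables, so the control flow is visible without installing FAISS. With LangChain these would plausibly be load = lambda p: FAISS.load_local(p, embeddings, allow_dangerous_deserialization=True), build = lambda: FAISS.from_texts(chunks, embeddings), and save = lambda db, p: db.save_local(p); treat that wiring and the function name as assumptions.

```python
import os
from typing import Callable, Tuple, TypeVar

DB = TypeVar("DB")


def load_or_create_db(
    db_path: str,
    load: Callable[[str], DB],      # e.g. FAISS.load_local(...)
    build: Callable[[], DB],        # e.g. FAISS.from_texts(chunks, embeddings)
    save: Callable[[DB, str], None],  # e.g. db.save_local(db_path)
) -> Tuple[DB, bool]:
    """Return (db, created): load an existing index folder, else build and save one."""
    if os.path.exists(os.path.join(db_path, "index.faiss")):
        return load(db_path), False
    db = build()
    save(db, db_path)
    return db, True
```

On the first request for a given DB_PATH the index is built and saved; later requests reuse it.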

🧠 Vector Store

Embeddings: all-MiniLM-L6-v2
Vector DB: FAISS
Chunk Size: 1000 characters
Overlap: 200 characters
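To illustrate what a 1000-character chunk size with 200 characters of overlap means, here is a simplified fixed-window splitter. It is not the splitter the project necessarily uses: LangChain's RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200) first tries to break on separators such as paragraphs and newlines, so its real boundaries differ.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Fixed-size character windows; each chunk repeats the previous chunk's last `overlap` chars."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    chunks = []
    start = 0
    while True:
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap  # advance by chunk_size - overlap = 800 chars
    return chunks
```

For a 2,500-character text this yields three chunks of 1,000, 1,000 and 900 characters.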

🔄 Refinement Logic

For each batch of 3 retrieved chunks:

Pass the batch and the previous answer to the LLM
Iteratively refine the previous answer
Return final refined response

📦 Dependencies

Install required packages (or run pip install -r requirements.txt):

pip install fastapi uvicorn
pip install langchain langchain-community
pip install sentence-transformers
pip install faiss-cpu
pip install openai-whisper
pip install moviepy
pip install pytesseract
pip install pdf2image
pip install pymupdf
pip install yt-dlp
pip install opencv-python
pip install pillow
pip install numpy

Make sure:

Tesseract OCR is installed on the system
Poppler is installed (pdf2image needs it to render PDF pages)
FFmpeg is installed
Ollama is running locally
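A quick pre-flight check for these system prerequisites could look like the sketch below. It only tests whether the binaries are on PATH (pdftoppm is the Poppler tool that pdf2image calls) and whether something is listening on Ollama's default port 11434; the function name is hypothetical and a positive result does not guarantee the services work.

```python
import shutil
import socket


def check_environment(ollama_host: str = "localhost", ollama_port: int = 11434) -> dict:
    """Report whether the system-level prerequisites appear to be available."""
    status = {
        "tesseract": shutil.which("tesseract") is not None,
        "poppler": shutil.which("pdftoppm") is not None,  # used by pdf2image
        "ffmpeg": shutil.which("ffmpeg") is not None,     # used by Whisper/moviepy
    }
    try:
        # Ollama's HTTP server listens on port 11434 by default.
        with socket.create_connection((ollama_host, ollama_port), timeout=1):
            status["ollama"] = True
    except OSError:
        status["ollama"] = False
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```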

▶️ Running the Server

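uvicorn main:app --reload

Replace main with the name of the module that defines the FastAPI app; in this repository the server file is npmai-rag-api-framework.py.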

🧩 Example Usage

Upload a PDF with Retrieval

POST /ingestion
Form Data:
file = document.pdf
query = "Summarize key points"
DB_PATH = vector_db
model = llama3
temperature = 0.7
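The same form-data request can be built from Python. This sketch assumes the requests package is installed (it is not in the dependency list above) and that the server runs on uvicorn's default http://127.0.0.1:8000; the helper name is hypothetical. It prepares the multipart request without sending it, so it can be inspected first.

```python
import requests


def build_ingestion_request(
    base_url: str,
    pdf_bytes: bytes,
    filename: str = "document.pdf",
    query: str = "Summarize key points",
    db_path: str = "vector_db",
    model: str = "llama3",
    temperature: float = 0.7,
) -> requests.PreparedRequest:
    """Build (but do not send) a multipart POST /ingestion request."""
    return requests.Request(
        "POST",
        f"{base_url.rstrip('/')}/ingestion",
        files={"file": (filename, pdf_bytes, "application/pdf")},
        data={
            "query": query,
            "DB_PATH": db_path,
            "model": model,
            "temperature": str(temperature),  # form fields are sent as text
        },
    ).prepare()


# Sending it against a running server:
# with open("document.pdf", "rb") as f:
#     req = build_ingestion_request("http://127.0.0.1:8000", f.read())
# resp = requests.Session().send(req)
# print(resp.status_code, resp.text)
```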

⚠️ Notes

GPU is disabled (CUDA_VISIBLE_DEVICES="")
Whisper model loads once (thread-safe singleton)
FAISS indexes are loaded with dangerous deserialization enabled (the .pkl file is unpickled), so load only trusted DB paths
Temporary audio saved as temp.wav
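The first two notes can be sketched together: hide GPUs before any CUDA-aware library is imported, then load the model lazily behind a lock with double-checked initialization so concurrent requests trigger only one load. The loader is injected; in this project it would plausibly be lambda: whisper.load_model("base"), where the model size is an assumption.

```python
import os
import threading
from typing import Any, Callable, Optional

# Must run before torch/whisper are imported, or CUDA may already be initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

_model: Optional[Any] = None
_model_lock = threading.Lock()


def get_model(loader: Callable[[], Any]) -> Any:
    """Load the model once, even if several request threads call this at the same time."""
    global _model
    if _model is None:  # fast path: no locking once the model is loaded
        with _model_lock:
            if _model is None:  # re-check: another thread may have loaded it meanwhile
                _model = loader()
    return _model
```

The loader argument is ignored after the first successful load, so every caller receives the same model object.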

🔮 Future Improvements

Streaming responses
Async video processing
Chunk-level caching
Background task queue
Better refine logic
Support for multiple vector stores
Docker-based usage

🛠 Tech Stack

FastAPI
FAISS
HuggingFace Embeddings
Whisper
OpenCV
Tesseract OCR
PyMuPDF
yt-dlp
npmai

📜 License

MIT License

💡 Summary

This system acts as a universal AI ingestion pipeline capable of processing multi-modal data and performing intelligent semantic retrieval with LLM refinement.

It can serve as:

AI document assistant
Video summarizer
Research helper
OCR intelligence engine
Knowledge base system

