An offline multimodal Retrieval-Augmented Generation system that ingests PDFs/DOCX, images, and audio; indexes them in shared vector spaces; and answers natural-language queries with grounded citations.
- Offline embeddings and LLM (no network required after first model downloads)
- Modalities: Text (PDF/DOCX), Images (PNG/JPG), Audio (WAV/MP3 via offline STT)
- Dual vector spaces: CLIP (cross-modal) and Sentence-Transformers (text)
- FAISS indexes on disk + SQLite metadata with cross-links and citations
- Streamlit UI for ingestion and chat/search with expandable sources
```
src/
  config.py
  ingest/
    pdf_docx.py
    images.py
    audio.py
  processing/
    chunking.py
  embeddings/
    clip_embedder.py
    text_embedder.py
  index/
    faiss_index.py
  store/
    metadata_db.py
  retrieval/
    retriever.py
  generation/
    llm.py
app.py
requirements.txt
```
- Create and activate a virtual environment (Windows PowerShell):

```powershell
python -m venv .venv
. .venv/Scripts/Activate.ps1
pip install --upgrade pip
pip install -r requirements.txt
```

- Download offline models:
  - CLIP: `ViT-B-32` via `open_clip` (auto-downloaded on first use; you can pre-download by running the app online once, then staying offline)
  - Text encoder: `sentence-transformers/all-MiniLM-L6-v2` (auto-downloaded on first run). To pre-download for offline use, manually place `all-MiniLM-L6-v2` into `~/.cache/torch/sentence_transformers/`
  - Vosk STT model: download a small model, e.g., `vosk-model-small-en-us-0.15`, and set `VOSK_MODEL_PATH` in `.env` or `src/config.py`
  - LLM: download a GGUF model compatible with `llama-cpp-python` (e.g., `TheBloke/Llama-2-7B-GGUF`, q4_0 quantization). Set `LLM_MODEL_PATH` in `.env` or `src/config.py`.
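Once the models are in place, a `.env` file pointing at them might look like the following sketch (the paths and filenames are illustrative; use wherever you actually placed the models):

```ini
VOSK_MODEL_PATH=./models/vosk-model-small-en-us-0.15
LLM_MODEL_PATH=./models/llama-2-7b.Q4_0.gguf
DEVICE=cpu
```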
- Run the app:

```powershell
streamlit run app.py
```

- Ingest data:
  - Use the Ingest panel to add folders or files (PDF, DOCX, PNG/JPG, WAV/MP3)
  - The system extracts text, generates embeddings, and builds FAISS indexes
- Query:
  - Type a natural-language question. Results fuse text and image/audio-derived context
  - Click citations to open source snippets, transcript segments, or images
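Before embedding, extracted text is split into chunks (see `processing/chunking.py`). A minimal sliding-window sketch, assuming word-based chunks with overlap — the actual module's parameters and strategy may differ:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word-window chunks for embedding.

    Each chunk holds up to `chunk_size` words; consecutive chunks share
    `overlap` words so that sentences near a boundary appear in both.
    """
    words = text.split()
    if not words:
        return []
    step = max(1, chunk_size - overlap)  # how far the window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # final window already covers the tail
    return chunks
```

Overlapping windows trade a little index size for better recall on queries whose answer straddles a chunk boundary.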
Edit defaults in `src/config.py` or via environment variables:

- `DATA_DIR`, `INDEX_DIR`, `DB_PATH`
- `LLM_MODEL_PATH`, `VOSK_MODEL_PATH`
- `DEVICE` (`cpu`/`cuda`)
- First run may download model weights. After that, the app works fully offline.
- Audio ingestion converts input to 16 kHz mono WAV and uses Vosk for STT.
- On Python 3.13 (Windows), provide 16 kHz mono WAV files directly; MP3 conversion is not included.
- Cross-modal retrieval uses CLIP space for text↔image/audio (via transcript) and fuses with text space for pure text retrieval.
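One common way to fuse rankings from two vector spaces is reciprocal-rank fusion. A sketch, assuming each FAISS index returns document IDs in descending similarity order — the repo's `retrieval/retriever.py` may use a different fusion scheme:

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal-rank fusion: score each doc by the sum of 1/(k + rank)
    over every ranking it appears in, then sort by total score."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Fuse a CLIP-space result list with a text-space result list
# (IDs here are illustrative, not from the actual store)
fused = rrf_fuse([["img_3", "doc_1", "doc_2"],
                  ["doc_1", "doc_2", "aud_7"]])
```

Documents retrieved by both spaces accumulate score from each list, so cross-modal and text-only evidence reinforce each other without needing the two spaces' raw similarity scores to be comparable.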