A hands-on sandbox for experimenting with Retrieval-Augmented Generation (RAG) workflows using modern LLMs, embedding models, and vector databases.
The repository covers essential components of a RAG pipeline:
- Load, preprocess, and chunk documents.
- Powered by Chroma DB for fast similarity search and retrieval.
- Stores document embeddings with associated metadata.
- Uses the `all-MiniLM-L6-v2` sentence transformer for efficient embeddings.
- Supports experimentation with bag-of-words text representations.
- t-SNE visualization of embeddings for exploratory analysis.
- BM25, a keyword-based ranking function.
- Retrieval-augmented responses using `gpt-4o-mini`.
- Modular setup to experiment with different retrieval strategies:
  - Keyword overlap
  - Cosine distance (semantic)
  - Hybrid (keyword + semantic)
  - Context summarizer
- Serves as the entry point to the RAG Playground, handling user queries and coordinating the retrieval and generation workflow.
- Retrieves relevant document chunks (via embeddings) and generates answers using the LLM.
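The chunking step above can be sketched as a simple fixed-size splitter with overlap, so context survives chunk boundaries. This is a minimal illustration; the function name and parameters (`chunk_text`, `chunk_size`, `overlap`) are assumptions, not the repository's actual API:

```python
# Minimal sketch of fixed-size chunking with overlap.
# chunk_text, chunk_size, and overlap are illustrative names,
# not the repository's actual API.

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk shares its first `overlap` characters with the tail of the previous chunk, which helps the retriever match queries that straddle a boundary.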
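The BM25 ranking mentioned above can be summarized in a few lines of pure Python. This is a sketch of the standard Okapi BM25 formula over whitespace-tokenized documents, using the common default parameters `k1 = 1.5` and `b = 0.75`; the repository's actual implementation may differ:

```python
import math
from collections import Counter

# Sketch of Okapi BM25: score each document against a query by term
# frequency, inverse document frequency, and length normalization.
def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # document frequency: in how many documents each term appears
    df = Counter()
    for doc in tokenized:
        df.update(set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Documents sharing more (and rarer) query terms score higher, while very long documents are penalized by the length-normalization term.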
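The hybrid strategy listed above combines a keyword signal with a semantic one. A minimal sketch, assuming a linear blend with a mixing weight `alpha` (the function names and the blending scheme are illustrative, not the repository's actual code):

```python
import math

# Illustrative hybrid scoring: blend keyword overlap with embedding
# cosine similarity. alpha and all names here are assumptions.
def keyword_overlap(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_score(query: str, doc: str,
                 q_emb: list[float], d_emb: list[float],
                 alpha: float = 0.5) -> float:
    """alpha weights the keyword signal; (1 - alpha) weights the semantic one."""
    return alpha * keyword_overlap(query, doc) + (1 - alpha) * cosine(q_emb, d_emb)
```

Sweeping `alpha` between 0 (pure semantic) and 1 (pure keyword) is one way to compare the listed strategies on the same query set.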
Before running the project, create a `config.ini` in the root directory:

```ini
[keys]
openrouter_api_key = [YOUR_OPENROUTER_KEY]
```
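A config with this shape can be loaded with Python's standard `configparser`. A minimal sketch (the variable names are illustrative; in the project you would call `config.read("config.ini")` instead of parsing an inline string):

```python
import configparser

# Sketch: parse the same structure as config.ini above.
# In the project, use config.read("config.ini") instead of read_string.
config = configparser.ConfigParser()
config.read_string("""
[keys]
openrouter_api_key = YOUR_OPENROUTER_KEY
""")
api_key = config["keys"]["openrouter_api_key"]
```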