RAG Integration

This document describes the ChromaDB/RAG integration in RAG Lab.

Current Status: Complete

The RAG integration is fully implemented. Users can:

- Connect to ChromaDB databases (local, server, or cloud)
- Browse and select collections
- Configure retrieval settings (results count, distance threshold)
- Toggle RAG on/off in the chat interface
- See which documents were retrieved for each response

Architecture

Backend Services

chat_rag_explorer/rag_config_service.py - Singleton service (rag_config_service)

Manages ChromaDB connections in three modes:

| Mode | Client | Use Case |
|------|--------|----------|
| `local` | `chromadb.PersistentClient(path=...)` | Direct file-based storage |
| `server` | `chromadb.HttpClient(host, port)` | Local ChromaDB server |
| `cloud` | `chromadb.CloudClient(tenant, database, api_key)` | ChromaDB Cloud service |
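
As a rough illustration of the table above, the mode-to-client mapping could be sketched as follows. The helper names `client_spec` and `make_client` are hypothetical, and the config keys are borrowed from the `rag_config.json` example in the Configuration Storage section; the actual service code may be organized differently.

```python
import os


def client_spec(config, env=None):
    """Return (client_class_name, kwargs) for the configured mode.

    Hypothetical helper: it keeps the mode-to-client mapping free of any
    chromadb import so it can be inspected on its own.
    """
    env = os.environ if env is None else env
    mode = config.get("mode")
    if mode == "local":
        return "PersistentClient", {"path": config["local_path"]}
    if mode == "server":
        return "HttpClient", {
            "host": config["server_host"],
            "port": config["server_port"],
        }
    if mode == "cloud":
        return "CloudClient", {
            "tenant": config["cloud_tenant"],
            "database": config["cloud_database"],
            # The API key comes from the environment, not from the config file.
            "api_key": env.get("CHROMADB_API_KEY"),
        }
    raise ValueError(f"Unknown mode: {mode!r}")


def make_client(config):
    """Instantiate the chromadb client named by client_spec()."""
    import chromadb  # imported lazily so client_spec() works without it

    name, kwargs = client_spec(config)
    return getattr(chromadb, name)(**kwargs)
```

Splitting the mapping from the instantiation is only a design choice for this sketch: it lets the mode logic be checked without a running ChromaDB instance.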

Key methods:

- `get_client()` - Returns configured ChromaDB client
- `list_collections()` - Lists available collections
- `query_collection(query_text, n_results, distance_threshold)` - Queries for relevant documents
- `get_sample_records(collection, limit)` - Fetches sample documents for preview
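
To make the role of `distance_threshold` concrete, here is a minimal sketch of querying a collection and then dropping distant matches. It assumes the standard result shape of chromadb's `collection.query(...)` (one inner list per query text). The function below takes the collection as an explicit argument, which differs from the service method listed above, and the inclusive `<=` comparison is an assumption.

```python
def filter_by_distance(results, distance_threshold=None):
    """Flatten a single-query Chroma result and drop distant matches.

    `results` is the dict returned by `collection.query(...)`, where
    "documents" and "distances" each hold one inner list per query text.
    A threshold of None keeps every result.
    """
    documents = results["documents"][0]
    distances = results["distances"][0]
    return [
        {"document": doc, "distance": dist}
        for doc, dist in zip(documents, distances)
        if distance_threshold is None or dist <= distance_threshold
    ]


def query_collection(collection, query_text, n_results=5, distance_threshold=None):
    """Query a chromadb collection, then apply the optional threshold."""
    results = collection.query(query_texts=[query_text], n_results=n_results)
    return filter_by_distance(results, distance_threshold)
```

Because filtering happens after the query, a strict threshold can return fewer than `n_results` documents, or none at all.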

chat_rag_explorer/routes.py - Chat integration

The /api/chat endpoint handles RAG integration:

- If `rag_enabled=true`, queries ChromaDB with the user's message
- Augments the user message with retrieved context using XML format
- Includes RAG metadata in the response for UI display

Context Injection Format

When RAG retrieves documents, the user message is augmented with XML-formatted context:

<knowledge_base_context>
<document index="1">First retrieved document content...</document>
<document index="2">Second retrieved document content...</document>
</knowledge_base_context>

<original_user_message>
What is the user's original question?
</original_user_message>

This format:

- Clearly separates context from the user's question
- Uses indexed documents for clarity
- Is visible in the "View Details" modal for transparency
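
The format above can be reproduced with a few lines of string assembly. This is a sketch rather than the code in `routes.py`: the function name `build_augmented_message` is hypothetical, document text is inserted as-is without XML escaping, and returning the message unchanged when nothing was retrieved is an assumption.

```python
def build_augmented_message(user_message, documents):
    """Wrap retrieved documents and the original question in the XML layout."""
    if not documents:
        # Assumption: with no retrieved documents the message is left alone.
        return user_message
    doc_lines = [
        f'<document index="{i}">{doc}</document>'
        for i, doc in enumerate(documents, start=1)  # indexes start at 1
    ]
    return "\n".join(
        [
            "<knowledge_base_context>",
            *doc_lines,
            "</knowledge_base_context>",
            "",
            "<original_user_message>",
            user_message,
            "</original_user_message>",
        ]
    )
```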

Configuration Storage

rag_config.json - Project root

{
  "mode": "local",
  "local_path": "/path/to/chromadb",
  "server_host": "localhost",
  "server_port": 8000,
  "cloud_tenant": "",
  "cloud_database": "",
  "collection": "selected_collection_name",
  "n_results": 5,
  "distance_threshold": null
}

| Field | Description |
|-------|-------------|
| `mode` | Connection mode: `local`, `server`, or `cloud` |
| `collection` | Selected collection name for queries |
| `n_results` | Number of documents to retrieve (1-10) |
| `distance_threshold` | Max distance for results (`null` = no filtering) |

For cloud mode, the API key is read from the CHROMADB_API_KEY environment variable and is never stored in the config file.
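
A minimal sketch of loading this file is shown below. The helper names and the default values are assumptions loosely based on the example above, and clamping `n_results` into the documented 1-10 range (rather than rejecting out-of-range values) is a choice made for this sketch only.

```python
import json
import os
from pathlib import Path

# Assumed defaults, loosely based on the example file above.
DEFAULTS = {
    "mode": "local",
    "local_path": "",
    "server_host": "localhost",
    "server_port": 8000,
    "cloud_tenant": "",
    "cloud_database": "",
    "collection": "",
    "n_results": 5,
    "distance_threshold": None,  # JSON null: no distance filtering
}
VALID_MODES = {"local", "server", "cloud"}


def normalize_config(raw):
    """Merge a parsed config over the defaults and sanity-check it."""
    config = {**DEFAULTS, **raw}
    if config["mode"] not in VALID_MODES:
        raise ValueError(f"Unknown mode: {config['mode']!r}")
    # Clamp to the documented 1-10 range (assumption: clamp, do not reject).
    config["n_results"] = max(1, min(10, int(config["n_results"])))
    return config


def load_rag_config(path="rag_config.json"):
    """Read the config file if present, otherwise fall back to defaults."""
    path = Path(path)
    raw = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return normalize_config(raw)


def api_key_configured(env=None):
    """Report whether CHROMADB_API_KEY is set, without exposing its value."""
    env = os.environ if env is None else env
    return bool(env.get("CHROMADB_API_KEY"))
```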

API Endpoints

All endpoints defined in chat_rag_explorer/routes.py:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/rag/config` | GET | Get current RAG configuration |
| `/api/rag/config` | POST | Save RAG configuration |
| `/api/rag/validate-path` | POST | Validate local ChromaDB path exists |
| `/api/rag/test-connection` | POST | Test connection, returns collection list |
| `/api/rag/api-key-status` | GET | Check if `CHROMADB_API_KEY` is configured |
| `/api/rag/sample` | POST | Fetch sample records from a collection |
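
For scripting against these endpoints, a small standard-library client might look like the sketch below. The base URL is an assumption (use whatever address the app is served on), the helper names are hypothetical, and both JSON responses and JSON POST bodies are assumed; the payload fields each POST endpoint expects are not specified in this document.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: adjust to your deployment


def build_request(path, payload=None, base_url=BASE_URL):
    """Build a GET request, or a JSON POST when a payload is given."""
    url = base_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request, timeout=10):
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the app running, `fetch_json(build_request("/api/rag/config"))` would return the current configuration, and `fetch_json(build_request("/api/rag/api-key-status"))` would report whether the cloud API key is set.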

Frontend

Settings Page

chat_rag_explorer/static/settings.js - RAG Settings tab

The Settings page (/settings#rag) provides a wizard-style interface:

Step 1: Configure - Select mode and enter connection details
- Local: Path input with validation
- Server: Host and port inputs
- Cloud: Tenant ID, database name, API key status
Step 2: Test Connection - Validates config and retrieves collection list
Step 3: Select Collection - Choose collection + configure retrieval settings
- Results Count slider (1-10)
- Distance Threshold slider (0 = off, up to 3.0)
Step 4: Save - Persists configuration to rag_config.json

Chat Interface

chat_rag_explorer/static/script.js - RAG toggle and display

The chat page includes:

- RAG Toggle - Enable/disable RAG in the sidebar (links to settings if not configured)
- Context Badge - Shows "Retrieved X document(s) from collection_name" above responses
- View Details Modal - Shows the full augmented message sent to the LLM, including all retrieved documents

User Flow

Initial Setup

Navigate to Settings > RAG Settings tab (or click "RAG" link in chat sidebar)
Select connection mode (local/server/cloud)
Enter connection details
Click "Test Connection"
Select a collection from dropdown
Adjust retrieval settings (optional)
Click "Save Settings"

Using RAG in Chat

Enable the RAG toggle in the sidebar
Send a message - the system will:
- Query ChromaDB for relevant documents
- Inject context into your message
- Send augmented message to the LLM
See the badge showing how many documents were retrieved
Click "view details" to see exactly what was sent to the LLM

Local Path Validation

For local mode, the service validates:

- Path exists
- Path is a directory
- Directory contains `chroma.sqlite3` (ChromaDB marker file)
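
The three checks above translate directly into `pathlib` calls. The function name and the returned messages below are illustrative assumptions, not the service's actual API.

```python
from pathlib import Path


def validate_local_path(path):
    """Return (ok, message) after running the three checks in order."""
    p = Path(path)
    if not p.exists():
        return False, "Path does not exist"
    if not p.is_dir():
        return False, "Path is not a directory"
    if not (p / "chroma.sqlite3").is_file():
        return False, "Directory does not contain chroma.sqlite3"
    return True, "Valid ChromaDB directory"
```

Running the checks in this order means the first failing check determines the message, which keeps the feedback shown next to the path input specific.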

Sample Data

A pre-built ChromaDB database with 195 chunks from "The Morn Chronicles" (a Star Trek DS9 fan fiction, 28 chapters) is included in the repository. On first startup, the app automatically copies the pristine sample from data/chroma_db_sample/ to data/chroma_db/ (which is gitignored), so that ChromaDB's internal file mutations never show up as git changes.

To use it:

Go to Settings > RAG Settings
Select "Local" mode
Enter path: data/chroma_db (relative paths work)
Test connection and select the collection
Save and enable RAG in chat
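
The first-startup copy described at the top of this section could be sketched as follows. The function name is hypothetical, and skipping the copy whenever the working directory already exists is an assumption about how "first startup" is detected.

```python
import shutil
from pathlib import Path


def ensure_working_db(sample_dir="data/chroma_db_sample", working_dir="data/chroma_db"):
    """Copy the pristine sample to the working path if it is not there yet.

    Returns True when a copy was made and False when the working copy
    already existed, so later startups leave ingested data untouched.
    """
    sample, working = Path(sample_dir), Path(working_dir)
    if working.exists():
        return False
    shutil.copytree(sample, working)
    return True
```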

Appendix: Deep Dive

ChromaDB Data Layout

Understanding how ChromaDB stores data helps explain why multiple collections can coexist in a single database path.

PersistentClient Directory Structure

When you use chromadb.PersistentClient(path="data/chroma_db"), ChromaDB creates this structure:

data/chroma_db/
├── chroma.sqlite3                          # Shared metadata database
├── 2a31d927-ff2a-4dbf-b30f-094e5e91b702/  # Collection 1 vector data
│   ├── data_level0.bin
│   ├── header.bin
│   ├── length.bin
│   └── link_lists.bin
└── fbe357dd-b35e-4646-86c2-f71862b696f9/  # Collection 2 vector data
    ├── data_level0.bin
    ├── header.bin
    ├── length.bin
    └── link_lists.bin

What Each Component Does

| Component | Purpose |
|-----------|---------|
| `chroma.sqlite3` | SQLite database storing collection metadata, document IDs, and text content for ALL collections in this path |
| UUID directories | HNSW index files for vector similarity search, one directory per collection |
| `data_level0.bin` | The actual vector embeddings |
| `header.bin`, `length.bin`, `link_lists.bin` | HNSW graph structure for fast approximate nearest neighbor search |
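
To see this layout on disk, the UUID-named index directories can be listed with the standard library alone. This sketch only inspects the filesystem; the function name is hypothetical, and it does not map directories back to collection names, since that information lives inside `chroma.sqlite3`.

```python
import uuid
from pathlib import Path


def collection_index_dirs(db_path):
    """Return the names of UUID-named subdirectories under db_path."""
    found = []
    for entry in sorted(Path(db_path).iterdir()):
        if not entry.is_dir():
            continue  # skips chroma.sqlite3 and any other files
        try:
            uuid.UUID(entry.name)
        except ValueError:
            continue  # directory name is not a UUID
        found.append(entry.name)
    return found
```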

Key Insight: Shared Database

The chroma.sqlite3 file is shared across all collections in that path. This means:

- Sample DB + Ingested Data Coexist: When the app copies the sample database on startup, it brings its chroma.sqlite3 and collection folder. When you run utils/ingest.py, it opens the same chroma.sqlite3 and adds new collections alongside the existing ones.
- Single Connection Point: The app only needs one path (data/chroma_db) to access all collections - both the sample "Morn Chronicles" and any documents you ingest.
- Why We Copy the Sample: ChromaDB mutates chroma.sqlite3 even during read operations (for internal bookkeeping). By copying data/chroma_db_sample/ to data/chroma_db/ on startup, we keep the committed sample pristine while allowing the working copy to be modified freely.

How Ingestion Works

The relevant lines of the utils/ingest.py script (lines 39 and 535-536) are:

RAG_DB_FILE_PATH = Path(__file__).parent.parent / "data" / "chroma_db"
# ...
client = PersistentClient(path=str(RAG_DB_FILE_PATH))
collection = client.get_or_create_collection(name=collection_name)

This creates or opens data/chroma_db/chroma.sqlite3 and adds the new collection. The collection name follows the pattern {corpus}-{chunk_size}chunk-{overlap}overlap (e.g., morn-chronicles-256chunk-50overlap).
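
The naming pattern can be captured in two small helpers, one to build a name and one to parse it back. Both function names are hypothetical and are not taken from `utils/ingest.py`.

```python
import re

# Matches "{corpus}-{chunk_size}chunk-{overlap}overlap"; corpus may contain hyphens.
NAME_PATTERN = re.compile(
    r"^(?P<corpus>.+)-(?P<chunk_size>\d+)chunk-(?P<overlap>\d+)overlap$"
)


def collection_name(corpus, chunk_size, overlap):
    """Build a collection name following the documented pattern."""
    return f"{corpus}-{chunk_size}chunk-{overlap}overlap"


def parse_collection_name(name):
    """Split a collection name into its parts, or return None if it does not match."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return {
        "corpus": match["corpus"],
        "chunk_size": int(match["chunk_size"]),
        "overlap": int(match["overlap"]),
    }
```

Encoding the chunking parameters in the name makes it possible to keep several ingestions of the same corpus side by side in one database path and to tell them apart in the collection dropdown.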

Listing All Collections

To see all collections in a database:

import chromadb

client = chromadb.PersistentClient(path="data/chroma_db")
for col in client.list_collections():
    print(f"{col.name}: {col.count()} documents")

Or use the RAG Settings UI - after testing the connection, the collection dropdown shows all available collections.
