An end-to-end Retrieval-Augmented Generation (RAG) system built with n8n that transforms a Google Drive folder into a searchable AI-powered knowledge base. Ask questions in plain English and get accurate, document-grounded answers.
RAG stands for Retrieval-Augmented Generation. Instead of relying on an AI's general training data, RAG:
- Retrieves the most relevant chunks from your own documents
- Augments the user's question with that retrieved context
- Generates a precise answer grounded in your actual files
This means the chatbot answers from your documents, not from general AI knowledge.
Google Drive Folder
↓
n8n Ingestion Workflow
↓
Extract Text → AI Metadata Extraction → Chunking → Embeddings
↓
Qdrant Vector Database
↓
User Question → Embed Query → Retrieve Top Chunks
↓
Groq LLM (Llama 3.3 70B) → Grounded Answer
↓
Chat Interface + Google Sheets Log
| Layer | Tool |
|---|---|
| Workflow Orchestration | n8n |
| Document Storage | Google Drive |
| Vector Database | Qdrant |
| LLM — Generation & Metadata | Groq — Llama 3.3 70B |
| Embeddings | Hugging Face — sentence-transformers/all-MiniLM-L6-v2 |
| Logging | Google Sheets |
| Notifications | Gmail |
Triggered manually or on schedule. Processes documents from Google Drive and stores them in Qdrant.
Flow:
Manual Trigger
→ Config (folder ID, collection name, chunk settings)
→ Google Drive — List PDF files
→ Filter PDFs only
→ Loop Over Files (batch size: 1)
→ Download File
→ Extract Text from PDF
→ Normalize & Clean Text
→ Groq — Extract Metadata (title, summary, keywords, topics, risks)
→ Merge text + metadata
→ Flatten Metadata
→ Chunk Text (1200 tokens, 200 overlap)
→ Generate Embeddings (HuggingFace)
→ Store in Qdrant with metadata payload
→ Log to Google Sheets
→ Gmail — Send completion notification
Metadata extracted per document:
- Title
- Summary
- Main topics
- Keywords
- Document type
- Audience
- Important entities
- Action items
- Risks
- Dates
Runs every time a user sends a message via n8n's built-in chat UI.
Flow:
Chat Trigger
→ Master Agent (LangChain)
→ Simple Memory (last 30 messages)
→ Qdrant Vector Store Tool (Top-K: 8 chunks)
→ Groq Llama 3.3 70B
→ Return grounded answer
→ Log to Google Sheets
System behaviour:
- Always searches Qdrant before answering
- Mentions source file names in responses
- Says "I don't know" if answer is not in documents
- Maintains conversation memory across turns
rag_chatbot_agent/
├── README.md ← Project documentation
├── .gitignore ← Ignores secrets and env files
├── .env.example ← Required credentials template
└── workflows/
└── RAG-CHATBOT-AGENT.json ← n8n workflow (import this)
- n8n instance (cloud or self-hosted)
- Qdrant cluster (free tier available)
- Groq API key (free tier)
- Hugging Face API key (free)
- Google account (Drive + Sheets + Gmail)
In your Qdrant dashboard, create a collection with:
- Vector size: 384
- Distance: Cosine
Go to n8n → Settings → Credentials and add:
| Credential | Used For |
|---|---|
| Google Drive OAuth2 | Reading files |
| Google Sheets OAuth2 | Logging |
| Gmail OAuth2 | Notifications |
| Qdrant API | Vector storage |
| Groq API | LLM generation |
| Hugging Face API | Embeddings |
- Open your n8n instance
- Click New Workflow → ⋮ Menu → Import from file
- Select
workflows/RAG-CHATBOT-AGENT.json
Open the Edit Fields node and update:
folder_id → Your Google Drive folder ID
qdrant_collection → Your Qdrant collection name
qdrant_url → Your Qdrant cluster URL
- Add PDF files to your Google Drive folder
- Click Execute Workflow on the ingestion workflow
- Wait for the Gmail completion notification
- Verify vectors appear in your Qdrant dashboard
- Open the Chat trigger in n8n
- Click the chat icon to open the chat UI
- Ask questions about your documents
Copy .env.example to .env and fill in your values:
QDRANT_URL=YOUR_QDRANT_CLUSTER_URL
QDRANT_API_KEY=your_qdrant_api_key_here
GROQ_API_KEY=your_groq_api_key_here
HF_API_KEY=your_huggingface_api_key_here
GOOGLE_DRIVE_FOLDER_ID=your_google_drive_folder_id
QDRANT_COLLECTION=your_qdrant_collection_name
GOOGLE_SHEETS_ID=your_google_sheets_id
GMAIL_ADDRESS=your_gmail_address
N8N_INSTANCE_ID=your_n8n_instance_id
WEBHOOK_ID=your_webhook_idThe workflow automatically logs every indexed document to Google Sheets with:
| Column | Description |
|---|---|
| timestamp | When the file was indexed |
| file_id | Google Drive file ID |
| file_name | Name of the document |
| status | indexed |
| collection | Qdrant collection used |
| metadata | Extracted AI metadata |
| pageContent | Chunk text stored |
- RAG Architecture — retrieval-grounded answer generation
- Vector Database Design — Qdrant with rich metadata payloads
- Semantic Chunking — 1200 token chunks with 200 token overlap
- AI Metadata Extraction — structured enrichment using Groq
- LangChain Agent — tool-using agent with memory
- Workflow Automation — end-to-end orchestration in n8n
- Google Workspace Integration — Drive, Sheets, Gmail APIs
MIT License — free to use, modify, and distribute.
MAHADEVAN-007 GitHub: @MAHADEVAN-007