AI-Powered Air Quality Chat with RAG Knowledge Base
A production-ready conversational AI application combining real-time air quality data, LLM reasoning, and Retrieval-Augmented Generation (RAG) with EPA & WHO guidelines.
Demo Date: October 25, 2025 ✅ COMPLETE
Demo Video: 📺 Watch on YouTube
Presentation: https://docs.google.com/presentation/d/1jMslCeHJ5zp_f2t-EBpZjMm6QB1-E_yEe9zszZsvDVs/edit?usp=sharing
| Phase | Feature | Status |
|---|---|---|
| 1-3 | Core Chat & Weather & Air Quality | ✅ Complete |
| 4A | RAG Setup (ChromaDB + Embeddings) | ✅ Complete |
| 4B | AI Agent Integration | ✅ Complete |
| 4C | Frontend UI (Citations & Status) | ✅ Complete |
| 5 | Production Ready | ✅ Ready |
- React 18 + Vite + Tailwind CSS
- Components: ChatPane, CitationBubble, RAGStatus, MessageBubble
- Context: ChatContext (conversations), ThemeContext (dark/light mode)
- Port: 5174
- Node.js + Express (Session management & API proxy)
- Port: 3005
- Features: Rate limiting, error handling, session tracking
- Python + FastAPI (AI service)
- Port: 8000
- AI Agent: LangChain ReAct with 5 tools
- RAG Pipeline: Local ChromaDB + Google Embeddings (FREE)
- Knowledge Base: EPA & WHO PDF guidelines (7 documents)
- OpenAQ API v3 - Real-time air quality measurements
- Nominatim/OpenStreetMap - Geocoding & location search
- ChromaDB (Local) - Vector storage for RAG (~10MB local)
- ✅ Real-time AQI display with EPA color coding
- ✅ Location search with geocoding
- ✅ NowCast PM2.5 calculations (EPA formula)
- ✅ Multiple pollutant support (PM2.5, PM10, O3, NO2)
- ✅ Conversational AI chat interface
- ✅ Dark/light theme support
- ✅ Message persistence (localStorage)

- ✅ Search Knowledge Base Tool - AI can search EPA/WHO guidelines
- ✅ Citation Display - Sources shown with relevance scores [1] [2] [3]
- ✅ RAG Status Indicator - Real-time knowledge base status in header
- ✅ Expandable Citations - View document source, page, domain, score
- ✅ 7 Documents Loaded:
  - EPA Air Quality Guide (Particle Pollution)
  - EPA AQI Technical Assistance Documents (2)
  - WHO Global Air Quality Guidelines
  - Technical reporting standards
- ChatGPT-style interface with streaming responses
- Citation bubbles with expandable metadata
- RAG status indicator (Ready/Loading/Offline)
- Responsive sidebar with conversation management
- Dark mode optimization for accessibility
- Mobile-first responsive design
AIrChat/
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── ChatHeader.jsx (+ RAGStatus)
│   │   │   ├── ChatPane.jsx
│   │   │   ├── CitationBubble.jsx (NEW - Phase 4C)
│   │   │   ├── MessageBubble.jsx (+ citations display)
│   │   │   ├── MessageComposer.jsx
│   │   │   ├── MessageList.jsx
│   │   │   ├── RAGStatus.jsx (NEW - Phase 4C)
│   │   │   └── Sidebar.jsx
│   │   ├── contexts/
│   │   │   ├── ChatContext.jsx
│   │   │   └── ThemeContext.jsx
│   │   └── App.jsx
│   └── package.json
├── backend/
│   ├── server.js (+ GET /v1/rag/status endpoint)
│   ├── session_manager.js
│   └── package.json
├── svc/
│   ├── main.py (FastAPI + endpoints)
│   ├── ai_agent.py (5 tools including search_knowledge_base)
│   ├── rag/
│   │   ├── __init__.py
│   │   ├── embeddings.py (Google/OpenAI models)
│   │   ├── vector_store.py (ChromaDB)
│   │   ├── document_loader.py (PDF chunking)
│   │   ├── retriever.py (MMR strategy)
│   │   └── rag_chain.py (Main RAG chain)
│   ├── data/
│   │   └── kb/
│   │       ├── epa/ (3 PDFs)
│   │       └── who/ (1 PDF)
│   ├── store/
│   │   └── chroma/airchat_vi_v1/ (Vector storage)
│   └── requirements.txt (RAG dependencies)
└── README.md
cd frontend
npm install # First time only
npm run dev
# ✅ Runs on http://localhost:5174

cd backend
npm install # First time only
npm run dev
# ✅ Runs on http://localhost:3005

cd svc
python3 -m venv .venv # First time only
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt # First time only
.venv/bin/python3 -m uvicorn main:app --reload --port 8000
# ✅ Runs on http://localhost:8000

Then open: http://localhost:5174
Location: /svc/rag/ (5 modules)
- `embeddings.py` (85 lines)
  - Google Embeddings: `text-embedding-004` (FREE tier) ✅
  - OpenAI Optional: `text-embedding-3-small` (paid)
  - Cost-aware model selection
- `vector_store.py` (100 lines)
  - ChromaDB 0.5.3 local database
  - Storage: `/svc/store/chroma/airchat_vi_v1`
  - Persistence: Automatic saving
- `document_loader.py` (115 lines)
  - PDF loading from `/svc/data/kb/`
  - Chunking: 800 tokens, 120 overlap
  - Auto-detection: EPA & WHO documents
- `retriever.py` (155 lines)
  - MMR (Maximal Marginal Relevance) strategy
  - Similarity threshold: 0.12 (normalized)
  - Metadata filtering by domain
- `rag_chain.py` (110 lines)
  - Combines retriever + LLM
  - Citation formatting: `[1] [2] [3]`
  - Context window management
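The MMR strategy listed for retriever.py can be illustrated with a small from-scratch sketch. This is not the project's retriever (which presumably relies on its vector store integration); the function names `cosine` and `mmr_select`, the `lambda_mult` parameter, and the toy 2-D vectors are assumptions made for illustration. Only the 0.12 score threshold comes from this README.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int = 3,
    lambda_mult: float = 0.5,       # hypothetical knob: 1.0 = pure relevance
    score_threshold: float = 0.12,  # threshold value quoted in this README
) -> List[int]:
    """Return indices of up to k candidates chosen by Maximal Marginal Relevance."""
    relevance = [cosine(query, c) for c in candidates]
    # Drop candidates that are not relevant enough to the query.
    remaining = [i for i, r in enumerate(relevance) if r >= score_threshold]
    selected: List[int] = []

    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            # Redundancy = similarity to the closest already-selected chunk.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.2]
    chunks = [[1.0, 0.0], [1.0, 0.05], [0.3, 1.0], [-1.0, 0.0]]
    print(mmr_select(query, chunks, k=2))                   # balances relevance and diversity
    print(mmr_select(query, chunks, k=2, lambda_mult=1.0))  # relevance only
```

With `lambda_mult=0.5` the second pick skips the near-duplicate of the first result in favor of a less similar chunk, which is the behavior MMR is meant to provide for citations.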
Location: /svc/ai_agent.py
5 Tools Available:
1. get_air_quality # OpenAQ real-time data
2. get_location # Geocoding (Nominatim)
3. get_weather # Weather API
4. get_health_advice # Health recommendations
5. search_knowledge_base # RAG (NEW!)

Features:
- Per-session memory isolation (security fix)
- LangChain ReAct pattern
- Automatic tool selection based on query
- Citation tracking in responses
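Per-session memory isolation can be sketched as a store keyed by session ID, so one user's history is never handed to another session's agent. The class below is a hypothetical illustration, not the code in ai_agent.py; `SessionMemoryStore`, `max_turns`, and the `(role, text)` tuple format are all assumed names and shapes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, text); an assumed message shape


class SessionMemoryStore:
    """Keeps a separate, bounded chat history for each session ID."""

    def __init__(self, max_turns: int = 20) -> None:
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self._histories: Dict[str, List[Message]] = defaultdict(list)
        self._max_turns = max_turns

    def append(self, session_id: str, role: str, text: str) -> None:
        history = self._histories[session_id]
        history.append((role, text))
        # Trim in place so only the most recent max_turns messages remain.
        del history[:-self._max_turns]

    def history(self, session_id: str) -> List[Message]:
        # Return a copy so callers cannot mutate another session's memory.
        return list(self._histories.get(session_id, []))

    def clear(self, session_id: str) -> None:
        self._histories.pop(session_id, None)


if __name__ == "__main__":
    store = SessionMemoryStore(max_turns=2)
    store.append("user-123", "user", "What is the AQI in Hanoi?")
    store.append("user-456", "user", "Explain PM2.5.")
    print(store.history("user-123"))
    print(store.history("user-456"))
```

Returning copies from `history` is a deliberate choice in this sketch: it keeps the isolation guarantee even if a caller modifies the list it receives.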
Location: /frontend/src/components/
New Components:
- `CitationBubble.jsx` - Expandable citation display with [1] [2] [3] references
- `RAGStatus.jsx` - Status indicator (Ready/Loading/Offline)

Updated Components:
- `MessageBubble.jsx` - Now displays citations below assistant messages
- `ChatHeader.jsx` - Integrated RAG status indicator

Backend Endpoint:
- `GET /v1/rag/status` - Returns RAG availability and document count
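As a rough illustration of how a client could map the status payload onto the three indicator states (Ready/Loading/Offline), the sketch below parses a JSON body using the field names from the sample response shown later in this README (`status`, `rag_available`, `documents_loaded`). The function name and the assumption that the service reports a `"Loading"` status string are hypothetical.

```python
import json


def summarize_rag_status(raw_body: str) -> str:
    """Map a /v1/rag/status JSON body to 'Ready', 'Loading', or 'Offline'."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return "Offline"  # an unreadable body is treated as unavailable
    if not isinstance(payload, dict):
        return "Offline"

    documents = int(payload.get("documents_loaded") or 0)
    if payload.get("rag_available") and documents > 0:
        return "Ready"
    # Assumption: the service reports "Loading" while the index is being built.
    if payload.get("status") == "Loading":
        return "Loading"
    return "Offline"


if __name__ == "__main__":
    body = '{"status": "Ready", "rag_available": true, "documents_loaded": 7}'
    print(summarize_rag_status(body))
```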
| Metric | Value |
|---|---|
| Response Time | ~1.5-3 seconds |
| Embedding Model | Google (FREE tier) |
| Vector DB | ChromaDB (Local, ~10MB) |
| Documents | 7 (EPA + WHO) |
| Monthly Cost | $0 |
| Scalability | Ready for cloud deployment |
All RAG components tested and working:
✅ RAG Chain initialized successfully
✅ Google Embeddings active (FREE tier)
✅ ChromaDB loaded with 7 documents
✅ AI agent has 5 tools available
✅ Frontend citations display working
✅ Backend /v1/rag/status endpoint responding
✅ Per-session memory isolation implemented
✅ All dependencies installed (chromadb, langchain-chroma, pypdf, etc.)
| Method | Endpoint | Purpose |
|---|---|---|
| POST | `/v1/chat` | Send message (streaming) |
| GET | `/v1/rag/status` | Get RAG status |

| Method | Endpoint | Purpose |
|---|---|---|
| GET | `/health` | Service health status |
curl -X POST http://localhost:3005/v1/chat \
-H "Content-Type: application/json" \
-d '{
"message": "What are EPA guidelines for PM2.5?",
"sessionId": "user-123"
}'

curl http://localhost:3005/v1/rag/status

Response:
{
"status": "Ready",
"rag_available": true,
"documents_loaded": 7,
"score_threshold": 0.12,
"embedding_model": "google",
"timestamp": "2025-10-25T..."
}

| Date | Phase | Focus | Status |
|---|---|---|---|
| Oct 11-18 | 1-3 | Core Chat & Air Quality | ✅ Complete |
| Oct 19-24 | 4A | RAG Setup (ChromaDB + Embeddings) | ✅ Complete |
| Oct 19-24 | 4B | AI Agent Integration (RAGTool) | ✅ Complete |
| Oct 19-24 | 4C | Frontend UI (Citations + Status) | ✅ Complete |
| Oct 25 | 5 | Demo & Production Ready | ✅ Complete |
Phase 1-3: Foundation
- ✅ React chat interface with streaming SSE
- ✅ LangChain ReAct AI agent (4 tools)
- ✅ Real-time air quality data (OpenAQ v3)
- ✅ Location geocoding (Nominatim)
- ✅ Dark mode & conversation persistence
Phase 4A: RAG Setup
- ✅ ChromaDB local vector store (7 documents)
- ✅ Google Embeddings (FREE tier)
- ✅ PDF document loading with intelligent chunking
- ✅ MMR retrieval strategy
- ✅ Citation formatting & tracking
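The chunking step in this phase (800 tokens per chunk with a 120-token overlap, per the document_loader.py notes) can be sketched with a plain sliding window. This is an assumption-laden illustration: `chunk_tokens` is a hypothetical helper, and it works on any pre-tokenized list rather than on the tokenizer the project actually uses.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(
    tokens: Sequence[T],
    chunk_size: int = 800,  # chunk length quoted in this README
    overlap: int = 120,     # overlap quoted in this README
) -> List[List[T]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks: List[List[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


if __name__ == "__main__":
    words = ("particle pollution " * 1000).split()  # 2000 stand-in tokens
    pieces = chunk_tokens(words)
    print(len(pieces), [len(p) for p in pieces])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of indexing some tokens twice.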
Phase 4B: Agent Integration
- ✅ RAGTool added to agent (5th tool)
- ✅ Automatic tool selection by LLM
- ✅ System prompt updated with RAG guidance
- ✅ Per-session memory isolation (security)
- ✅ Lazy RAGChain initialization
Phase 4C: Frontend UI
- ✅ CitationBubble component (expandable sources)
- ✅ RAGStatus indicator (real-time status)
- ✅ MessageBubble citations integration
- ✅ ChatHeader RAG status display
- ✅ Backend /v1/rag/status endpoint
- ✅ Dark mode & responsive design
For detailed information, see:
- PHASE_C_COMPLETE.md - Phase 4C frontend UI details
- RAG_COMPLETE_SUMMARY.md - Complete RAG implementation overview
- TECH_STACK.md - Architecture & deployment information
- DAY1_COMPLETE.md - Phases 1-3 core features
- DAY2_COMPLETE.md - Phase 4 RAG implementation
- TEST_CASES.md - 40+ comprehensive test scenarios
- svc/rag/README.md - RAG pipeline technical documentation
- ✅ Per-session memory isolation (no cross-user data leakage)
- ✅ Session-based conversation scoping
- ✅ No persistent user profiles (localStorage only)
- ✅ Client-side message storage
- ✅ Rate limiting on backend endpoints
- ✅ Input sanitization & XSS protection
- ✅ CORS properly configured
- ✅ Environment variables for sensitive data
- ✅ Local ChromaDB (no external server exposure)
- ✅ Read-only document access
- ✅ No sensitive data in PDFs
- ✅ Version controlled knowledge base
- EPA Appendix G: 40 CFR Part 58
- NowCast Formula: EPA Technical Assistance Document
- Reference: https://www.ecfr.gov/current/title-40/chapter-I/subchapter-C/part-58/appendix-Appendix%20G%20to%20Part%2058
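For readers unfamiliar with NowCast, the sketch below follows the commonly published EPA description for PM2.5: take up to 12 hourly values, derive a weight factor from the ratio of the minimum to the maximum concentration (floored at 0.5), and compute a weighted average that favors recent hours. It is written from that description, not from this project's code; the function name and the input convention (most recent hour first, `None` for missing hours) are assumptions.

```python
import math
from typing import Optional, Sequence


def nowcast_pm25(hourly: Sequence[Optional[float]]) -> Optional[float]:
    """NowCast PM2.5 in µg/m³ from hourly values, most recent hour first.

    Returns None when fewer than 2 of the 3 most recent hours are available.
    """
    window = list(hourly[:12])  # NowCast looks at the last 12 hours at most
    if sum(v is not None for v in window[:3]) < 2:
        return None

    valid = [v for v in window if v is not None]
    c_max, c_min = max(valid), min(valid)
    if c_max <= 0:
        return 0.0

    # Weight factor: stable air (min close to max) gives a weight near 1,
    # rapidly changing air gives a smaller weight, floored at 0.5 for PM.
    weight = max(c_min / c_max, 0.5)

    numerator = sum(v * weight ** i for i, v in enumerate(window) if v is not None)
    denominator = sum(weight ** i for i, v in enumerate(window) if v is not None)

    # Truncate (not round) to one decimal place.
    return math.floor(numerator / denominator * 10) / 10


if __name__ == "__main__":
    print(nowcast_pm25([20.0, 10.0]))          # recent spike weighs more
    print(nowcast_pm25([None, None, 10.0]))    # not enough recent data
```

Missing hours keep their position in the window, so an older reading is still discounted by its true age rather than being treated as more recent.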
- PM2.5 24-hour: ≤ 15 µg/m³
- PM2.5 Annual: ≤ 5 µg/m³
- Reference: https://www.who.int/publications/i/item/9789240034228
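A small, hypothetical helper shows how a measured PM2.5 average could be compared against the two WHO guideline values listed above (15 µg/m³ for 24-hour, 5 µg/m³ for annual). The function and constant names are illustrative only and are not taken from the project.

```python
# WHO guideline levels for PM2.5 as listed in this README (µg/m³).
WHO_PM25_LIMITS = {"24h": 15.0, "annual": 5.0}


def who_pm25_ratio(concentration: float, averaging: str = "24h") -> float:
    """Return concentration divided by the WHO guideline (above 1.0 = exceeds)."""
    try:
        limit = WHO_PM25_LIMITS[averaging]
    except KeyError:
        raise ValueError(f"unknown averaging period: {averaging!r}") from None
    return concentration / limit


if __name__ == "__main__":
    ratio = who_pm25_ratio(30.0, "24h")
    print(f"24-hour PM2.5 is {ratio:.1f}x the WHO guideline")
```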
- EPA Documents: Air quality guides, AQI technical assistance
- WHO Documents: Global air quality guidelines
- Coverage: Particulate matter, O₃, NO₂ standards
- OpenAQ: 100 requests/day (free tier)
- Nominatim: 1 request/second max, User-Agent required
- Google Embeddings: FREE tier included, no API key needed for demo
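The Nominatim limit of one request per second can be respected with a minimum-interval throttle placed in front of each geocoding call. The sketch below is a generic illustration, not the project's implementation; `MinIntervalThrottle` and its injectable `clock` and `sleep` parameters are assumptions chosen so the behavior can be exercised without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Blocks so that consecutive calls are at least min_interval seconds apart."""

    def __init__(
        self,
        min_interval: float = 1.0,  # Nominatim: at most 1 request per second
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep if needed before the next request; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self._min_interval - (now - self._last_call))
            if delay > 0:
                self._sleep(delay)
        # Record when the request is actually allowed to go out.
        self._last_call = now + delay
        return delay


if __name__ == "__main__":
    throttle = MinIntervalThrottle(min_interval=1.0)
    for place in ["Hanoi", "Da Nang"]:
        waited = throttle.wait()  # call this right before each geocoding request
        print(f"{place}: waited {waited:.2f}s")
```

Any real request sent after `wait()` would also need to carry a descriptive User-Agent header, as the note above points out.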
MIT - See LICENSE file for details
This application is for educational purposes. Air quality data is provided by OpenAQ and official sources. For health and safety decisions, always consult official EPA/WHO resources and local authorities. Do not rely solely on this application for emergency decision-making.
AIrChat is a complete, production-ready application combining:
- Real-time air quality data (EPA standards)
- Conversational AI with LangChain
- RAG knowledge base (EPA & WHO guidelines)
- Citation tracking & source display
- Beautiful, responsive UI with dark mode
- Zero-cost deployment (Google FREE embeddings + local storage)

Status: ✅ COMPLETE & READY FOR DEMO!
Get started: Just run the 3 terminals above and visit http://localhost:5174
Built with ❤️ for the WiBD Hackathon 2025