ngtri1809/AIrChat

AIrChat 🌍💨

AI-Powered Air Quality Chat with RAG Knowledge Base

A production-ready conversational AI application combining real-time air quality data, LLM reasoning, and Retrieval-Augmented Generation (RAG) with EPA & WHO guidelines.

Demo Date: October 25, 2025 ✅ COMPLETE

Demo Video: 📺 Watch on YouTube

Presentation: https://docs.google.com/presentation/d/1jMslCeHJ5zp_f2t-EBpZjMm6QB1-E_yEe9zszZsvDVs/edit?usp=sharing

🎯 Project Status

| Phase | Feature | Status |
|-------|---------|--------|
| 1-3 | Core Chat & Weather & Air Quality | ✅ Complete |
| 4A | RAG Setup (ChromaDB + Embeddings) | ✅ Complete |
| 4B | AI Agent Integration | ✅ Complete |
| 4C | Frontend UI (Citations & Status) | ✅ Complete |
| 5 | Production Ready | ✅ Ready |

Tech Stack

Frontend

  • React 18 + Vite + Tailwind CSS
  • Components: ChatPane, CitationBubble, RAGStatus, MessageBubble
  • Context: ChatContext (conversations), ThemeContext (dark/light mode)
  • Port: 5174

Backend Gateway

  • Node.js + Express (Session management & API proxy)
  • Port: 3005
  • Features: Rate limiting, error handling, session tracking

Python Service (FastAPI)

  • Port: 8000
  • AI Agent: LangChain ReAct with 5 tools
  • RAG Pipeline: Local ChromaDB + Google Embeddings (FREE)
  • Knowledge Base: EPA & WHO PDF guidelines (7 documents)

Data Sources

  • OpenAQ API v3 - Real-time air quality measurements
  • Nominatim/OpenStreetMap - Geocoding & location search
  • ChromaDB (Local) - Vector storage for RAG (~10MB local)

🚀 Key Features

Phase 1-3: Core Chat & Air Quality

✅ Real-time AQI display with EPA color coding
✅ Location search with geocoding
✅ NowCast PM2.5 calculations (EPA formula)
✅ Multiple pollutant support (PM2.5, PM10, O3, NO2)
✅ Conversational AI chat interface
✅ Dark/light theme support
✅ Message persistence (localStorage)
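
The NowCast item above refers to EPA's weighted average over recent hourly readings. The following is a minimal sketch of that calculation for PM2.5, not the project's actual code: the function name `nowcast_pm25` and the "most recent first" input convention are assumptions made for illustration.

```python
from typing import Optional, Sequence


def nowcast_pm25(hourly: Sequence[Optional[float]]) -> Optional[float]:
    """NowCast for PM2.5 from up to 12 hourly values, most recent first.

    Missing hours are passed as None. Returns None when fewer than two of
    the three most recent hours are available.
    """
    window = list(hourly[:12])
    if sum(v is not None for v in window[:3]) < 2:
        return None

    valid = [v for v in window if v is not None]
    c_max, c_min = max(valid), min(valid)
    if c_max <= 0:
        return 0.0

    # Weight factor: the more the readings vary, the faster old hours decay.
    # For particulate matter the factor is floored at 0.5.
    weight = max(1.0 - (c_max - c_min) / c_max, 0.5)

    numerator = 0.0
    denominator = 0.0
    for hours_ago, value in enumerate(window):
        if value is None:
            continue
        numerator += (weight ** hours_ago) * value
        denominator += weight ** hours_ago

    # The result is truncated (not rounded) to 0.1 ug/m3.
    return int(numerator / denominator * 10) / 10
```

Stable readings give a weight near 1, so the result approaches a plain 12-hour average; rapidly changing readings push the weight toward 0.5, so the latest hours dominate.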

Phase 4: RAG Knowledge Base Integration

✅ Search Knowledge Base Tool - AI can search EPA/WHO guidelines
✅ Citation Display - Sources shown with relevance scores [1] [2] [3]
✅ RAG Status Indicator - Real-time knowledge base status in header
✅ Expandable Citations - View document source, page, domain, score
✅ 7 Documents Loaded:

  • EPA Air Quality Guide (Particle Pollution)
  • EPA AQI Technical Assistance Documents (2)
  • WHO Global Air Quality Guidelines
  • Technical reporting standards
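
To make the citation display concrete, here is a small sketch of how numbered references with source, page, domain, and score might be rendered. The `Source` dataclass and its field names are hypothetical; the real metadata keys in `rag_chain.py` may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    # Hypothetical field names mirroring the metadata the UI shows.
    document: str
    page: int
    domain: str
    score: float


def format_citations(sources: List[Source]) -> str:
    """Render one numbered line per source, in retrieval order."""
    lines = []
    for index, src in enumerate(sources, start=1):
        lines.append(
            f"[{index}] {src.document}, p. {src.page} "
            f"({src.domain}, score {src.score:.2f})"
        )
    return "\n".join(lines)
```

The `[n]` markers in the answer text then point at the line with the same number.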

🎨 UI Features

  • ChatGPT-style interface with streaming responses
  • Citation bubbles with expandable metadata
  • RAG status indicator (Ready/Loading/Offline)
  • Responsive sidebar with conversation management
  • Dark mode optimization for accessibility
  • Mobile-first responsive design

Project Structure

AIrChat/
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── ChatHeader.jsx (+ RAGStatus)
│   │   │   ├── ChatPane.jsx
│   │   │   ├── CitationBubble.jsx (NEW - Phase 4C)
│   │   │   ├── MessageBubble.jsx (+ citations display)
│   │   │   ├── MessageComposer.jsx
│   │   │   ├── MessageList.jsx
│   │   │   ├── RAGStatus.jsx (NEW - Phase 4C)
│   │   │   └── Sidebar.jsx
│   │   ├── contexts/
│   │   │   ├── ChatContext.jsx
│   │   │   └── ThemeContext.jsx
│   │   └── App.jsx
│   └── package.json
├── backend/
│   ├── server.js (+ GET /v1/rag/status endpoint)
│   ├── session_manager.js
│   └── package.json
├── svc/
│   ├── main.py (FastAPI + endpoints)
│   ├── ai_agent.py (5 tools including search_knowledge_base)
│   ├── rag/
│   │   ├── __init__.py
│   │   ├── embeddings.py (Google/OpenAI models)
│   │   ├── vector_store.py (ChromaDB)
│   │   ├── document_loader.py (PDF chunking)
│   │   ├── retriever.py (MMR strategy)
│   │   └── rag_chain.py (Main RAG chain)
│   ├── data/
│   │   └── kb/
│   │       ├── epa/ (3 PDFs)
│   │       └── who/ (1 PDF)
│   ├── store/
│   │   └── chroma/airchat_vi_v1/ (Vector storage)
│   └── requirements.txt (RAG dependencies)
└── README.md

⚑ Quick Start (3 Terminals)

Terminal 1: Frontend (React + Vite)

cd frontend
npm install  # First time only
npm run dev
# ✅ Runs on http://localhost:5174

Terminal 2: Backend Gateway (Express)

cd backend
npm install  # First time only
npm run dev
# ✅ Runs on http://localhost:3005

Terminal 3: Python Service (FastAPI + RAG)

cd svc
python3 -m venv .venv  # First time only
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r requirements.txt  # First time only
python -m uvicorn main:app --reload --port 8000  # Uses the activated venv (works on Windows too)
# ✅ Runs on http://localhost:8000

Then open: http://localhost:5174 🎉

🔧 RAG Pipeline Architecture

Phase 4A: RAG Setup

Location: /svc/rag/ (5 modules)

  1. embeddings.py (85 lines)

    • Google Embeddings: text-embedding-004 (FREE tier) ✅
    • OpenAI Optional: text-embedding-3-small (paid)
    • Cost-aware model selection
  2. vector_store.py (100 lines)

    • ChromaDB 0.5.3 local database
    • Storage: /svc/store/chroma/airchat_vi_v1
    • Persistence: Automatic saving
  3. document_loader.py (115 lines)

    • PDF loading from /svc/data/kb/
    • Chunking: 800 tokens, 120 overlap
    • Auto-detection: EPA & WHO documents
  4. retriever.py (155 lines)

    • MMR (Maximal Marginal Relevance) strategy
    • Similarity threshold: 0.12 (normalized)
    • Metadata filtering by domain
  5. rag_chain.py (110 lines)

    • Combines retriever + LLM
    • Citation formatting: [1] [2] [3]
    • Context window management
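
As a rough illustration of the retriever step listed above, the sketch below implements Maximal Marginal Relevance over plain Python vectors with a relevance cut-off. It does not reproduce `retriever.py`: the function names, the `lambda_mult` default, and the use of the 0.12 threshold on raw cosine similarity are assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int = 3,
    lambda_mult: float = 0.5,
    score_threshold: float = 0.12,
) -> List[int]:
    """Return indices of up to k candidates chosen by MMR.

    Each pick maximizes lambda_mult * relevance minus
    (1 - lambda_mult) * similarity to the closest already-selected item.
    """
    relevance = [cosine(query, c) for c in candidates]
    # Drop candidates that are not relevant enough to be cited at all.
    remaining = [i for i, r in enumerate(relevance) if r >= score_threshold]
    selected: List[int] = []

    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)

    return selected
```

With `lambda_mult=1.0` this reduces to ordinary top-k similarity search; lower values trade relevance for diversity, which helps avoid citing several near-identical chunks from the same PDF page.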

Phase 4B: AI Agent Integration

Location: /svc/ai_agent.py

5 Tools Available:

1. get_air_quality      # OpenAQ real-time data
2. get_location         # Geocoding (Nominatim)
3. get_weather          # Weather API
4. get_health_advice    # Health recommendations
5. search_knowledge_base # RAG (NEW!) 🆕

Features:

  • Per-session memory isolation (security fix)
  • LangChain ReAct pattern
  • Automatic tool selection based on query
  • Citation tracking in responses
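
Per-session memory isolation can be pictured as one conversation history per session ID, never shared between IDs. The class below is a minimal sketch under that assumption; `SessionMemory` and its methods are hypothetical names, and the agent's real memory objects are not shown in this README.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class SessionMemory:
    """Keeps one message history per session ID so sessions never share state."""

    def __init__(self, max_messages: int = 20) -> None:
        if max_messages < 1:
            raise ValueError("max_messages must be at least 1")
        self._max = max_messages
        self._histories: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def append(self, session_id: str, role: str, text: str) -> None:
        history = self._histories[session_id]
        history.append((role, text))
        # Trim the oldest turns so one long session cannot grow without bound.
        del history[:-self._max]

    def history(self, session_id: str) -> List[Tuple[str, str]]:
        # Return a copy so callers cannot mutate another request's state.
        return list(self._histories.get(session_id, []))
```

The point of keying everything by `session_id` is that a single shared memory object would let one user's earlier messages leak into another user's prompt.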

Phase 4C: Frontend UI

Location: /frontend/src/components/

New Components:

  • CitationBubble.jsx - Expandable citation display with [1] [2] [3] references
  • RAGStatus.jsx - Status indicator (Ready/Loading/Offline)

Updated Components:

  • MessageBubble.jsx - Now displays citations below assistant messages
  • ChatHeader.jsx - Integrated RAG status indicator

Backend Endpoint:

  • GET /v1/rag/status - Returns RAG availability and document count

📊 Performance & Cost

| Metric | Value |
|--------|-------|
| Response Time | ~1.5-3 seconds |
| Embedding Model | Google (FREE tier) |
| Vector DB | ChromaDB (Local, ~10MB) |
| Documents | 7 (EPA + WHO) |
| Monthly Cost | $0 🎉 |
| Scalability | Ready for cloud deployment |

🧪 Testing & Validation

All RAG components tested and working:

✅ RAG Chain initialized successfully
✅ Google Embeddings active (FREE tier)
✅ ChromaDB loaded with 7 documents
✅ AI agent has 5 tools available
✅ Frontend citations display working
✅ Backend /v1/rag/status endpoint responding
✅ Per-session memory isolation implemented
✅ All dependencies installed (chromadb, langchain-chroma, pypdf, etc.)

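📡 API Endpoints

Chat Endpoints

| Method | Endpoint | Purpose |
|--------|----------|---------|
| POST | /v1/chat | Send message (streaming) |
| GET | /v1/rag/status | Get RAG status |

Health Check

| Method | Endpoint | Purpose |
|--------|----------|---------|
| GET | /health | Service health status |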

Example: Chat with RAG

curl -X POST http://localhost:3005/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What are EPA guidelines for PM2.5?",
    "sessionId": "user-123"
  }'
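
The same request can be made from Python with only the standard library. This is a sketch that mirrors the curl call above; the helper names are made up for illustration, and the exact format of the streamed lines is not documented here, so the code simply yields them as text.

```python
import json
import urllib.request
from typing import Iterator


def build_chat_request(
    message: str,
    session_id: str,
    base_url: str = "http://localhost:3005",
) -> urllib.request.Request:
    """Build the POST /v1/chat request with the same JSON body as the curl example."""
    payload = json.dumps({"message": message, "sessionId": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_chat(message: str, session_id: str) -> Iterator[str]:
    """Send the request and yield the response line by line as it streams."""
    request = build_chat_request(message, session_id)
    with urllib.request.urlopen(request) as response:
        for raw_line in response:
            yield raw_line.decode("utf-8")
```

With the gateway running, `for line in stream_chat("What are EPA guidelines for PM2.5?", "user-123"): print(line, end="")` prints the reply as it arrives.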

Example: Check RAG Status

curl http://localhost:3005/v1/rag/status

Response:

{
  "status": "Ready",
  "rag_available": true,
  "documents_loaded": 7,
  "score_threshold": 0.12,
  "embedding_model": "google",
  "timestamp": "2025-10-25T..."
}
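
A client might turn that response into a typed object before deciding whether to show the "Ready" indicator. The sketch below assumes the fields are exactly those in the sample response; `RagStatus`, `parse_rag_status`, and `is_ready` are illustrative names, not part of the project.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RagStatus:
    status: str
    rag_available: bool
    documents_loaded: int
    score_threshold: float
    embedding_model: str
    timestamp: str


def parse_rag_status(body: str) -> RagStatus:
    """Parse the JSON body of GET /v1/rag/status; raises KeyError on missing fields."""
    data = json.loads(body)
    return RagStatus(
        status=data["status"],
        rag_available=bool(data["rag_available"]),
        documents_loaded=int(data["documents_loaded"]),
        score_threshold=float(data["score_threshold"]),
        embedding_model=data["embedding_model"],
        timestamp=data["timestamp"],
    )


def is_ready(status: RagStatus) -> bool:
    """Treat the knowledge base as usable only if it reports documents loaded."""
    return status.rag_available and status.status == "Ready" and status.documents_loaded > 0
```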

🎓 Development Timeline & Phases

| Date | Phase | Focus | Status |
|------|-------|-------|--------|
| Oct 11-18 | 1-3 | Core Chat & Air Quality | ✅ Complete |
| Oct 19-24 | 4A | RAG Setup (ChromaDB + Embeddings) | ✅ Complete |
| Oct 19-24 | 4B | AI Agent Integration (RAGTool) | ✅ Complete |
| Oct 19-24 | 4C | Frontend UI (Citations + Status) | ✅ Complete |
| Oct 25 | 5 | Demo & Production Ready | ✅ Complete |

Phase Highlights

Phase 1-3: Foundation

  • ✅ React chat interface with streaming SSE
  • ✅ LangChain ReAct AI agent (4 tools)
  • ✅ Real-time air quality data (OpenAQ v3)
  • ✅ Location geocoding (Nominatim)
  • ✅ Dark mode & conversation persistence

Phase 4A: RAG Setup

  • ✅ ChromaDB local vector store (7 documents)
  • ✅ Google Embeddings (FREE tier)
  • ✅ PDF document loading with intelligent chunking
  • ✅ MMR retrieval strategy
  • ✅ Citation formatting & tracking
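
The chunking settings quoted for `document_loader.py` (800 tokens, 120 overlap) correspond to a sliding window over the token sequence. The sketch below shows that idea on a plain list of tokens; it is an assumption-laden illustration, since the README does not say which tokenizer or splitter the loader uses.

```python
from typing import List, Sequence


def chunk_tokens(
    tokens: Sequence[str],
    chunk_size: int = 800,
    overlap: int = 120,
) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing roughly 15% more text.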

Phase 4B: Agent Integration

  • ✅ RAGTool added to agent (5th tool)
  • ✅ Automatic tool selection by LLM
  • ✅ System prompt updated with RAG guidance
  • ✅ Per-session memory isolation (security)
  • ✅ Lazy RAGChain initialization

Phase 4C: Frontend UI

  • ✅ CitationBubble component (expandable sources)
  • ✅ RAGStatus indicator (real-time status)
  • ✅ MessageBubble citations integration
  • ✅ ChatHeader RAG status display
  • ✅ Backend /v1/rag/status endpoint
  • ✅ Dark mode & responsive design

📚 Documentation

For detailed information, see:

πŸ” Security Considerations

Data Privacy

  • βœ… Per-session memory isolation (no cross-user data leakage)
  • βœ… Session-based conversation scoping
  • βœ… No persistent user profiles (localStorage only)
  • βœ… Client-side message storage

API Security

  • βœ… Rate limiting on backend endpoints
  • βœ… Input sanitization & XSS protection
  • βœ… CORS properly configured
  • βœ… Environment variables for sensitive data

Knowledge Base Security

  • βœ… Local ChromaDB (no external server exposure)
  • βœ… Read-only document access
  • βœ… No sensitive data in PDFs
  • βœ… Version controlled knowledge base

🎨 Standards & References

AQI Calculation
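
For orientation, the EPA AQI is a piecewise-linear mapping from a pollutant concentration to an index, followed by a lookup of the category and display color. The sketch below shows one segment of that mapping plus the category lookup; the function names are illustrative, and the breakpoint values for each pollutant are deliberately left as inputs.

```python
import math
from typing import Tuple

# Category names and display colors for each AQI band (upper bound inclusive).
AQI_BANDS = [
    (50, "Good", "green"),
    (100, "Moderate", "yellow"),
    (150, "Unhealthy for Sensitive Groups", "orange"),
    (200, "Unhealthy", "red"),
    (300, "Very Unhealthy", "purple"),
]


def linear_aqi(conc: float, c_lo: float, c_hi: float, i_lo: int, i_hi: int) -> int:
    """Interpolate an AQI value inside one breakpoint segment.

    c_lo..c_hi is the concentration range of the segment and i_lo..i_hi the
    index range it maps to.
    """
    if not c_lo <= conc <= c_hi:
        raise ValueError("concentration is outside this breakpoint segment")
    value = (i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo
    # Round half up to the nearest integer.
    return math.floor(value + 0.5)


def aqi_category(aqi: int) -> Tuple[str, str]:
    """Return (category name, color) for an AQI value."""
    for upper, name, color in AQI_BANDS:
        if aqi <= upper:
            return name, color
    return "Hazardous", "maroon"
```

The breakpoint segments themselves should be taken from the EPA AQI technical assistance documents rather than hard-coded from memory, since EPA revises them from time to time.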

WHO Guidelines

RAG Knowledge Base

  • EPA Documents: Air quality guides, AQI technical assistance
  • WHO Documents: Global air quality guidelines
  • Coverage: Particulate matter, O₃, NO₂ standards

API Usage Policies

  • OpenAQ: 100 requests/day (free tier)
  • Nominatim: 1 request/second max, User-Agent required
  • Google Embeddings: FREE tier included, no API key needed for demo
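
The Nominatim limits above (at most one request per second, with an identifying User-Agent) can be respected with a small throttle in front of each geocoding call. This is a sketch only: `Throttle`, `build_geocode_request`, and the User-Agent string are hypothetical, and the project's actual geocoding code is not shown in this README.

```python
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional


class Throttle:
    """Blocks so that calls to wait() are at least min_interval seconds apart."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep if needed and return the number of seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self._min_interval - (now - self._last_call))
        if delay:
            self._sleep(delay)
        self._last_call = now + delay
        return delay


def build_geocode_request(query: str, user_agent: str) -> urllib.request.Request:
    """Build a Nominatim search request that identifies the calling application."""
    params = urllib.parse.urlencode({"q": query, "format": "json", "limit": 1})
    return urllib.request.Request(
        f"https://nominatim.openstreetmap.org/search?{params}",
        headers={"User-Agent": user_agent},
    )
```

Calling `throttle.wait()` immediately before each `urllib.request.urlopen(build_geocode_request(...))` keeps a single process under the limit; several worker processes would need a shared limiter instead.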

License

MIT - See LICENSE file for details

⚠️ Disclaimer

This application is for educational purposes. Air quality data is provided by OpenAQ and official sources. For health and safety decisions, always consult official EPA/WHO resources and local authorities. Do not rely solely on this application for emergency decision-making.


🎊 Summary

AIrChat is a complete, production-ready application combining:

  • 🌍 Real-time air quality data (EPA standards)
  • 💬 Conversational AI with LangChain
  • 📚 RAG knowledge base (EPA & WHO guidelines)
  • 🔍 Citation tracking & source display
  • 🎨 Beautiful, responsive UI with dark mode
  • 🚀 Zero-cost deployment (Google FREE embeddings + local storage)

Status: ✅ COMPLETE & READY FOR DEMO!

Get started: Just run the 3 terminals above and visit http://localhost:5174


Built with ❤️ for the WiBD Hackathon 2025

About

This is a hackathon project for WiBD: a weather AI chat box.
