# FloatChat AI 🌊

An intelligent oceanographic data analysis platform powered by AI, combining RAG (Retrieval-Augmented Generation) with natural language querying for ARGO float data.

## 🚀 Features

- **AI-Powered Chat Interface**: Ask questions about ocean data in natural language
- **Dual Query System**:
  - Semantic search using ChromaDB for descriptive queries
  - NL-to-SQL translation for analytical queries
- **Interactive Dashboard**: Visualize ocean temperature, salinity, and depth data
- **Multiple LLM Support**: Works with local Ollama and cloud providers (Groq, OpenAI, OpenRouter)
- **Real-time Data Processing**: Process and analyze ARGO float measurements
- **Export Capabilities**: Export data in CSV, NetCDF, and ASCII formats
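The dual query system can be pictured as a small router: analytical wording goes down the NL-to-SQL path, everything else to semantic search. The sketch below is illustrative only (the keyword list and function name are assumptions, not the repo's actual code); in practice the classification is usually delegated to the LLM itself.

```python
# Hypothetical sketch of dual-query routing (not the repo's actual code).
# Analytical questions go to NL-to-SQL; descriptive ones to semantic search.
ANALYTICAL_HINTS = ("average", "mean", "max", "min", "count", "sum", "compare")

def route_query(question: str) -> str:
    q = question.lower()
    if any(hint in q for hint in ANALYTICAL_HINTS):
        return "nl_to_sql"       # translate to SQL, run against PostgreSQL
    return "semantic_search"     # embed the question, search ChromaDB
```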

πŸ› οΈ Tech Stack

  • Backend: FastAPI, Python 3.13+
  • Frontend: Streamlit
  • Database: PostgreSQL
  • Vector Store: ChromaDB
  • LLM: Ollama (local) / Groq / OpenAI / OpenRouter (free-tier models)
  • Embeddings: nomic-embed-text / sentence-transformers

## 📋 Prerequisites

- Python 3.13+
- PostgreSQL
- Ollama (for local LLM)
- Git

## 🔧 Installation

### 1. Clone the Repository

```bash
git clone https://github.com/NematSachdeva/FloatChat-AI_107.git
cd FloatChat-AI_107/floatchat-ai
```

### 2. Create a Virtual Environment

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

### 4. Set Up the PostgreSQL Database

```bash
# Create the database
createdb argo

# Or using psql
psql -U postgres
CREATE DATABASE argo;
\q
```

### 5. Install and Set Up Ollama

```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull the required models
ollama pull gemma2:2b
ollama pull nomic-embed-text:latest
```

### 6. Configure Environment Variables

Copy the example environment file and configure it:

```bash
cp .env.example .env
```

Edit `.env` with your settings:

```bash
# Database Configuration
DB_PASSWORD=
DATABASE_URL=postgresql+psycopg://your_username@localhost:5432/argo

# LLM Configuration
LLM_PROVIDER=ollama
OLLAMA_HOST=http://localhost:11434
LLM_MODEL=gemma2:2b
EMBEDDING_MODEL=nomic-embed-text:latest

# Free online model option (OpenRouter)
# LLM_PROVIDER=openrouter
# LLM_MODEL=qwen/qwen3-8b:free
# OPENROUTER_API_KEY=your_openrouter_key

# ChromaDB Configuration
CHROMA_PATH=./chroma_db
VECTOR_STORE=persistent

# Backend URL
BACKEND_URL=http://127.0.0.1:8000
```
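`config.py` presumably reads these variables at startup; a minimal sketch of such a loader, with the defaults mirroring the values above (the helper name is hypothetical and the repo's actual `config.py` may differ):

```python
import os

# Minimal sketch of reading the .env-provided settings. The helper name is
# hypothetical; defaults mirror the example .env above.
def load_settings() -> dict:
    return {
        "database_url": os.getenv("DATABASE_URL", "postgresql+psycopg://localhost:5432/argo"),
        "llm_provider": os.getenv("LLM_PROVIDER", "ollama"),
        "ollama_host": os.getenv("OLLAMA_HOST", "http://localhost:11434"),
        "llm_model": os.getenv("LLM_MODEL", "gemma2:2b"),
        "embedding_model": os.getenv("EMBEDDING_MODEL", "nomic-embed-text:latest"),
        "chroma_path": os.getenv("CHROMA_PATH", "./chroma_db"),
        "vector_store": os.getenv("VECTOR_STORE", "persistent"),
        "backend_url": os.getenv("BACKEND_URL", "http://127.0.0.1:8000"),
    }
```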

## 🚀 Running the Application

### Start the Ollama Service

```bash
ollama serve
```

### Start the Backend (FastAPI)

```bash
cd floatchat-ai
source venv/bin/activate
python3 -m uvicorn main:app --host 127.0.0.1 --port 8000 --reload
```

### Start the Frontend (Streamlit)

In a new terminal:

```bash
cd floatchat-ai
source venv/bin/activate
streamlit run streamlit_app.py
```

## 🌍 Use Global Argo Dataset (Seanoe GDAC)

To ingest real global Argo profile data from DOI `10.17882/42182`:

```bash
python pipeline/ingest_seanoe_argo.py
python pipeline/data_chroma_floats.py
```

Notes:

- `pipeline/ingest_seanoe_argo.py` reads the GDAC profile index and ingests a sampled subset (capped by `ARGO_MAX_PROFILES`) into PostgreSQL.
- Increase `ARGO_MAX_PROFILES` gradually as your database/storage budget allows.
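The "sampled subset" step can be illustrated with a short sketch (the function name and default cap are assumptions for illustration, not the script's actual code):

```python
import os
import random

# Hypothetical illustration of the sampling step: cap ingestion at
# ARGO_MAX_PROFILES entries drawn from the GDAC profile index.
# Function name and default cap are assumptions, not the script's code.
def sample_profiles(index_rows, seed=0):
    max_profiles = int(os.getenv("ARGO_MAX_PROFILES", "100"))
    if len(index_rows) <= max_profiles:
        return list(index_rows)
    # Fixed seed keeps repeated runs reproducible.
    return random.Random(seed).sample(index_rows, max_profiles)
```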

## 🚂 Railway Deployment

This repository includes `railway.json` and a `Procfile` for backend deployment.

Recommended setup on Railway:

1. Create a backend service from this repo (uses `Procfile`/`railway.json`).
2. Add a PostgreSQL plugin and set `DATABASE_URL` from Railway.
3. Set the environment variables: `LLM_PROVIDER`, `LLM_MODEL`, the provider API key, and `VECTOR_STORE=memory`.
4. Deploy and verify the `/health` endpoint.
5. Create a second Railway service for Streamlit with the start command `streamlit run streamlit_app.py --server.address=0.0.0.0 --server.port=$PORT`.
6. Set the frontend's `BACKEND_URL` to the backend service's Railway URL.

### Access the Application

- **Frontend**: http://localhost:8501
- **Backend API**: http://127.0.0.1:8000
- **API Docs**: http://127.0.0.1:8000/docs

## πŸ“Š Usage

### Chat Interface

1. Open the frontend at http://localhost:8501
2. Type your question in the chat interface
3. Examples:
   - "What is ARGO?"
   - "Show me temperature data by depth"
   - "What are the average salinity measurements?"

### API Endpoints

#### Health Check
```bash
curl http://127.0.0.1:8000/health
```

#### Query Endpoint

```bash
curl -X POST http://127.0.0.1:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query_text":"What is ARGO?"}'
```
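The same call from Python, using only the standard library (a sketch assuming the endpoint and payload shape shown above; the helper names are ours):

```python
import json
import urllib.request

BACKEND_URL = "http://127.0.0.1:8000"  # same value as BACKEND_URL in .env

def build_query_request(query_text):
    """Build the POST request the /query endpoint expects."""
    body = json.dumps({"query_text": query_text}).encode("utf-8")
    return urllib.request.Request(
        f"{BACKEND_URL}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(query_text):
    """Send the question and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(build_query_request(query_text), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend up, `ask("What is ARGO?")` returns the decoded JSON answer.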

## 🧪 Testing

Run the test suite:

```bash
pytest tests/
```

Run specific tests:

```bash
pytest tests/test_api_client.py
pytest tests/test_chat_interface.py
```

## 📁 Project Structure

```
floatchat-ai/
├── main.py                 # FastAPI backend
├── streamlit_app.py        # Streamlit frontend
├── config.py               # Configuration management
├── components/             # UI components
│   ├── api_client.py
│   ├── chat_interface.py
│   ├── data_manager.py
│   └── ...
├── tests/                  # Test suite
├── requirements.txt        # Python dependencies
├── .env.example            # Environment template
└── README.md               # This file
```

## 🔐 Security

- Never commit `.env` files with sensitive data
- Use environment variables for all credentials
- The `.gitignore` file excludes sensitive files automatically

πŸ› Troubleshoot

Backend Returns 500 Error

  • Ensure Ollama is running: ollama serve
  • Check .env file has correct LLM_PROVIDER=ollama
  • Verify models are installed: ollama list

ChromaDB Collection Error

  • Delete and recreate: rm -rf chroma_db/
  • Restart backend to reinitialize

Database Connection Error

  • Verify PostgreSQL is running
  • Check database exists: psql -l
  • Verify credentials in .env

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📝 License

This project is open source and available under the MIT License.

## 👥 Authors

## 🙏 Acknowledgments

- The ARGO float data program
- Ollama for local LLM support
- ChromaDB for vector storage
- The FastAPI and Streamlit communities

## 📧 Contact

For questions or support, please open an issue on GitHub.


Made with ❤️ for oceanographic research
