# FloatChat AI

An intelligent oceanographic data analysis platform powered by AI, combining RAG (Retrieval-Augmented Generation) with natural language querying for ARGO float data.
## Features

- AI-Powered Chat Interface: Ask questions about ocean data in natural language
- Dual Query System (a minimal routing sketch follows this list):
  - Semantic search using ChromaDB for descriptive queries
  - NL-to-SQL translation for analytical queries
- Interactive Dashboard: Visualize ocean temperature, salinity, and depth data
- Multiple LLM Support: Works with local Ollama and cloud providers (Groq, OpenAI, OpenRouter)
- Real-time Data Processing: Process and analyze ARGO float measurements
- Export Capabilities: Export data in CSV, NetCDF, and ASCII formats
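To make the dual query idea concrete, here is a minimal sketch of how such routing could look. It is illustrative only: the collection name `argo_profiles`, the keyword heuristic, and the `route` function are hypothetical and do not reflect the project's actual modules; the ChromaDB calls use its standard client API.

```python
# Illustrative sketch only - collection name and routing heuristic are hypothetical.
import chromadb

# Persistent ChromaDB store, matching CHROMA_PATH in .env
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("argo_profiles")  # hypothetical name

# Very rough heuristic: aggregation words suggest an analytical (SQL) question.
ANALYTICAL_HINTS = ("average", "mean", "count", "maximum", "minimum", "sum")

def route(question: str) -> dict:
    """Send analytical questions to the NL-to-SQL path, the rest to semantic search."""
    if any(hint in question.lower() for hint in ANALYTICAL_HINTS):
        # Analytical branch: the backend translates the question to SQL and runs it on PostgreSQL.
        return {"path": "nl-to-sql", "question": question}
    # Descriptive branch: semantic search over documents stored in ChromaDB.
    hits = collection.query(query_texts=[question], n_results=3)
    return {"path": "semantic", "documents": hits["documents"][0]}

print(route("What are the average salinity measurements?"))  # -> nl-to-sql path
print(route("What is ARGO?"))                                # -> semantic path
```
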
## Tech Stack

- Backend: FastAPI, Python 3.13+
- Frontend: Streamlit
- Database: PostgreSQL
- Vector Store: ChromaDB
- LLM: Ollama (local) / Groq / OpenAI / OpenRouter (free-tier models)
- Embeddings: nomic-embed-text / sentence-transformers
## Prerequisites

- Python 3.13+
- PostgreSQL
- Ollama (for local LLM)
- Git
## Installation

Clone the repository and set up a virtual environment:

```bash
git clone https://github.com/NematSachdeva/FloatChat-AI_107.git
cd FloatChat-AI_107/floatchat-ai
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Set up PostgreSQL:

```bash
# Create database
createdb argo
# Or using psql
psql -U postgres
CREATE DATABASE argo;
\q
```

Install Ollama and pull the required models:

```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull required models
ollama pull gemma2:2b
ollama pull nomic-embed-text:latest
```

Copy the example environment file and configure it:

```bash
cp .env.example .env
```

Edit `.env` with your settings:
```bash
# Database Configuration
DB_PASSWORD=
DATABASE_URL=postgresql+psycopg://your_username@localhost:5432/argo

# LLM Configuration
LLM_PROVIDER=ollama
OLLAMA_HOST=http://localhost:11434
LLM_MODEL=gemma2:2b
EMBEDDING_MODEL=nomic-embed-text:latest

# Free online model option (OpenRouter)
# LLM_PROVIDER=openrouter
# LLM_MODEL=qwen/qwen3-8b:free
# OPENROUTER_API_KEY=your_openrouter_key

# ChromaDB Configuration
CHROMA_PATH=./chroma_db
VECTOR_STORE=persistent

# Backend URL
BACKEND_URL=http://127.0.0.1:8000
```

## Running the Application

Start Ollama:

```bash
ollama serve
```

Start the backend:

```bash
cd floatchat-ai
source venv/bin/activate
python3 -m uvicorn main:app --host 127.0.0.1 --port 8000 --reload
```

In a new terminal, start the frontend:

```bash
cd floatchat-ai
source venv/bin/activate
streamlit run streamlit_app.py
```
## Use Global Argo Dataset (Seanoe GDAC)
To ingest real global Argo profile data from DOI `10.17882/42182`:
```bash
python pipeline/ingest_seanoe_argo.py
python pipeline/data_chroma_floats.py
```

Notes:
- `pipeline/ingest_seanoe_argo.py` reads the GDAC profile index and ingests a sampled subset (`ARGO_MAX_PROFILES`) into PostgreSQL.
- Increase `ARGO_MAX_PROFILES` gradually as your DB/storage budget allows.
## Deployment (Railway)

This repository now includes `railway.json` and a `Procfile` for backend deployment.

Recommended setup on Railway:

- Create a backend service from this repo (uses `Procfile`/`railway.json`).
- Add a PostgreSQL plugin and set `DATABASE_URL` from Railway.
- Set environment variables: `LLM_PROVIDER`, `LLM_MODEL`, the provider API key, and `VECTOR_STORE=memory`.
- Deploy and verify `/health`.
- Create a second Railway service for Streamlit using the start command: `streamlit run streamlit_app.py --server.address=0.0.0.0 --server.port=$PORT`
- Set the frontend `BACKEND_URL` to your backend Railway URL.
### Access the Application
- **Frontend**: http://localhost:8501
- **Backend API**: http://127.0.0.1:8000
- **API Docs**: http://127.0.0.1:8000/docs
## Usage
### Chat Interface
1. Open the frontend at http://localhost:8501
2. Type your question in the chat interface
3. Examples:
- "What is ARGO?"
- "Show me temperature data by depth"
- "What are the average salinity measurements?"
### API Endpoints
#### Health Check
```bash
curl http://127.0.0.1:8000/health
```

#### Query

```bash
curl -X POST http://127.0.0.1:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query_text":"What is ARGO?"}'
```
## Testing

Run the test suite:

```bash
pytest tests/
```

Run specific tests:

```bash
pytest tests/test_api_client.py
pytest tests/test_chat_interface.py
```

## Project Structure

```
floatchat-ai/
├── main.py              # FastAPI backend
├── streamlit_app.py     # Streamlit frontend
├── config.py            # Configuration management
├── components/          # UI components
│   ├── api_client.py
│   ├── chat_interface.py
│   ├── data_manager.py
│   └── ...
├── tests/               # Test suite
├── requirements.txt     # Python dependencies
├── .env.example         # Environment template
└── README.md            # This file
```
## Security Notes

- Never commit `.env` files with sensitive data
- Use environment variables for all credentials
- The `.gitignore` file excludes sensitive files automatically
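To illustrate the "no hard-coded credentials" rule, configuration can be read from the environment at startup. This is only a sketch, assuming `python-dotenv` is available; the project's real logic lives in `config.py` and may differ, and the API-key variable naming is an assumption:

```python
# Sketch of environment-based configuration (illustrative; see config.py for the real logic).
import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv()  # read .env from the working directory, if present

DATABASE_URL = os.getenv("DATABASE_URL", "postgresql+psycopg://localhost:5432/argo")
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "ollama")
LLM_MODEL = os.getenv("LLM_MODEL", "gemma2:2b")
CHROMA_PATH = os.getenv("CHROMA_PATH", "./chroma_db")

# Fail fast if a cloud provider is selected but its API key is missing,
# instead of falling back to a hard-coded credential.
# (Assumes keys are named like OPENROUTER_API_KEY, as in .env.example.)
if LLM_PROVIDER != "ollama" and not os.getenv(f"{LLM_PROVIDER.upper()}_API_KEY"):
    raise RuntimeError(f"Missing API key for provider '{LLM_PROVIDER}'")
```
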
## Troubleshooting

### Ollama Issues
- Ensure Ollama is running: `ollama serve`
- Check the `.env` file has the correct `LLM_PROVIDER=ollama`
- Verify models are installed: `ollama list`

### ChromaDB Issues
- Delete and recreate the store: `rm -rf chroma_db/`
- Restart the backend to reinitialize

### Database Issues
- Verify PostgreSQL is running
- Check the database exists: `psql -l`
- Verify credentials in `.env`
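If several components misbehave at once, a quick connectivity check can narrow down the culprit. This is a hypothetical helper script, not part of the repository; it assumes the `requests` package and the default hosts from `.env.example`:

```python
# check_setup.py - hypothetical helper; pings the services FloatChat depends on.
import os

import requests

def check(name: str, url: str) -> None:
    try:
        requests.get(url, timeout=5).raise_for_status()
        print(f"[ok]   {name}: {url}")
    except Exception as exc:
        print(f"[fail] {name}: {exc}")

# Ollama's /api/tags endpoint lists locally installed models.
check("Ollama", os.getenv("OLLAMA_HOST", "http://localhost:11434") + "/api/tags")
# FastAPI backend health endpoint (see API Endpoints above).
check("Backend", os.getenv("BACKEND_URL", "http://127.0.0.1:8000") + "/health")
```
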
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is open source and available under the MIT License.

## Author

- Nemat Sachdeva - GitHub

## Acknowledgments

- ARGO float data program
- Ollama for local LLM support
- ChromaDB for vector storage
- FastAPI and Streamlit communities

## Support

For questions or support, please open an issue on GitHub.
Made with ❤️ for oceanographic research