Date: March 28, 2026
Purpose: Single source-of-truth handoff document for AI assistants and new contributors.
CloudSeer is a cloud cost intelligence platform that turns AWS telemetry into operational decisions.
Current end-to-end loop:
- Pull EC2 CPU from CloudWatch on a timer.
- Estimate cost per polling interval.
- Store time-series points in SQLite.
- Run anomaly detection and forecasting in API routes.
- Surface actions in a React dashboard.
- Trigger one-click remediation (stop EC2 instance).
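The cost-estimation step in this loop can be illustrated with a minimal fixed-rate sketch. It assumes the 0.0116 USD/hour demo rate and the 60-second poll interval described in this document. The function name estimate_interval_cost is hypothetical and is not the actual interface of backend/aws/cost_estimator.py.

```python
# Hypothetical sketch of a fixed-rate, per-polling-interval cost estimate.
# Assumptions: the demo hourly rate (0.0116 USD) and the 60 s poll interval;
# the function name is illustrative, not the real cost_estimator.py API.
HOURLY_RATE_USD = 0.0116
POLL_INTERVAL_SECONDS = 60


def estimate_interval_cost(interval_seconds: float = POLL_INTERVAL_SECONDS,
                           hourly_rate_usd: float = HOURLY_RATE_USD) -> float:
    """Return the USD cost attributed to one polling interval."""
    if interval_seconds < 0:
        raise ValueError("interval_seconds must be non-negative")
    # Prorate the hourly rate by the fraction of an hour the interval covers.
    return hourly_rate_usd * interval_seconds / 3600.0


if __name__ == "__main__":
    print(round(estimate_interval_cost(), 6))
```

At 60 seconds the result is about 0.000193 USD, which is consistent with the cost_usd value in the sample /api/metrics response.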
CloudSeer is in an MVP-plus stage: technically integrated, demo-ready, and extensible, with known hardcoded constraints documented below.
CloudSeer is designed to move teams from reactive cloud cost review to proactive cloud cost control.
- Reactive mode: discover overruns late, investigate manually, fix slowly.
- CloudSeer mode: detect drift early, forecast risk, explain urgency, trigger action.
Core buyer value:
- Faster anomaly response.
- Less alert fatigue.
- Direct path from insight to remediation.
- Traceable savings narrative.
Backend:
- FastAPI app with CORS enabled globally.
- Async polling loop started at app startup in backend/api/main.py.
- Poll interval: 60 seconds.
- CloudWatch lookup window: last 15 minutes, period 300 seconds.
- SQLite store class: TimeSeriesStore in backend/db/timeseries_store.py.
- Cost estimator: fixed EC2 hourly rate in backend/aws/cost_estimator.py.
- API routes mounted under /api: /api/metrics, /api/forecast, /api/anomalies, /api/remediate.
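The poller's CloudWatch query can be sketched with boto3's get_metric_statistics call. The sketch uses the 15-minute window and 300-second period listed above, and it returns a fallback value when no datapoints come back. It is not the code in backend/api/main.py. The function names and the FALLBACK_CPU value of 0.0 are assumptions, because the actual fallback value is not documented here. The client is passed in so that the logic can be exercised without AWS credentials.

```python
# Hypothetical sketch of the poller's CloudWatch CPU query with a fallback value.
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

FALLBACK_CPU = 0.0  # assumption: the real fallback value is not documented here


def build_cpu_query(instance_id: str, now: Optional[datetime] = None) -> dict[str, Any]:
    """Build kwargs for CloudWatch get_metric_statistics: last 15 minutes, 300 s period."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=15),
        "EndTime": end,
        "Period": 300,
        "Statistics": ["Average"],
    }


def latest_cpu(cloudwatch_client: Any, instance_id: str,
               fallback: float = FALLBACK_CPU) -> float:
    """Return the most recent average CPU, or `fallback` when no datapoints exist."""
    response = cloudwatch_client.get_metric_statistics(**build_cpu_query(instance_id))
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return fallback
    # CloudWatch does not return datapoints in a guaranteed order, so pick the newest.
    newest = max(datapoints, key=lambda dp: dp["Timestamp"])
    return float(newest["Average"])
```

In the running backend, the client would come from boto3.client("cloudwatch"). Because that call is synchronous, an async poll loop would typically run it through asyncio.to_thread so that it does not block the event loop.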
Frontend:
- React + Vite app with a route layout architecture.
- Routes: /, /anomalies, /automation, /resources/:id, /reports
- Data fetch layer in frontend/src/api/client.js.
- Dashboard refresh cadence: every 10 seconds.
- Recharts visualizations and Framer Motion animations throughout the operator experience.
ML:
- Anomaly module: ml/anomaly/isolation_forest.py.
- Forecast module: ml/forecasting/prophet_model.py.
- Synthetic history seeding: ml/forecasting/synthetic_history.py.
- Preprocessor utilities: ml/pipeline/data_preprocessor.py.
- ML execution is integrated into live backend endpoints, not limited to notebooks.
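Forecasting requires stored metric rows in the form Prophet expects: a ds timestamp column and a y value column, with timezone-naive ds values, because Prophet rejects timezone-aware timestamps. The sketch below shows one way to do that preprocessing. The function name to_prophet_records and the choice of cost_usd as the target are assumptions, not the actual contents of ml/pipeline/data_preprocessor.py.

```python
# Hypothetical sketch: convert stored metric rows into Prophet-ready {"ds", "y"} records.
from datetime import datetime, timezone
from typing import Iterable


def to_prophet_records(rows: Iterable[dict], value_key: str = "cost_usd") -> list[dict]:
    """Map metric rows to {"ds", "y"} records sorted by time.

    Prophet expects a `ds` timestamp column and a `y` value column and rejects
    timezone-aware `ds` values, so timestamps are converted to naive UTC.
    """
    records = []
    for row in rows:
        value = row.get(value_key)
        if value is None:
            continue  # skip missing values (e.g. invocations is null for EC2 rows)
        # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so normalize it.
        ts = datetime.fromisoformat(row["timestamp"].replace("Z", "+00:00"))
        if ts.tzinfo is not None:
            ts = ts.astimezone(timezone.utc).replace(tzinfo=None)
        records.append({"ds": ts, "y": float(value)})
    records.sort(key=lambda r: r["ds"])
    return records
```

The records can then be loaded with pandas.DataFrame(records) and passed to Prophet().fit(df). A call such as make_future_dataframe(periods=60, freq="min") would correspond to the 60-minute horizon used by /api/forecast.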
Data flow:
AWS CloudWatch (EC2 CPU)
-> backend/api/main.py (poll_aws, every 60s)
-> backend/aws/cost_estimator.py (fixed-rate cost increment)
-> backend/db/timeseries_store.py (SQLite cloudseer.db)
-> backend/api/routes/*.py
/api/metrics
/api/forecast
/api/anomalies
/api/remediate
-> frontend/src/api/client.js
-> frontend routed pages and components
ML invocation path:
- /api/anomalies -> detect_anomalies(metrics)
- /api/forecast -> train_and_forecast(historical_data, horizon_minutes=60)
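The forecast response contains a spike_warning flag and a spike_at timestamp. This document does not state the rule that sets them. One plausible derivation is sketched below: it flags the first forecast point whose predicted_cost exceeds a multiple of the recent average cost. The 1.5x multiplier, the mean-based baseline, and the function name spike_signal are all assumptions.

```python
# Hypothetical sketch: derive spike_warning / spike_at from forecast points.
from statistics import mean
from typing import Optional, Sequence


def spike_signal(forecast: Sequence[dict], recent_costs: Sequence[float],
                 multiplier: float = 1.5) -> tuple[bool, Optional[str]]:
    """Return (spike_warning, spike_at) for the first point above the threshold.

    Assumption: a spike means predicted_cost > multiplier * mean(recent_costs).
    """
    if not forecast or not recent_costs:
        return False, None
    threshold = multiplier * mean(recent_costs)
    for point in forecast:
        if point["predicted_cost"] > threshold:
            return True, point["timestamp"]
    return False, None
```

A threshold based on the forecast's upper bound would be an equally reasonable choice. The mean-based rule is used here only because it is easy to follow.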
API reference (base URL: http://localhost:8000):
Service status endpoint: returns the service status and the list of exposed endpoints.
/api/metrics: returns metrics grouped by (resource_id, resource_type):
{
"resources": [
{
"id": "i-0da3659219976da09",
"type": "ec2",
"metrics": [
{
"timestamp": "2026-03-28T10:01:00Z",
"cpu": 14.31,
"cost_usd": 0.000193,
"invocations": null
}
]
}
]
}
/api/forecast: returns the Prophet forecast and spike signal:
{
"resource_id": null,
"forecast": [
{
"timestamp": "2026-03-28T10:02:00Z",
"predicted_cost": 0.00021,
"lower": 0.00018,
"upper": 0.00024
}
],
"spike_warning": false,
"spike_at": null
}
/api/anomalies: returns only the items flagged as anomalies by ML filtering:
{
"anomalies": [
{
"resource_id": "i-0da3659219976da09",
"type": "idle_instance",
"confidence": 0.94,
"cost_impact_usd": 1.37,
"recommended_action": "stop_instance",
"auto_execute": false,
"claude_summary": "..."
}
]
}
/api/remediate: accepts a payload describing the intended action (resource_id, recommended_action), records before_cost, derives after_cost, executes the remediation (currently stopping the EC2 instance), and logs an entry to the remediations SQLite table. Sample response:
{
"success": true,
"before_cost": 0.0116,
"after_cost": 0.0,
"action_taken": "stop_instance",
"resource_id": "i-0da3659219976da09",
"status": "success",
"details": {
"status": "stopped",
"instance_id": "i-0da3659219976da09"
}
}
Remediation history: returns the log of all applied remediations from the remediations table, for display in the dashboard tables.
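The remediation flow and its history log can be sketched with the standard sqlite3 module, using the remediations schema documented below and the response shape shown above. The following are assumptions: the function names, the injected stop_instance callable (which stands in for the real EC2 call so the sketch runs without AWS), and before_cost being set to the fixed hourly rate. Error handling for a failed stop is omitted.

```python
# Hypothetical sketch of /api/remediate: stop the instance, compute costs, log to SQLite.
import json
import sqlite3
from datetime import datetime, timezone
from typing import Callable

HOURLY_RATE_USD = 0.0116  # demo rate from the known constraints

REMEDIATIONS_DDL = """
CREATE TABLE IF NOT EXISTS remediations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT,
    resource_id TEXT,
    action_taken TEXT,
    status TEXT,
    before_cost REAL,
    after_cost REAL,
    details TEXT
)"""


def remediate(conn: sqlite3.Connection, resource_id: str, action: str,
              stop_instance: Callable[[str], dict]) -> dict:
    """Execute a stop_instance remediation and record it in the remediations table."""
    if action != "stop_instance":
        raise ValueError(f"unsupported action: {action}")
    before_cost = HOURLY_RATE_USD
    details = stop_instance(resource_id)  # injected; e.g. a wrapper around EC2 stop_instances
    # Assumption mirroring the sample response: a stopped instance accrues no
    # instance-hour charges (EBS storage cost is not modeled).
    after_cost = 0.0
    conn.execute(
        "INSERT INTO remediations (timestamp, resource_id, action_taken, status,"
        " before_cost, after_cost, details) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), resource_id, action, "success",
         before_cost, after_cost, json.dumps(details)),
    )
    conn.commit()
    return {
        "success": True,
        "before_cost": before_cost,
        "after_cost": after_cost,
        "action_taken": action,
        "resource_id": resource_id,
        "status": "success",
        "details": details,
    }


def remediation_history(conn: sqlite3.Connection) -> list[dict]:
    """Return logged remediations, newest first, for dashboard tables."""
    cur = conn.execute(
        "SELECT timestamp, resource_id, action_taken, status, before_cost, after_cost"
        " FROM remediations ORDER BY id DESC"
    )
    columns = [c[0] for c in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```

In a real deployment, stop_instance could wrap boto3.client("ec2").stop_instances(InstanceIds=[instance_id]) and map its response to the details object.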
Known hardcoded constraints:
- Hardcoded instance ID in the backend poller and remediation route: i-0da3659219976da09.
- Frontend API base URL hardcoded to http://localhost:8000.
- Cost model uses a static EC2 hourly rate (0.0116 USD per hour) for demo simplicity.
- Poll loop uses a fallback CPU value when AWS metric datapoints are temporarily absent.
- Forecast route trains Prophet in-request (no persisted model cache yet).
- Anomalies route attempts Anthropic summary generation with a 120-second in-memory cache and falls back to deterministic text on any exception.
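The summary cache behavior described in the last constraint (a 120-second in-memory cache with a deterministic fallback on any exception) can be sketched as follows. The function name, the cache key, and the decision not to cache the fallback text are assumptions. The generate callable stands in for the Anthropic API call, which is not shown.

```python
# Hypothetical sketch of a 120-second in-memory summary cache with a deterministic fallback.
import time
from typing import Callable, Optional

CACHE_TTL_SECONDS = 120.0
_cache: dict[str, tuple[float, str]] = {}


def cached_summary(key: str, generate: Callable[[], str], fallback: str,
                   now: Optional[float] = None) -> str:
    """Return a cached summary, regenerating after the TTL; use fallback on any error."""
    current = time.monotonic() if now is None else now
    hit = _cache.get(key)
    if hit is not None and current - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]
    try:
        text = generate()  # stands in for the Anthropic API call
    except Exception:
        # Deterministic text; deliberately not cached so the next request retries.
        return fallback
    _cache[key] = (current, text)
    return text
```

Leaving the fallback uncached means a transient API failure is retried on the next request, at the cost of one extra attempt per request while the API is unavailable.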
SQLite tables:
metrics
Columns:
- id INTEGER PRIMARY KEY AUTOINCREMENT
- timestamp TEXT
- resource_id TEXT
- resource_type TEXT
- cpu REAL
- cost_usd REAL
- invocations REAL
Read pattern: ascending by timestamp with optional limit.
remediations
Columns:
- id INTEGER PRIMARY KEY AUTOINCREMENT
- timestamp TEXT
- resource_id TEXT
- action_taken TEXT
- status TEXT
- before_cost REAL
- after_cost REAL
- details TEXT
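The metrics table, its read pattern, and the /api/metrics grouping can be illustrated together with the standard sqlite3 module. MiniTimeSeriesStore and group_for_api are hypothetical stand-ins, not the real TimeSeriesStore API. The sketch reads "ascending by timestamp with optional limit" literally, as ORDER BY timestamp ASC with an optional LIMIT. Note that ISO-8601 UTC strings in a single format sort chronologically as text.

```python
# Hypothetical sketch of the metrics table plus the /api/metrics grouping.
import sqlite3
from typing import Optional

METRICS_DDL = """
CREATE TABLE IF NOT EXISTS metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT,
    resource_id TEXT,
    resource_type TEXT,
    cpu REAL,
    cost_usd REAL,
    invocations REAL
)"""


class MiniTimeSeriesStore:
    """Illustrative stand-in for TimeSeriesStore; method names are assumptions."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(METRICS_DDL)

    def insert(self, timestamp: str, resource_id: str, resource_type: str,
               cpu: Optional[float], cost_usd: Optional[float],
               invocations: Optional[float] = None) -> None:
        self.conn.execute(
            "INSERT INTO metrics (timestamp, resource_id, resource_type, cpu, cost_usd,"
            " invocations) VALUES (?, ?, ?, ?, ?, ?)",
            (timestamp, resource_id, resource_type, cpu, cost_usd, invocations),
        )
        self.conn.commit()

    def read(self, limit: Optional[int] = None) -> list[tuple]:
        """Return rows ascending by timestamp, optionally capped at `limit` rows."""
        sql = ("SELECT timestamp, resource_id, resource_type, cpu, cost_usd, invocations"
               " FROM metrics ORDER BY timestamp ASC")
        if limit is None:
            return self.conn.execute(sql).fetchall()
        return self.conn.execute(sql + " LIMIT ?", (limit,)).fetchall()


def group_for_api(rows: list[tuple]) -> dict:
    """Group rows by (resource_id, resource_type) into the /api/metrics shape."""
    grouped: dict[tuple[str, str], list[dict]] = {}
    for ts, rid, rtype, cpu, cost, inv in rows:
        grouped.setdefault((rid, rtype), []).append(
            {"timestamp": ts, "cpu": cpu, "cost_usd": cost, "invocations": inv}
        )
    return {"resources": [{"id": rid, "type": rtype, "metrics": points}
                          for (rid, rtype), points in grouped.items()]}
```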
Backend (backend/requirements.txt):
- boto3
- fastapi
- uvicorn
- pydantic
- python-dotenv
ML (ml/requirements.txt):
- numpy==2.4.3
- pandas==3.0.1
- scikit-learn==1.8.0
- prophet==1.3.0
Frontend (frontend/package.json):
- React 18
- Vite 5
- react-router-dom 7.x
- Recharts
- Framer Motion
- Tailwind CSS
Local setup:
Run the backend:
cd backend
pip install -r requirements.txt
uvicorn api.main:app --reload --port 8000
Run the frontend:
cd frontend
npm install
npm run dev
Install ML dependencies:
cd ml
pip install -r requirements.txt
Known limitations:
- Single-resource focus: runtime flow is effectively one EC2 instance.
- No auth, tenancy, or RBAC boundary in API layer.
- Forecasting retrains on each request and may scale poorly under load.
- Reports page exists in routing but has limited operational depth.
Next steps:
- Externalize all hardcoded IDs/URLs to env and config.
- Introduce model caching and basic inference telemetry.
- Expand resource coverage beyond EC2 to Lambda/S3.
- Add authentication and org/tenant scoping.
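The first next step, externalizing hardcoded values, can be sketched as an environment-backed settings object whose defaults match today's hardcoded values. This keeps current behavior unchanged when no variables are set. The CLOUDSEER_* variable names, the Settings dataclass, and load_settings are hypothetical.

```python
# Hypothetical sketch: read hardcoded values from environment variables with current defaults.
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    instance_id: str
    api_base_url: str
    hourly_rate_usd: float
    poll_interval_seconds: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, defaulting to today's hardcoded values."""
    source = os.environ if env is None else env
    return Settings(
        instance_id=source.get("CLOUDSEER_INSTANCE_ID", "i-0da3659219976da09"),
        api_base_url=source.get("CLOUDSEER_API_BASE_URL", "http://localhost:8000"),
        hourly_rate_usd=float(source.get("CLOUDSEER_HOURLY_RATE_USD", "0.0116")),
        poll_interval_seconds=int(source.get("CLOUDSEER_POLL_INTERVAL_SECONDS", "60")),
    )
```

Because python-dotenv is already a backend dependency, calling load_dotenv() before load_settings() would let the same values come from a .env file. On the frontend, Vite exposes VITE_-prefixed variables through import.meta.env, which could replace the hardcoded base URL in client.js.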
Use this when handing context to Claude:
You are joining the CloudSeer codebase (AWS cost intelligence platform).
Read PROJECT_MASTER.md, AI_CONTEXT_BACKEND.md, AI_CONTEXT_FRONTEND.md, AI_CONTEXT_ML.md, and AI_CONTEXT_DESIGN.md.
Assume current date March 28, 2026.
Focus on code-accurate behavior, especially hardcoded instance remediation, API contracts, and ML route integration.
When suggesting changes, preserve existing endpoint response shapes unless explicitly asked to break compatibility.