CloudSeer - Master Project Context

Date: March 28, 2026
Purpose: Single source-of-truth handoff document for AI assistants and new contributors.

1) Executive Summary

CloudSeer is a cloud cost intelligence platform that turns AWS telemetry into operations decisions.

Current end-to-end loop:

Pull EC2 CPU from CloudWatch on a timer.
Estimate cost per polling interval.
Store time-series points in SQLite.
Run anomaly detection and forecasting in API routes.
Surface actions in a React dashboard.
Trigger one-click remediation (stop EC2 instance).

CloudSeer is in an MVP-plus stage: technically integrated, demo-ready, and extensible, with known hardcoded constraints documented below.

2) Product Positioning

CloudSeer is designed to move teams from reactive cloud cost review to proactive cloud cost control.

Reactive mode: discover overruns late, investigate manually, fix slowly.
CloudSeer mode: detect drift early, forecast risk, explain urgency, trigger action.

Core buyer value:

Faster anomaly response.
Less alert fatigue.
Direct path from insight to remediation.
Traceable savings narrative.

3) What Exists in Code Right Now

Backend

FastAPI app with CORS enabled globally.
Async startup polling loop in backend/api/main.py.
Poll interval: 60 seconds.
CloudWatch lookup window: last 15 minutes, period 300 seconds.
SQLite store class: TimeSeriesStore in backend/db/timeseries_store.py.
Cost estimator: fixed EC2 hourly rate in backend/aws/cost_estimator.py.
API routes mounted under /api:
- metrics
- forecast
- anomalies
- remediate
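
The per-interval cost estimate can be illustrated with a minimal sketch, assuming the estimator simply prorates the fixed hourly rate over one poll interval. The function and constant names are hypothetical and are not taken from backend/aws/cost_estimator.py; the 0.0116 rate and the 60-second interval are the values stated in this document.

```python
# Hypothetical sketch: prorate a fixed hourly rate over one polling interval.
EC2_HOURLY_RATE_USD = 0.0116   # static demo rate stated in this document
POLL_INTERVAL_SECONDS = 60     # poll interval stated in this document


def estimate_interval_cost(
    hourly_rate_usd: float = EC2_HOURLY_RATE_USD,
    interval_seconds: int = POLL_INTERVAL_SECONDS,
) -> float:
    """Return the cost in USD attributed to a single polling interval."""
    if interval_seconds < 0:
        raise ValueError("interval_seconds must be non-negative")
    # One hour has 3600 seconds, so the interval is a fraction of the hourly rate.
    return hourly_rate_usd * interval_seconds / 3600.0
```

Under these assumptions, one 60-second interval costs about 0.000193 USD, which agrees with the cost_usd value in the sample /api/metrics response.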

Frontend

React + Vite app with route layout architecture.
Routes:
- /
- /anomalies
- /automation
- /resources/:id
- /reports
Data fetch layer in frontend/src/api/client.js.
Dashboard refresh cadence: every 10 seconds.
Recharts visualizations and Framer Motion animations throughout the operator experience.

ML

Anomaly module: ml/anomaly/isolation_forest.py.
Forecast module: ml/forecasting/prophet_model.py.
Synthetic history seeding: ml/forecasting/synthetic_history.py.
Preprocessor utilities: ml/pipeline/data_preprocessor.py.
ML execution is integrated in live backend endpoints, not notebook-only.

4) Architecture (Runtime Data Path)

AWS CloudWatch (EC2 CPU)
  -> backend/api/main.py (poll_aws, every 60s)
  -> backend/aws/cost_estimator.py (fixed-rate cost increment)
  -> backend/db/timeseries_store.py (SQLite cloudseer.db)
  -> backend/api/routes/*.py
      /api/metrics
      /api/forecast
      /api/anomalies
      /api/remediate
  -> frontend/src/api/client.js
  -> frontend routed pages and components
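
The polling step at the head of this path might look roughly like the sketch below. It only builds the query arguments and picks a datapoint, so it runs without AWS credentials. The helper names, the choice of the Average statistic, and the FALLBACK_CPU value are assumptions for illustration; the 15-minute window and 300-second period come from this document.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(minutes=15)  # lookup window stated in this document
PERIOD_SECONDS = 300              # CloudWatch period stated in this document
FALLBACK_CPU = 5.0                # hypothetical placeholder; the real fallback is not documented here


def build_cpu_query(instance_id: str, now: datetime) -> dict:
    """Build keyword arguments for boto3's CloudWatch get_metric_statistics."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": now - LOOKBACK,
        "EndTime": now,
        "Period": PERIOD_SECONDS,
        "Statistics": ["Average"],  # assumed statistic
    }


def latest_cpu(datapoints: list, fallback: float = FALLBACK_CPU) -> float:
    """Return the newest Average datapoint, or the fallback when none exist."""
    if not datapoints:
        return fallback
    newest = max(datapoints, key=lambda dp: dp["Timestamp"])
    return float(newest["Average"])
```

In a live poller, the dictionary would be passed as `boto3.client("cloudwatch").get_metric_statistics(**build_cpu_query(...))` and the response's "Datapoints" list handed to latest_cpu.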

ML invocation path:

/api/anomalies -> detect_anomalies(metrics)
/api/forecast -> train_and_forecast(historical_data, horizon_minutes=60)

5) API Contract Snapshot

Base URL: http://localhost:8000

GET `/`

Returns service status and exposed endpoint list.

GET `/api/metrics`

Returns metrics grouped by (resource_id, resource_type):

{
  "resources": [
    {
      "id": "i-0da3659219976da09",
      "type": "ec2",
      "metrics": [
        {
          "timestamp": "2026-03-28T10:01:00Z",
          "cpu": 14.31,
          "cost_usd": 0.000193,
          "invocations": null
        }
      ]
    }
  ]
}
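
As a rough illustration of the grouping behind this response shape, the sketch below folds flat metric rows into one entry per (resource_id, resource_type) pair. The function name and the dict-based row format are assumptions; the actual route may build the response differently.

```python
from collections import defaultdict


def group_metrics(rows: list) -> dict:
    """Group flat metric rows into the /api/metrics response shape."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["resource_id"], row["resource_type"])
        grouped[key].append({
            "timestamp": row["timestamp"],
            "cpu": row["cpu"],
            "cost_usd": row["cost_usd"],
            "invocations": row["invocations"],
        })
    # Dicts preserve insertion order, so resources appear in first-seen order.
    return {
        "resources": [
            {"id": rid, "type": rtype, "metrics": points}
            for (rid, rtype), points in grouped.items()
        ]
    }
```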

GET `/api/forecast?resource_id=<optional>`

Returns Prophet forecast and spike signal:

{
  "resource_id": null,
  "forecast": [
    {
      "timestamp": "2026-03-28T10:02:00Z",
      "predicted_cost": 0.00021,
      "lower": 0.00018,
      "upper": 0.00024
    }
  ],
  "spike_warning": false,
  "spike_at": null
}

GET `/api/anomalies`

Returns anomaly-only items after ML filtering:

{
  "anomalies": [
    {
      "resource_id": "i-0da3659219976da09",
      "type": "idle_instance",
      "confidence": 0.94,
      "cost_impact_usd": 1.37,
      "recommended_action": "stop_instance",
      "auto_execute": false,
      "claude_summary": "..."
    }
  ]
}

POST `/api/remediate`

The current backend implementation accepts a payload describing the intent (resource_id, recommended_action), records before_cost, derives after_cost, executes the remediation, and logs an entry to the remediations SQLite table. Example response:

{
  "success": true,
  "before_cost": 0.0116,
  "after_cost": 0.0,
  "action_taken": "stop_instance",
  "resource_id": "i-0da3659219976da09",
  "status": "success",
  "details": {
    "status": "stopped",
    "instance_id": "i-0da3659219976da09"
  }
}

GET `/api/remediations`

Returns the log of all system-applied remediations from the remediations SQLite table, for display in the dashboard tables.

6) Important Hardcoded Values and Behavioral Constraints

Hardcoded instance ID in backend poller and remediation route: i-0da3659219976da09.
Frontend API base URL hardcoded to http://localhost:8000.
Cost model uses a static EC2 hourly rate (0.0116 USD per hour) for demo simplicity.
Poll loop uses fallback CPU value when AWS metric datapoints are temporarily absent.
Forecast route trains Prophet in-request (no persisted model cache yet).
Anomalies route attempts Anthropic summary generation with a 120-second in-memory cache; falls back to deterministic text on any exception.
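
The cache-with-fallback behavior in the last item can be sketched as follows. This is a minimal illustration, not the route's actual code: the function name, the module-level dictionary, and the choice not to cache fallback text are all assumptions. Only the 120-second lifetime and the fall-back-on-any-exception rule come from this document.

```python
import time
from typing import Callable

CACHE_TTL_SECONDS = 120  # cache lifetime stated in this document
_cache: dict = {}        # key -> (stored_at, summary); lives only in process memory


def cached_summary(
    key: str,
    generate: Callable[[], str],
    fallback: str,
    now: Callable[[], float] = time.monotonic,
) -> str:
    """Return a fresh cached summary, regenerate when stale, fall back on error."""
    current = now()
    entry = _cache.get(key)
    if entry is not None and current - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]
    try:
        summary = generate()
    except Exception:
        # Any failure yields the deterministic text; it is not cached in this
        # sketch, so the next request retries generation.
        return fallback
    _cache[key] = (current, summary)
    return summary
```

Passing the clock in as a parameter keeps the sketch easy to test; the generate callable stands in for whatever call produces the Anthropic summary.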

7) Data Model

SQLite tables:

metrics

Columns:

id INTEGER PRIMARY KEY AUTOINCREMENT
timestamp TEXT
resource_id TEXT
resource_type TEXT
cpu REAL
cost_usd REAL
invocations REAL

Read pattern: ascending by timestamp with optional limit.
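
A minimal sketch of this table and its read pattern, using Python's built-in sqlite3 module. The function name is hypothetical and TimeSeriesStore may be implemented differently. The sketch also assumes timestamps are stored as uniformly formatted ISO 8601 strings, which sort correctly as TEXT.

```python
import sqlite3
from typing import Optional

METRICS_SCHEMA = """
CREATE TABLE IF NOT EXISTS metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT,
    resource_id TEXT,
    resource_type TEXT,
    cpu REAL,
    cost_usd REAL,
    invocations REAL
)
"""


def read_metrics(conn: sqlite3.Connection, limit: Optional[int] = None) -> list:
    """Read metric rows in ascending timestamp order, optionally capped."""
    sql = (
        "SELECT timestamp, resource_id, resource_type, cpu, cost_usd, invocations "
        "FROM metrics ORDER BY timestamp ASC"
    )
    params: tuple = ()
    if limit is not None:
        # Assumption: the limit caps rows counted from the oldest timestamp.
        sql += " LIMIT ?"
        params = (limit,)
    return conn.execute(sql, params).fetchall()
```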

remediations

Columns:

id INTEGER PRIMARY KEY AUTOINCREMENT
timestamp TEXT
resource_id TEXT
action_taken TEXT
status TEXT
before_cost REAL
after_cost REAL
details TEXT
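
Writing one remediation record into this table could look like the sketch below. The function name is hypothetical, and storing details as a JSON string in the TEXT column is an assumption rather than confirmed behavior.

```python
import json
import sqlite3
from datetime import datetime, timezone

REMEDIATIONS_SCHEMA = """
CREATE TABLE IF NOT EXISTS remediations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT,
    resource_id TEXT,
    action_taken TEXT,
    status TEXT,
    before_cost REAL,
    after_cost REAL,
    details TEXT
)
"""


def log_remediation(
    conn: sqlite3.Connection,
    resource_id: str,
    action_taken: str,
    status: str,
    before_cost: float,
    after_cost: float,
    details: dict,
) -> int:
    """Insert one remediation record and return its row id."""
    cursor = conn.execute(
        "INSERT INTO remediations "
        "(timestamp, resource_id, action_taken, status, before_cost, after_cost, details) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            datetime.now(timezone.utc).isoformat(),  # UTC timestamp as ISO 8601 text
            resource_id,
            action_taken,
            status,
            before_cost,
            after_cost,
            json.dumps(details),  # assumed serialization for the TEXT column
        ),
    )
    conn.commit()
    return cursor.lastrowid
```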

8) Stack and Versions

Backend (backend/requirements.txt):

boto3
fastapi
uvicorn
pydantic
python-dotenv

ML (ml/requirements.txt):

numpy==2.4.3
pandas==3.0.1
scikit-learn==1.8.0
prophet==1.3.0

Frontend (frontend/package.json):

React 18
Vite 5
react-router-dom 7.x
Recharts
Framer Motion
Tailwind CSS

9) Runbook

Backend

cd backend
pip install -r requirements.txt
uvicorn api.main:app --reload --port 8000

Frontend

cd frontend
npm install
npm run dev

ML Environment

cd ml
pip install -r requirements.txt

10) Known Gaps and Risks

Single-resource focus: runtime flow is effectively one EC2 instance.
No auth, tenancy, or RBAC boundary in API layer.
Forecasting retrains on each request and may scale poorly under load.
Reports page is routed but offers little operational depth so far.

11) Near-Term Execution Priorities

Externalize all hardcoded IDs/URLs to env and config.
Introduce model caching and basic inference telemetry.
Expand resource coverage beyond EC2 to Lambda/S3.
Add authentication and org/tenant scoping.
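
The first priority above, externalizing hardcoded values, could start from a sketch like this one. The environment variable names and the Settings class are hypothetical; the defaults are the values currently hardcoded according to this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    instance_id: str
    poll_interval_seconds: int
    ec2_hourly_rate_usd: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, defaulting to today's hardcoded values."""
    source = os.environ if env is None else env
    return Settings(
        # Variable names below are illustrative, not existing configuration keys.
        instance_id=source.get("CLOUDSEER_INSTANCE_ID", "i-0da3659219976da09"),
        poll_interval_seconds=int(source.get("CLOUDSEER_POLL_INTERVAL_SECONDS", "60")),
        ec2_hourly_rate_usd=float(source.get("CLOUDSEER_EC2_HOURLY_RATE_USD", "0.0116")),
    )
```

Accepting an optional mapping lets tests supply values without touching the real process environment.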

12) Claude Prompt Starter (Copy/Paste)

Use this when handing context to Claude:

You are joining the CloudSeer codebase (AWS cost intelligence platform).
Read PROJECT_MASTER.md, AI_CONTEXT_BACKEND.md, AI_CONTEXT_FRONTEND.md, AI_CONTEXT_ML.md, and AI_CONTEXT_DESIGN.md.
Assume current date March 28, 2026.
Focus on code-accurate behavior, especially hardcoded instance remediation, API contracts, and ML route integration.
When suggesting changes, preserve existing endpoint response shapes unless explicitly asked to break compatibility.
