A full-stack Kubernetes deviation management platform with cluster and application lifecycle tracking, brownfield deviation analysis, and an AI-powered chat assistant.
📐 Architecture Overview (text version)
┌─────────────────────────────────────────────────────────────────────┐
│                           Browser (User)                            │
│                        http://localhost:3000                        │
└──────────┬──────────────────┬───────────────────┬───────────────────┘
           │                  │                   │
    ┌──────▼──────┐    ┌──────▼──────┐     ┌──────▼──────┐
    │ 🖥️ Clusters │    │ 📦 Apps     │     │ 🤖 Chat     │
    │     Tab     │    │     Tab     │     │    Panel    │
    ├─────────────┤    ├─────────────┤     │             │
    │ Greenfield  │    │ Greenfield  │     │ Quick-action│
    │  (Deploy)   │    │  (Deploy)   │     │   buttons   │
    ├─────────────┤    ├─────────────┤     │             │
    │ Brownfield  │    │ Brownfield  │     │  Context-   │
    │ (Deviations)│    │ (Scan/Fix)  │     │  aware AI   │
    └──────┬──────┘    └──────┬──────┘     └──────┬──────┘
           │                  │                   │
           └──────────┬───────┘                   │
                      │                           │
              ┌───────▼───────────────────────────▼───────┐
              │        FastAPI Backend (Port 8000)        │
              │   /api/greenfield/*   /api/brownfield/*   │
              │   /api/apps/*         /api/chat           │
              │                   ┌────────────┐          │
              │ Loads .env ────►  │ .env file  │          │
              │ (API keys)        │(gitignored)│          │
              └───┬────────┬──────┴────────┬───┴──────────┘
                  │        │               │
        ┌─────────▼──┐  ┌──▼─────────┐ ┌───▼──────────┐
        │ Artifact   │  │ Deployment │ │ Deviation    │
        │ MCP :8765  │  │ MCP :8766  │ │ MCP :8767    │
        │ (generate/ │  │ (deploy/   │ │ (analyze/    │
        │  deploy)   │  │  list/fix) │ │  scan)       │
        └─────┬──────┘  └──┬─────────┘ └───┬──────────┘
              │            │               │
              ▼            ▼               ▼
        ┌──────────────────────────────────────────┐
        │           releases.py (Loader)           │
        │  ├── release/cluster_release.json        │
        │  │   (R1=1.27, R2=1.28, R3=1.29, R4=1.30)│
        │  └── release/application_release.json    │
        │     (nginx, httpd, memcached per release)│
        └──────────────────────────────────────────┘
              │            │               │
              ▼            ▼               ▼
        ┌──────────────────────────────────────────┐
        │          Kind Clusters (Docker)          │
        │   ┌────────┐   ┌────────┐   ┌────────┐   │
        │   │   c1   │   │   c2   │   │   c3   │   │
        │   │  k8s   │   │  k8s   │   │  k8s   │   │
        │   │  1.30  │   │  1.29  │   │  1.28  │   │
        │   │  (R4)  │   │  (R3)  │   │  (R2)  │   │
        │   └────────┘   └────────┘   └────────┘   │
        └──────────────────────────────────────────┘

LLM Providers (for /api/chat):
┌──────────────┬──────────────┬──────────────┐
│ OpenAI       │ Google       │ Ollama       │
│ GPT-4o-mini  │ Gemini 2.5   │ Gemma 3 1B   │
│              │ Flash        │ (local)      │
└──────────────┴──────────────┴──────────────┘
Regenerate the diagram:
python3 docs/architecture_diagram.py (requires pip install diagrams + apt install graphviz)
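The release mapping in the diagram (R1=1.27 through R4=1.30) is the baseline that brownfield analysis compares a running cluster against. The following is a minimal sketch of that comparison, not the project's actual code: it assumes a plain dictionary in place of the real release/cluster_release.json schema (not shown in this README), and `RELEASE_K8S_VERSIONS` and `cluster_deviation` are hypothetical names.

```python
# Hypothetical in-memory stand-in for release/cluster_release.json;
# the real file's schema may differ.
RELEASE_K8S_VERSIONS = {"R1": "1.27", "R2": "1.28", "R3": "1.29", "R4": "1.30"}


def cluster_deviation(running_version: str, target_release: str) -> dict:
    """Compare a cluster's Kubernetes minor version with a target release."""
    expected = RELEASE_K8S_VERSIONS[target_release]
    # Compare only major.minor, so "v1.29.4" is treated as "1.29".
    actual = ".".join(running_version.lstrip("v").split(".")[:2])
    return {
        "target_release": target_release,
        "expected": expected,
        "actual": actual,
        "compliant": actual == expected,
    }


if __name__ == "__main__":
    # In the diagram, c2 runs Kubernetes 1.29 (R3); check it against R4.
    print(cluster_deviation("v1.29.4", "R4"))
```

Comparing only the major.minor part keeps patch-level differences from being flagged; whether the real analysis does the same is an assumption.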
Before running this project, ensure the following are installed and configured:
| Requirement | Version | Check Command |
|---|---|---|
| WSL2 (Ubuntu) | 22.04 or 24.04 | wsl --list --verbose (from Windows) |
| Docker Desktop | Latest | docker --version |
| Node.js | 18+ | node --version |
| npm | 9+ | npm --version |
| Python | 3.10+ | python3 --version |
| kind | 0.20+ | kind --version |
| kubectl | 1.27+ | kubectl version --client |
# Create venv (one-time)
python3 -m venv ~/.venvs/artifact-mcp
# Activate
source ~/.venvs/artifact-mcp/bin/activate
# Install dependencies
pip install mcp fastapi uvicorn httpx python-dotenv

API keys and LLM config are loaded from a .env file at the project root. This file is gitignored and never committed.
# Create from template
cp .env.example .env

Edit .env and fill in the providers you want to use:
# OpenAI — get key from https://platform.openai.com/api-keys
OPENAI_API_KEY=sk-proj-...
# Google Gemini — get key from https://aistudio.google.com/apikey
GEMINI_API_KEY=AIza...
# Local LLM (Ollama) — no key needed, just the URL
OLLAMA_BASE_URL=http://127.0.0.1:11434

Note: At least one provider must be configured for the AI chat to work. The local Ollama option requires no API key but needs Ollama installed (see below).
# Install Ollama
sudo apt-get install -y zstd # required dependency
curl -fsSL https://ollama.com/install.sh | sh
# Pull the Gemma 3 1B model (lightweight, fast on CPU)
ollama pull gemma3:1b
# Start Ollama (runs on port 11434)
ollama serve

Clusters are created via the Greenfield tab in the UI or via CLI:
cd MCP_Agents
python3 Artifact_mcp.py deploy \
--input cluster_input.json \
--output-dir ./generated-kind-configs \
--recreate --verbose

This project includes a full dev container configuration for one-click setup on GitHub Codespaces.
1. Open in Codespaces: from the GitHub repo page, click Code → Codespaces → Create codespace on master.
2. Wait for setup. The dev container automatically:
   - Installs Docker-in-Docker, Node.js 20, Python 3.12, kubectl
   - Installs kind for local Kubernetes clusters
   - Creates the Python virtual environment with all dependencies
   - Installs frontend npm packages
   - Creates .env from the template
3. Configure API keys: edit .env and add at least one LLM provider key:
   # Edit the .env file (created automatically from .env.example)
   code .env
   Add your OpenAI or Gemini API key. Ollama is not recommended in Codespaces (CPU-only, very slow).
4. Start the dashboard:
   ./start.sh
5. Open the UI: Codespaces auto-forwards port 3000. Click the notification, or go to the Ports tab and open port 3000 in the browser.
6. Deploy clusters: use the Greenfield tab in the UI, or the CLI:
   cd MCP_Agents
   ~/.venvs/artifact-mcp/bin/python3 Artifact_mcp.py deploy \
     --input cluster_input.json \
     --output-dir ./generated-kind-configs \
     --recreate --verbose
| Feature | Status | Notes |
|---|---|---|
| Docker / kind clusters | ✅ Works | Docker-in-Docker feature enabled |
| Frontend + Backend | ✅ Works | Ports auto-forwarded |
| OpenAI / Gemini chat | ✅ Works | Set API keys in .env |
| Ollama (local LLM) | ⚠️ Not recommended | No GPU in standard Codespaces; ~60s per response on CPU |
| Port forwarding | ✅ Auto | Ports 3000, 8000, 8765-8767 configured |
| Issue | Fix |
|---|---|
| Setup script didn't run | Run manually: bash .devcontainer/setup.sh |
| Docker not working | Rebuild container: Command Palette → Codespaces: Rebuild Container |
| kind clusters not reachable | Wait 10s after start.sh, then retry — API server needs time in DinD |
| .env not found | cp .env.example .env |
| Services won't start | rm -f .run/*.pid && ./start.sh |
This project contains the Artifact MCP server used to generate kind cluster config YAML files and (optionally) deploy clusters from those generated artifacts.
- Artifact_mcp.py: MCP/CLI server script
- cluster_input.json: expected input
- cluster_input_new.json: candidate input for deviation checks
- generated-kind-configs/: generated YAML output
- artifact_mcp.log: MCP server runtime log
Check if kind clusters exist:
kind get clusters

Check node readiness for all clusters:
for c in c1 c2 c3 c4; do
echo "=== kind-$c ==="
kubectl --context kind-$c get nodes -o wide || true
done

Check Artifact MCP server process and endpoint:
pgrep -af "Artifact_mcp.py serve" || true
curl -I http://127.0.0.1:8765/sse

python3 Artifact_mcp.py generate \
--input cluster_input.json \
--output-dir ./generated-kind-configs

Use --verbose to print progress messages for each step:
python3 Artifact_mcp.py deploy \
--input cluster_input.json \
--output-dir ./generated-kind-configs \
--recreate \
--verbose

What --verbose shows:
- cluster start banner
- delete step (when --recreate is used)
- create step and streamed kind output
- resource limit update step
- per-cluster success/failure
- final summary
python3 Artifact_mcp.py deploy \
--input cluster_input.json \
--output-dir ./generated-kind-configs \
--recreate \
--verbose 2>&1 | tee rebuild.log

If the API server is unreachable (connection refused):
- Ensure Docker is running and accessible.
- Rebuild clusters from input file.
- Re-check contexts and node status.
Commands:
python3 Artifact_mcp.py deploy \
--input cluster_input.json \
--output-dir ./generated-kind-configs \
--recreate \
--verbose

If Docker permission errors appear:
sudo chmod 666 /var/run/docker.sock

Compare expected and candidate inputs:
diff -u cluster_input.json cluster_input_new.json

Then regenerate/deploy only after deviations are approved.
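Where a line-based diff is too noisy, the same comparison can be made key by key. This is a rough sketch only: the schema of cluster_input.json is not documented here, so the sample data and the `diff_inputs` helper are hypothetical.

```python
import json


def diff_inputs(expected: dict, candidate: dict, prefix: str = "") -> list[str]:
    """Return one line per key that was added, removed, or changed."""
    changes = []
    for key in sorted(set(expected) | set(candidate)):
        path = f"{prefix}{key}"
        if key not in candidate:
            changes.append(f"removed: {path}")
        elif key not in expected:
            changes.append(f"added: {path}")
        elif isinstance(expected[key], dict) and isinstance(candidate[key], dict):
            # Recurse into nested objects and extend the dotted path.
            changes.extend(diff_inputs(expected[key], candidate[key], f"{path}."))
        elif expected[key] != candidate[key]:
            changes.append(f"changed: {path}: {expected[key]!r} -> {candidate[key]!r}")
    return changes


if __name__ == "__main__":
    # Hypothetical sample data; the real input files may look different.
    expected = json.loads('{"c1": {"version": "1.30"}}')
    candidate = json.loads('{"c1": {"version": "1.29"}, "c4": {}}')
    for line in diff_inputs(expected, candidate):
        print(line)
```

To run it against the real files, load them with `json.load(open("cluster_input.json"))` and pass both dictionaries to `diff_inputs`.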
The web UI is now available and wired to Deployment/Deviation logic.
- Frontend: ../webapp (Vite on port 3000)
- Backend: ../webapp/backend/main.py (FastAPI on port 8000)
- MCP logic imported directly from this folder (MCP_Agents)

- ClusterPanel.jsx: top-level Clusters tab with Greenfield/Brownfield sub-tabs
- GreenfieldPanel.jsx: release-based cluster deployment and delete actions
- BrownfieldPanel.jsx: cluster deviation analysis and release-to-release diff
- AppDeviationPanel.jsx: top-level Applications tab with Greenfield/Brownfield sub-tabs
- AppGreenfieldPanel.jsx: release-based application deployment with release badge detection
- AppBrownfieldPanel.jsx: application deviation scan, per-app fix, and bulk "Fix All"
- ChatBox.jsx: AI assistant chat with provider selection (no API keys on the UI)
From the project root:
chmod +x ./start.sh
./start.sh

What start.sh does:
- Creates runtime directories (.run, .logs)
- Installs frontend dependencies if needed
- Starts or skips (if already running):
  - Artifact MCP (127.0.0.1:8765)
  - Deployment MCP (127.0.0.1:8766)
  - Deviation MCP (127.0.0.1:8767)
  - Backend API (127.0.0.1:8000)
  - Frontend Web (127.0.0.1:3000)
- Prints endpoint status codes for each service
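The start-or-skip decision above comes down to asking whether something already listens on a service's port. The sketch below shows one way to express that check in Python; start.sh itself is a shell script whose implementation is not shown here, so `is_listening` and `plan_startup` are hypothetical helpers.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def plan_startup(services: dict[str, int]) -> dict[str, str]:
    """Map each service name to "SKIP" (port already in use) or "START"."""
    return {name: "SKIP" if is_listening(port) else "START"
            for name, port in services.items()}


if __name__ == "__main__":
    # Ports taken from the service list above.
    services = {"artifact_mcp": 8765, "deployment_mcp": 8766,
                "deviation_mcp": 8767, "backend_api": 8000, "frontend_web": 3000}
    for name, action in plan_startup(services).items():
        print(f"[{action}] {name} on port {services[name]}")
```

A port check only shows that some process is listening; it does not confirm that the listener is the expected service, which is why the HTTP status checks later in this README are still useful.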
Example status lines:
[SKIP] artifact_mcp already listening on port 8765
[START] frontend_web on port 3000
[OK] frontend_web started (pid=...)
- Backend releases -> 200
- Frontend -> 200
- Artifact MCP SSE -> 200
Run these checks from anywhere:
# Ports
ss -ltnp | grep -E '(:3000|:8000|:8765|:8766|:8767)'
# Backend API
curl -s -o /dev/null -w 'backend /api/releases: %{http_code}\n' http://127.0.0.1:8000/api/releases
curl -s -o /dev/null -w 'backend /api/clusters: %{http_code}\n' http://127.0.0.1:8000/api/clusters
# Frontend
curl -s -o /dev/null -w 'frontend /: %{http_code}\n' http://127.0.0.1:3000/
# MCP servers (SSE endpoints)
curl -s -I -o /dev/null -w 'artifact_mcp /sse: %{http_code}\n' http://127.0.0.1:8765/sse
curl -s -I -o /dev/null -w 'deployment_mcp /sse: %{http_code}\n' http://127.0.0.1:8766/sse
curl -s -I -o /dev/null -w 'deviation_mcp /sse: %{http_code}\n' http://127.0.0.1:8767/sse

Expected result: HTTP 200 for all checks.
Open UI at: http://localhost:3000
Quick UI validation flow:
- Greenfield tab
  - Click Refresh in Running Clusters
  - Confirm c1/c2/c3 appear with versions
- Brownfield tab
  - Select cluster and target release
  - Click Analyze Deviations and confirm report appears
- Applications tab
  - Select cluster, app nginx, target release
  - Click Analyze App Deviation
  - Confirm compliance/deviation with remediation output
- Chat panel
  - Select a provider from the dropdown (shows ✓ if configured)
  - Send a test prompt and confirm assistant reply
API keys are never exposed on the web UI. They are loaded server-side from a .env file at the project root.
| Provider | Model (default) | Requires |
|---|---|---|
| OpenAI (GPT) | gpt-4o-mini | OPENAI_API_KEY in .env |
| Google Gemini | gemini-2.5-flash | GEMINI_API_KEY in .env |
| Local LLM (Gemma/Ollama) | gemma3:4b | Ollama running locally |
1. Copy the example env file:
   cp .env.example .env
2. Edit .env and add your keys (this file is gitignored):
   # OpenAI: get key from https://platform.openai.com/api-keys
   OPENAI_API_KEY=sk-proj-...
   # Google Gemini: get key from https://aistudio.google.com/apikey
   GEMINI_API_KEY=AIza...
   # Local LLM (Ollama): no key needed, just the URL
   OLLAMA_BASE_URL=http://127.0.0.1:11434
3. For local LLM (Gemma via Ollama):
   # Install Ollama (if not already installed)
   curl -fsSL https://ollama.com/install.sh | sh
   # Pull the Gemma 3 4B model
   ollama pull gemma3:4b
   # Start Ollama server (if not already running)
   ollama serve
4. Restart the backend to pick up new keys:
   # If using start.sh, kill the old backend and re-run
   kill $(cat .run/backend_api.pid) 2>/dev/null
   ./start.sh
5. Verify providers in the UI:
   - Open http://localhost:3000
   - In the Chat panel, the dropdown shows ✓ next to configured providers
   - Unconfigured providers show (not configured)
- .env is in .gitignore, so it is never committed to git
- .env.example is committed as a template (no real keys)
- API keys are only used server-side in the FastAPI backend
- The browser never sees or sends API keys
- The /api/chat/providers endpoint only returns readiness status, not keys
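The "readiness only" idea in the last point can be illustrated with a small function that reports whether each provider variable is set without ever returning its value. This is a sketch under assumptions: the real response shape of /api/chat/providers is not shown in this README, and `provider_status` is a hypothetical name.

```python
def provider_status(env: dict[str, str]) -> dict[str, bool]:
    """Report which providers look configured, without exposing any key."""
    def is_set(name: str) -> bool:
        # Empty or whitespace-only values count as not configured.
        return bool(env.get(name, "").strip())

    return {
        "openai": is_set("OPENAI_API_KEY"),
        "gemini": is_set("GEMINI_API_KEY"),
        # Assumption: Ollama counts as configured when its base URL is set.
        "ollama": is_set("OLLAMA_BASE_URL"),
    }


if __name__ == "__main__":
    # Placeholder values only; a real backend would read os.environ instead.
    sample = {"OPENAI_API_KEY": "sk-proj-placeholder", "GEMINI_API_KEY": ""}
    print(provider_status(sample))
```

Because the function returns booleans only, a response built from it cannot leak a key even if it is logged or cached by the browser.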
start.sh writes logs here:
ls -lah .logs/
tail -n 80 .logs/backend_api.log
tail -n 80 .logs/frontend_web.log
tail -n 80 .logs/artifact_mcp.log
tail -n 80 .logs/deployment_mcp.log
tail -n 80 .logs/deviation_mcp.log

If backend fails with import error:
ImportError: cannot import name 'compare_releases' from 'releases'
Fix: ensure webapp/backend/main.py imports compare_releases from Deviation_mcp, not from releases.py.
If frontend appears stuck during install, verify completion with:
cd webapp
ls node_modules | wc -l
npm list --depth=0

If clusters are listed but API is unreachable after reboot, restart control-plane containers:
docker start c1-control-plane c2-control-plane c3-control-plane || true

Beyond cluster versions, the system now tracks application versions and state across releases.
Each release (R1-R4) defines expected applications and their versions. For example:
R1: nginx 1.24.0 (2 replicas)
R2: nginx 1.25.0 (2 replicas)
R3: nginx 1.26.0 (3 replicas)
R4: nginx 1.27.0 (3 replicas)
Detects and reports:
- Image mismatch: wrong nginx version deployed (severity: CRITICAL if major version diff, WARNING if patch)
- Replica mismatch: wrong number of instances running
- App not found: application not deployed in expected namespace
- Automatic remediation commands to fix each issue
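The detection rules above can be sketched as a small classifier. This is an illustration rather than the project's Deviation MCP code: `AppState` and `find_deviations` are hypothetical names, the severities for replica mismatch and missing apps are assumptions (the list only specifies severity for image mismatch), and any non-major version difference is treated as WARNING.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppState:
    image: str      # e.g. "nginx:1.27.0"
    replicas: int


def _version(image: str) -> tuple[int, ...]:
    """Parse a numeric image tag such as "nginx:1.27.0" (assumes one exists)."""
    tag = image.rsplit(":", 1)[1]
    return tuple(int(part) for part in tag.split("."))


def find_deviations(name: str, expected: AppState,
                    actual: Optional[AppState]) -> list[dict]:
    """List deviations of a deployed app from its release baseline."""
    if actual is None:
        # Severity for a missing app is an assumption.
        return [{"type": "app_not_found", "severity": "CRITICAL"}]
    found = []
    if actual.image != expected.image:
        major_differs = _version(actual.image)[0] != _version(expected.image)[0]
        found.append({
            "type": "image_mismatch",
            "severity": "CRITICAL" if major_differs else "WARNING",
            # Assumes the container is named after the app.
            "fix": f"kubectl set image deployment/{name} {name}={expected.image}",
        })
    if actual.replicas != expected.replicas:
        found.append({
            "type": "replica_mismatch",
            "severity": "WARNING",  # assumption: not specified in the list above
            "fix": f"kubectl scale deployment/{name} --replicas={expected.replicas}",
        })
    return found


if __name__ == "__main__":
    r4 = AppState(image="nginx:1.27.0", replicas=3)   # R4 baseline from above
    running = AppState(image="nginx:1.24.0", replicas=2)
    for deviation in find_deviations("nginx", r4, running):
        print(deviation)
```

With the R4 baseline and an R1-style deployment, this reports an image mismatch (WARNING, since only the minor version differs) and a replica mismatch, each with a kubectl command that would bring the app back to baseline.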
# Analyze app deviation in cluster vs target release
curl -X POST http://127.0.0.1:8000/api/apps/deviation \
-H "Content-Type: application/json" \
-d '{
"cluster_name": "c1",
"app_name": "nginx",
"target_release": "R4"
}'
# Deploy an app to a cluster
curl -X POST http://127.0.0.1:8000/api/apps/deploy \
-H "Content-Type: application/json" \
-d '{
"cluster_name": "c1",
"app_spec": {
"name": "nginx",
"namespace": "default",
"image": "nginx:1.27.0",
"replicas": 3
}
}'
# Upgrade app image
curl -X POST http://127.0.0.1:8000/api/apps/upgrade \
-H "Content-Type: application/json" \
-d '{
"cluster_name": "c1",
"app_name": "nginx",
"namespace": "default",
"new_image": "nginx:1.27.1"
}'
# Scale app
curl -X POST http://127.0.0.1:8000/api/apps/scale \
-H "Content-Type: application/json" \
-d '{
"cluster_name": "c1",
"app_name": "nginx",
"namespace": "default",
"replicas": 5
}'

The web dashboard includes an Applications tab where you can:
- Select a cluster and application
- Pick a target release baseline
- Analyze deviations vs that release
- See live remediation commands
- Identify upgrade paths for apps across releases
Use restart.sh to restart individual services or everything at once. This is the go-to when you update .env, change frontend code, or need to bounce a service.
The backend reads .env at startup. After editing it, restart just the backend:
./restart.sh backend

The frontend does not need a restart: it fetches provider status from the backend on page load. Just refresh the browser after restarting the backend.
./restart.sh backend # Restart backend API only (picks up .env changes)
./restart.sh frontend # Restart frontend / React GUI only
./restart.sh mcp # Restart all 3 MCP agents
./restart.sh ollama # Restart Ollama (local LLM)
./restart.sh # Restart ALL services

Each command stops the old process, clears stale PID files, and starts fresh.
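A PID file is stale when the process it names no longer exists. The sketch below shows one way to detect and remove such files; it is an illustration of the idea, not the contents of restart.sh, and `pid_is_alive` and `clear_stale_pid_files` are hypothetical names.

```python
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def clear_stale_pid_files(run_dir: Path) -> list[str]:
    """Delete *.pid files whose process is gone; return the removed names."""
    removed = []
    for pid_file in sorted(run_dir.glob("*.pid")):
        text = pid_file.read_text().strip()
        # Unreadable contents are treated the same as a dead process.
        if not text.isdigit() or not pid_is_alive(int(text)):
            pid_file.unlink()
            removed.append(pid_file.name)
    return removed


if __name__ == "__main__":
    run_dir = Path(".run")
    if run_dir.is_dir():
        print("removed:", clear_stale_pid_files(run_dir))
```

Note that a live PID can belong to an unrelated process if the operating system has reused the number, so this check is a heuristic rather than proof that the service is running.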
After a laptop reboot (or WSL restart), several services need to be recovered. Use the one-command recovery script or follow the manual steps below.
./recover.sh

This script automatically:
- Checks and fixes Docker socket permissions
- Restarts stopped kind cluster containers (c1, c2, c3)
- Waits for Kubernetes API servers to become ready
- Starts Ollama if installed (for local LLM)
- Cleans stale PID files from previous sessions
- Runs start.sh to bring up all services (MCP agents, backend, frontend)
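The "wait for API servers" step is a polling loop with a deadline. A minimal sketch of that pattern follows, assuming kubectl is on the PATH and contexts are named kind-c1, kind-c2, and kind-c3 as elsewhere in this README; `wait_until` and `api_ready` are hypothetical helpers, not functions taken from recover.sh.

```python
import subprocess
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0,
               interval: float = 2.0) -> bool:
    """Poll check() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def api_ready(context: str) -> bool:
    """Return True if `kubectl get nodes` succeeds for the given context."""
    try:
        result = subprocess.run(
            ["kubectl", "--context", context, "get", "nodes"],
            capture_output=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # kubectl missing or the API server did not answer in time
    return result.returncode == 0
```

For example, `wait_until(lambda: api_ready("kind-c1"), timeout=60)` returns True as soon as the c1 API server answers, and False if it is still unreachable after a minute.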
If you prefer to recover manually, follow these steps in order:
# Docker Desktop users: open Docker Desktop from Windows and wait for it to start
# WSL2 users without Docker Desktop:
sudo service docker start
sudo chmod 666 /var/run/docker.sock

Verify: docker ps should work without errors.
After reboot, kind containers are stopped but still exist:
# Check status
docker ps -a --filter "name=control-plane" --format "{{.Names}}: {{.Status}}"
# Restart all cluster containers
docker start c1-control-plane c2-control-plane c3-control-plane
# Wait ~10 seconds for API servers to initialize
sleep 10
# Verify connectivity
for c in c1 c2 c3; do
kubectl --context kind-$c get nodes || echo "$c: NOT READY"
done

# Check if running
curl -s http://127.0.0.1:11434/api/tags >/dev/null && echo "Ollama OK" || ollama serve &

./start.sh

This starts MCP agents (ports 8765-8767), backend API (port 8000), and frontend (port 3000).
| Symptom | Cause | Fix |
|---|---|---|
| permission denied on Docker commands | Docker socket permissions reset | sudo chmod 666 /var/run/docker.sock |
| connection refused on kubectl | kind containers are stopped | docker start c1-control-plane c2-control-plane c3-control-plane |
| Backend fails to start | Stale PID file | rm -f .run/*.pid then ./start.sh |
| Port already in use | Zombie process from before reboot | kill $(lsof -t -i:PORT) then retry |
| Chat shows "not configured" | .env file missing or empty | cp .env.example .env and fill in keys |
| Chat provider not picking up new API key | Backend needs restart after .env edit | ./restart.sh backend |
| Frontend not reflecting changes | Stale browser cache or frontend not restarted | ./restart.sh frontend and hard-refresh browser (Ctrl+Shift+R) |
| Ollama chat times out on first message | Model cold start (loading into memory) | Wait ~30s and retry; first request is always slow on CPU |
| kind get clusters returns empty | Containers were deleted (not just stopped) | Redeploy via Greenfield tab or python3 Artifact_mcp.py deploy ... |
# Quick health check after recovery
echo "=== Docker ===" && docker ps --format "{{.Names}}: {{.Status}}" | grep control-plane
echo "=== Kubernetes ===" && for c in c1 c2 c3; do kubectl --context kind-$c get nodes -o wide 2>/dev/null || echo "$c: down"; done
echo "=== Services ===" && for p in 3000 8000 8765 8766 8767; do ss -ltn | grep -q ":$p " && echo "Port $p: UP" || echo "Port $p: DOWN"; done
echo "=== Ollama ===" && curl -s http://127.0.0.1:11434/api/tags 2>/dev/null | head -c 50 | grep . || echo "Not running"