A centralized, multi-tenant model serving platform that gives ML teams a "paved road" for deploying models to production. Push a 5-line config file, get a live, authenticated, auto-scaling inference endpoint.
This repository contains both a design document and a working prototype deployed on AWS EKS.
Named after 8 April 2026 — the date we received this challenge.
This repository is temporarily public (1–2 days) for code review purposes. It will be set back to private afterwards.
| Service | URL |
|---|---|
| Platform Dashboard | april8.nz |
| Control Plane API | api.april8.nz |
| Grafana (Monitoring) | grafana.april8.nz |
| MLflow (Model Registry) | mlflow.april8.nz |
| Inference Endpoints | {name}.{namespace}.models.april8.nz |
```
april8.yaml ──git push──▶ GitHub Webhook ──▶ April8 Backend
                                                    │
                           ┌─────────────────┬──────┴──────┐
                           ▼                 ▼             ▼
                       Validate        Upload to S3   Provision NS
                        config        (model files) (TLS, DNS, auth)
                           │                 │             │
                           └────────┬────────┘             │
                                    ▼                      │
                              Apply KServe ◀───────────────┘
                            InferenceService
                                    │
                                    ▼
                         Live inference endpoint
                      (auto-scaling, JWT auth, TLS)
```
A developer adds april8.yaml to their repo:
```yaml
version: "1"
deployments:
  fraud-detector:
    model: ./models/fraud-detector/
    framework: sklearn
    tier: staging
```

Every push to `main` triggers an automated pipeline that validates the config, uploads model artifacts to S3, provisions the namespace (with TLS, DNS, and JWT auth), and deploys a KServe InferenceService.
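As a rough illustration of the last pipeline step, the sketch below builds a KServe `InferenceService` manifest from one `april8.yaml` deployment entry. The field layout follows the KServe `serving.kserve.io/v1beta1` CRD; the S3 bucket, namespace, and label names are illustrative assumptions, not the platform's actual output.

```python
def build_inference_service(name: str, namespace: str, framework: str,
                            storage_uri: str, tier: str) -> dict:
    """Render a minimal KServe InferenceService manifest for one deployment.

    `storage_uri` points at the artifacts the pipeline uploaded to S3;
    the `april8.nz/tier` label is a hypothetical platform convention.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {"april8.nz/tier": tier},
        },
        "spec": {
            "predictor": {
                "model": {
                    # modelFormat selects the serving runtime (sklearn, xgboost, ...)
                    "modelFormat": {"name": framework},
                    "storageUri": storage_uri,
                },
            },
        },
    }


manifest = build_inference_service(
    "fraud-detector", "team-a", "sklearn",
    "s3://april8-models/team-a/fraud-detector/", "staging",
)
```

In the real control plane this manifest would be applied to the cluster via the Kubernetes API; the sketch only shows the shape of the object.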
```
├── output/              # 18 research documents + 7 component deep-dives
├── experiment/          # EKS cluster setup guides and K8s manifests
├── platform/
│   ├── backend/         # FastAPI control plane (Python 3.12)
│   ├── frontend/        # React + TypeScript dashboard (Vite)
│   ├── k8s/             # Kubernetes manifests (RBAC, ConfigMap, namespace)
│   ├── monitoring/      # Prometheus, Grafana, Loki, Fluent Bit (Helmfile)
│   ├── mlflow/          # MLflow tracking server (Knative service)
│   ├── db/migrations/   # SQL schema migrations (Cloudflare D1)
│   ├── docs/            # Internal platform design docs
│   └── Makefile
├── report/              # Design document (Typst source → PDF)
│   ├── report.typ       # Main document
│   ├── sections/        # Per-section Typst files
│   ├── screenshots/     # Figures and diagrams
│   └── Makefile
└── .github/workflows/   # CI/CD — backend, frontend, MLflow, monitoring, report
```
| Layer | Choice |
|---|---|
| Orchestration | Kubernetes (AWS EKS) |
| Serving | KServe — multi-framework InferenceService CRDs |
| Autoscaling | Knative KPA — concurrency-based, scale-to-zero |
| Service Mesh | Istio — mTLS, traffic routing, JWT auth enforcement |
| Model Registry | MLflow — version tracking, S3 artifact store |
| Inference API | Open Inference Protocol v2 (REST + gRPC) |
| Monitoring | Prometheus + Grafana + Loki + Fluent Bit |
| DNS & TLS | Cloudflare — anycast DNS, auto-renewing wildcard TLS |
| Control Plane DB | Cloudflare D1 (SQLite over HTTP) |
- GitOps deployment — `git push` is the only action; no CLI or manual steps
- Multi-framework — PyTorch, TensorFlow, Scikit-learn, XGBoost, ONNX, HuggingFace
- Scale-to-zero — per-tier profiles (dev/staging/production) balancing cost vs cold-start
- Multi-tenant isolation — namespace-per-project, JWT-authenticated inference endpoints
- Quota enforcement — hierarchical limits (platform → team → project) for CPU, memory, GPU
- Out-of-the-box monitoring — Prometheus metrics, Grafana dashboards, log aggregation
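One way the per-tier scale-to-zero profiles could map onto the serving layer is via Knative's standard autoscaling annotations, as in this sketch. The annotation keys (`autoscaling.knative.dev/min-scale`, `max-scale`, `target`) are real Knative settings; the tier names come from the config above, but the numeric values are illustrative assumptions, not the platform's actual profiles.

```python
# Hypothetical tier profiles: dev/staging scale to zero to save cost,
# production keeps a warm replica to avoid cold starts.
TIER_PROFILES = {
    "dev":        {"autoscaling.knative.dev/min-scale": "0",
                   "autoscaling.knative.dev/max-scale": "2"},
    "staging":    {"autoscaling.knative.dev/min-scale": "0",
                   "autoscaling.knative.dev/max-scale": "5"},
    "production": {"autoscaling.knative.dev/min-scale": "1",
                   "autoscaling.knative.dev/max-scale": "20",
                   # target concurrency per pod before scaling out
                   "autoscaling.knative.dev/target": "10"},
}


def annotations_for(tier: str) -> dict:
    # Unknown tiers fall back to the most conservative profile (assumption)
    return TIER_PROFILES.get(tier, TIER_PROFILES["dev"])
```

These annotations would be attached to the predictor pod template of the generated InferenceService, letting the Knative KPA enforce the chosen cost/cold-start trade-off per tier.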
The report is written in Typst and compiled to PDF via GitHub Actions. Source files are in report/.
To build locally:

```shell
cd report
make build   # requires the typst CLI
```

The report covers:

- Executive Summary
- Technology Stack & Justification
- Architecture (control plane, data plane, observability plane)
- CI/CD for Models
- Scale-to-Zero Strategy
- Appendices (prototype status, project timeline)
Backend:

```shell
cd platform/backend
cp .env.example .env          # fill in secrets
uv pip install -e ".[dev]"
uvicorn app.main:app --reload --port 4020
```

Frontend:

```shell
cd platform/frontend
cp .env.example .env.local
npm install
npm run dev                   # http://localhost:4010
```

To forward GitHub webhooks to the local backend:

```shell
npx smee -u https://smee.io/YOUR_CHANNEL -t http://localhost:4020/webhook/github
```

The output/ directory contains 18 research documents and 7 component deep-dives produced during the design phase. See output/README.md for the full index.
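For reference, a deployed endpoint would be called with an Open Inference Protocol v2 REST request against the `{name}.{namespace}.models.april8.nz` URL pattern from the services table. The sketch below only constructs the request; the input tensor name, feature values, and JWT are placeholders, and the exact input schema depends on the deployed model.

```python
import json


def oip_infer_request(name: str, namespace: str,
                      features: list[float], token: str):
    """Build URL, headers, and body for an OIP v2 infer call (illustrative)."""
    url = f"https://{name}.{namespace}.models.april8.nz/v2/models/{name}/infer"
    headers = {
        "Authorization": f"Bearer {token}",   # JWT enforced by Istio
        "Content-Type": "application/json",
    }
    body = {
        "inputs": [{
            "name": "input-0",                # placeholder tensor name
            "shape": [1, len(features)],
            "datatype": "FP32",
            "data": features,
        }]
    }
    return url, headers, json.dumps(body)


url, headers, body = oip_infer_request(
    "fraud-detector", "team-a", [0.1, 0.2, 0.3], "YOUR_JWT")
```

The same payload shape works over gRPC via the protocol's `ModelInfer` RPC, since both transports share the OIP v2 tensor encoding.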