A centralized, multi-tenant model serving platform that gives ML teams a "paved road" for deploying models to production. Push a 5-line config file, get a live, authenticated, auto-scaling inference endpoint.
This repository contains both a design document and a working prototype deployed on AWS EKS.
Named after 8 April 2026 — the date we received this challenge.
This repository is temporarily public (1–2 days) for code review purposes. It will be set back to private afterwards.
| Service | URL |
|---|---|
| Platform Dashboard | april8.nz |
| Control Plane API | api.april8.nz |
| Grafana (Monitoring) | grafana.april8.nz |
| MLflow (Model Registry) | mlflow.april8.nz |
| Inference Endpoints | {name}.{namespace}.models.april8.nz |
```
april8.yaml ──git push──▶ GitHub Webhook ──▶ April8 Backend
                                                    │
                           ┌─────────────────┬──────┴──────┐
                           ▼                 ▼             ▼
                       Validate        Upload to S3   Provision NS
                        config        (model files) (TLS, DNS, auth)
                           │                 │             │
                           └────────┬────────┘             │
                                    ▼                      │
                              Apply KServe ◀───────────────┘
                            InferenceService
                                    │
                                    ▼
                         Live inference endpoint
                      (auto-scaling, JWT auth, TLS)
```
A developer adds april8.yaml to their repo:
```yaml
version: "1"
deployments:
  fraud-detector:
    model: ./models/fraud-detector/
    framework: sklearn
    tier: staging
```

Every push to `main` triggers an automated pipeline that validates the config, uploads model artifacts to S3, provisions the namespace (with TLS, DNS, and JWT auth), and deploys a KServe InferenceService.
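As a rough illustration of the last pipeline step, the sketch below builds a KServe `InferenceService` manifest from one `april8.yaml` deployment entry. The field layout follows the KServe `serving.kserve.io/v1beta1` CRD; the S3 bucket, namespace, and label names are illustrative assumptions, not the platform's actual output.

```python
def build_inference_service(name: str, namespace: str, framework: str,
                            storage_uri: str, tier: str) -> dict:
    """Render a minimal KServe InferenceService manifest for one deployment.

    `storage_uri` points at the artifacts the pipeline uploaded to S3;
    the `april8.nz/tier` label is a hypothetical platform convention.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {"april8.nz/tier": tier},
        },
        "spec": {
            "predictor": {
                "model": {
                    # modelFormat selects the serving runtime (sklearn, xgboost, ...)
                    "modelFormat": {"name": framework},
                    "storageUri": storage_uri,
                },
            },
        },
    }


manifest = build_inference_service(
    "fraud-detector", "team-a", "sklearn",
    "s3://april8-models/team-a/fraud-detector/", "staging",
)
```

In the real control plane this manifest would be applied to the cluster via the Kubernetes API; the sketch only shows the shape of the object.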
```
├── output/              # 18 research documents + 7 component deep-dives
├── experiment/          # EKS cluster setup guides and K8s manifests
├── platform/
│   ├── backend/         # FastAPI control plane (Python 3.12)
│   ├── frontend/        # React + TypeScript dashboard (Vite)
│   ├── k8s/             # Kubernetes manifests (RBAC, ConfigMap, namespace)
│   ├── monitoring/      # Prometheus, Grafana, Loki, Fluent Bit (Helmfile)
│   ├── mlflow/          # MLflow tracking server (Knative service)
│   ├── db/migrations/   # SQL schema migrations (Cloudflare D1)
│   ├── docs/            # Internal platform design docs
│   └── Makefile
├── report/              # Design document (Typst source → PDF)
│   ├── report.typ       # Main document
│   ├── sections/        # Per-section Typst files
│   ├── screenshots/     # Figures and diagrams
│   └── Makefile
└── .github/workflows/   # CI/CD — backend, frontend, MLflow, monitoring, report
```
| Layer | Choice |
|---|---|
| Orchestration | Kubernetes (AWS EKS) |
| Serving | KServe — multi-framework InferenceService CRDs |
| Autoscaling | Knative KPA — concurrency-based, scale-to-zero |
| Service Mesh | Istio — mTLS, traffic routing, JWT auth enforcement |
| Model Registry | MLflow — version tracking, S3 artifact store |
| Inference API | Open Inference Protocol v2 (REST + gRPC) |
| Monitoring | Prometheus + Grafana + Loki + Fluent Bit |
| DNS & TLS | Cloudflare — anycast DNS, auto-renewing wildcard TLS |
| Control Plane DB | Cloudflare D1 (SQLite over HTTP) |
- GitOps deployment — `git push` is the only action; no CLI or manual steps
- Multi-framework — PyTorch, TensorFlow, Scikit-learn, XGBoost, ONNX, HuggingFace
- Scale-to-zero — per-tier profiles (dev/staging/production) balancing cost vs cold-start
- Multi-tenant isolation — namespace-per-project, JWT-authenticated inference endpoints
- Quota enforcement — hierarchical limits (platform → team → project) for CPU, memory, GPU
- Out-of-the-box monitoring — Prometheus metrics, Grafana dashboards, log aggregation
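One way the per-tier scale-to-zero profiles could map onto the serving layer is via Knative's standard autoscaling annotations, as in this sketch. The annotation keys (`autoscaling.knative.dev/min-scale`, `max-scale`, `target`) are real Knative settings; the tier names come from the config above, but the numeric values are illustrative assumptions, not the platform's actual profiles.

```python
# Hypothetical tier profiles: dev/staging scale to zero to save cost,
# production keeps a warm replica to avoid cold starts.
TIER_PROFILES = {
    "dev":        {"autoscaling.knative.dev/min-scale": "0",
                   "autoscaling.knative.dev/max-scale": "2"},
    "staging":    {"autoscaling.knative.dev/min-scale": "0",
                   "autoscaling.knative.dev/max-scale": "5"},
    "production": {"autoscaling.knative.dev/min-scale": "1",
                   "autoscaling.knative.dev/max-scale": "20",
                   # target concurrency per pod before scaling out
                   "autoscaling.knative.dev/target": "10"},
}


def annotations_for(tier: str) -> dict:
    # Unknown tiers fall back to the most conservative profile (assumption)
    return TIER_PROFILES.get(tier, TIER_PROFILES["dev"])
```

These annotations would be attached to the predictor pod template of the generated InferenceService, letting the Knative KPA enforce the chosen cost/cold-start trade-off per tier.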
The report is written in Typst and compiled to PDF via GitHub Actions. Source files are in report/.
To build locally:

```shell
cd report
make build   # requires the typst CLI
```

The report covers:

- Executive Summary
- Technology Stack & Justification
- Architecture (control plane, data plane, observability plane)
- CI/CD for Models
- Scale-to-Zero Strategy
- Appendices (prototype status, project timeline)
Backend:

```shell
cd platform/backend
cp .env.example .env          # fill in secrets
uv pip install -e ".[dev]"
uvicorn app.main:app --reload --port 4020
```

Frontend:

```shell
cd platform/frontend
cp .env.example .env.local
npm install
npm run dev                   # http://localhost:4010
```

To forward GitHub webhooks to the local backend:

```shell
npx smee -u https://smee.io/YOUR_CHANNEL -t http://localhost:4020/webhook/github
```

The output/ directory contains 18 research documents and 7 component deep-dives produced during the design phase. See output/README.md for the full index.
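For reference, a deployed endpoint would be called with an Open Inference Protocol v2 REST request against the `{name}.{namespace}.models.april8.nz` URL pattern from the services table. The sketch below only constructs the request; the input tensor name, feature values, and JWT are placeholders, and the exact input schema depends on the deployed model.

```python
import json


def oip_infer_request(name: str, namespace: str,
                      features: list[float], token: str):
    """Build URL, headers, and body for an OIP v2 infer call (illustrative)."""
    url = f"https://{name}.{namespace}.models.april8.nz/v2/models/{name}/infer"
    headers = {
        "Authorization": f"Bearer {token}",   # JWT enforced by Istio
        "Content-Type": "application/json",
    }
    body = {
        "inputs": [{
            "name": "input-0",                # placeholder tensor name
            "shape": [1, len(features)],
            "datatype": "FP32",
            "data": features,
        }]
    }
    return url, headers, json.dumps(body)


url, headers, body = oip_infer_request(
    "fraud-detector", "team-a", [0.1, 0.2, 0.3], "YOUR_JWT")
```

The same payload shape works over gRPC via the protocol's `ModelInfer` RPC, since both transports share the OIP v2 tensor encoding.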