# April8 — Universal Model Serving Platform

A centralized, multi-tenant model serving platform that gives ML teams a "paved road" for deploying models to production. Push a 5-line config file, get a live, authenticated, auto-scaling inference endpoint.

This repository contains both a design document and a working prototype deployed on AWS EKS.

Named after 8 April 2026 — the date we received this challenge.


## Repository Access

This repository is temporarily public (1–2 days) for code review purposes. It will be set back to private afterwards.


## Live Services

| Service | URL |
| --- | --- |
| Platform Dashboard | `april8.nz` |
| Control Plane API | `api.april8.nz` |
| Grafana (Monitoring) | `grafana.april8.nz` |
| MLflow (Model Registry) | `mlflow.april8.nz` |
| Inference Endpoints | `{name}.{namespace}.models.april8.nz` |
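Inference endpoints speak the Open Inference Protocol v2 and enforce JWT auth. As a sketch of what a client request looks like (the endpoint pattern and JWT requirement come from this README; the model name, namespace, token, and tensor shape are illustrative assumptions):

```python
import json
import urllib.request

# Sketch of calling a deployed model over the Open Inference Protocol v2.
# The endpoint pattern and JWT requirement come from this README; the model
# name, namespace, token, and tensor shape below are illustrative assumptions.
def build_infer_request(name: str, namespace: str, token: str,
                        rows: list[list[float]]) -> urllib.request.Request:
    """Build an OIP v2 infer request for {name}.{namespace}.models.april8.nz."""
    url = f"https://{name}.{namespace}.models.april8.nz/v2/models/{name}/infer"
    body = {
        "inputs": [{
            "name": "input-0",
            "shape": [len(rows), len(rows[0])],
            "datatype": "FP32",
            "data": rows,
        }]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # endpoints enforce JWT auth
        },
        method="POST",
    )

# req = build_infer_request("fraud-detector", "risk-team", token, [[0.1, 0.2, 0.3]])
# urllib.request.urlopen(req) would then return the OIP v2 "outputs" payload.
```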

## How It Works

```
april8.yaml  ──git push──▶  GitHub Webhook  ──▶  April8 Backend
                                                       │
                              ┌─────────────────┬──────┴──────┐
                              ▼                 ▼             ▼
                         Validate          Upload to S3   Provision NS
                         config            (model files)  (TLS, DNS, auth)
                              │                 │             │
                              └────────┬────────┘             │
                                       ▼                      │
                                 Apply KServe ◀───────────────┘
                                 InferenceService
                                       │
                                       ▼
                              Live inference endpoint
                              (auto-scaling, JWT auth, TLS)
```

A developer adds `april8.yaml` to their repo:

```yaml
version: "1"
deployments:
  fraud-detector:
    model: ./models/fraud-detector/
    framework: sklearn
    tier: staging
```
Every push to `main` triggers an automated pipeline that validates the config, uploads model artifacts to S3, provisions the namespace (with TLS, DNS, and JWT auth), and deploys a KServe InferenceService.
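The validation step of that pipeline can be sketched as a structural check over the parsed config. The framework and tier names below are drawn from this README, but the function itself is a hypothetical helper, not the backend's actual validator:

```python
# Sketch of the pipeline's config-validation step over a parsed april8.yaml.
# Framework and tier names are taken from this README; the function itself
# is an illustrative assumption, not the platform's real validator.
SUPPORTED_FRAMEWORKS = {"pytorch", "tensorflow", "sklearn", "xgboost", "onnx", "huggingface"}
TIERS = {"dev", "staging", "production"}

def validate_config(cfg: dict) -> list[str]:
    """Return human-readable errors; an empty list means the config is deployable."""
    errors = []
    if cfg.get("version") != "1":
        errors.append("unsupported config version")
    for name, spec in (cfg.get("deployments") or {}).items():
        if "model" not in spec:
            errors.append(f"{name}: missing 'model' path")
        if spec.get("framework") not in SUPPORTED_FRAMEWORKS:
            errors.append(f"{name}: unsupported framework {spec.get('framework')!r}")
        if spec.get("tier") not in TIERS:
            errors.append(f"{name}: unknown tier {spec.get('tier')!r}")
    return errors
```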


## Repository Structure

```
├── output/                 # 18 research documents + 7 component deep-dives
├── experiment/             # EKS cluster setup guides and K8s manifests
├── platform/
│   ├── backend/            # FastAPI control plane (Python 3.12)
│   ├── frontend/           # React + TypeScript dashboard (Vite)
│   ├── k8s/                # Kubernetes manifests (RBAC, ConfigMap, namespace)
│   ├── monitoring/         # Prometheus, Grafana, Loki, Fluent Bit (Helmfile)
│   ├── mlflow/             # MLflow tracking server (Knative service)
│   ├── db/migrations/      # SQL schema migrations (Cloudflare D1)
│   ├── docs/               # Internal platform design docs
│   └── Makefile
├── report/                 # Design document (Typst source → PDF)
│   ├── report.typ          # Main document
│   ├── sections/           # Per-section Typst files
│   ├── screenshots/        # Figures and diagrams
│   └── Makefile
└── .github/workflows/      # CI/CD — backend, frontend, MLflow, monitoring, report
```

## Technology Stack

| Layer | Choice |
| --- | --- |
| Orchestration | Kubernetes (AWS EKS) |
| Serving | KServe — multi-framework InferenceService CRDs |
| Autoscaling | Knative KPA — concurrency-based, scale-to-zero |
| Service Mesh | Istio — mTLS, traffic routing, JWT auth enforcement |
| Model Registry | MLflow — version tracking, S3 artifact store |
| Inference API | Open Inference Protocol v2 (REST + gRPC) |
| Monitoring | Prometheus + Grafana + Loki + Fluent Bit |
| DNS & TLS | Cloudflare — anycast DNS, auto-renewing wildcard TLS |
| Control Plane DB | Cloudflare D1 (SQLite over HTTP) |
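The autoscaling row (Knative KPA, scale-to-zero) maps naturally onto Knative's standard autoscaling annotations, one profile per tier. The annotation keys below are standard Knative; the per-tier numbers are illustrative assumptions, not the platform's real profiles:

```python
# Illustrative mapping from April8 tiers to Knative KPA annotations.
# The annotation keys are standard Knative; the numbers are assumptions.
TIER_PROFILES = {
    "dev": {
        "autoscaling.knative.dev/min-scale": "0",   # scale to zero, cheapest
        "autoscaling.knative.dev/max-scale": "1",
        "autoscaling.knative.dev/target": "10",     # target concurrency per pod
    },
    "staging": {
        "autoscaling.knative.dev/min-scale": "0",
        "autoscaling.knative.dev/max-scale": "3",
        "autoscaling.knative.dev/target": "10",
    },
    "production": {
        "autoscaling.knative.dev/min-scale": "1",   # keep one warm pod: no cold starts
        "autoscaling.knative.dev/max-scale": "10",
        "autoscaling.knative.dev/target": "5",
    },
}

def annotations_for(tier: str) -> dict[str, str]:
    """Annotations to stamp onto an InferenceService's pod spec for a given tier."""
    return TIER_PROFILES[tier]
```

The dev/staging tiers trade cold-start latency for cost by scaling to zero, while production keeps a warm replica.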

## Key Features

- **GitOps deployment** — `git push` is the only action; no CLI or manual steps
- **Multi-framework** — PyTorch, TensorFlow, Scikit-learn, XGBoost, ONNX, HuggingFace
- **Scale-to-zero** — per-tier profiles (dev/staging/production) balancing cost vs cold-start
- **Multi-tenant isolation** — namespace-per-project, JWT-authenticated inference endpoints
- **Quota enforcement** — hierarchical limits (platform → team → project) for CPU, memory, GPU
- **Out-of-the-box monitoring** — Prometheus metrics, Grafana dashboards, log aggregation

## Design Document

The report is written in Typst and compiled to PDF via GitHub Actions. Source files are in `report/`.

To build locally:

```shell
cd report
make build     # requires typst CLI
```

## Report Sections

1. Executive Summary
2. Technology Stack & Justification
3. Architecture (control plane, data plane, observability plane)
4. CI/CD for Models
5. Scale-to-Zero Strategy
6. Appendices (prototype status, project timeline)

## Local Development

### Backend

```shell
cd platform/backend
cp .env.example .env     # fill in secrets
uv pip install -e ".[dev]"
uvicorn app.main:app --reload --port 4020
```

### Frontend

```shell
cd platform/frontend
cp .env.example .env.local
npm install
npm run dev              # http://localhost:4010
```

### Webhooks (local)

```shell
npx smee -u https://smee.io/YOUR_CHANNEL -t http://localhost:4020/webhook/github
```
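Behind the `/webhook/github` route that smee forwards to sits a signature check on GitHub's `X-Hub-Signature-256` header. The env var name and helper below are illustrative assumptions, not the backend's actual code:

```python
import hashlib
import hmac
import os

# Sketch of the signature check behind the /webhook/github route that the
# smee tunnel forwards to. GITHUB_WEBHOOK_SECRET is an assumed env var name;
# a FastAPI route would call signature_is_valid() before touching the payload.
WEBHOOK_SECRET = os.environ.get("GITHUB_WEBHOOK_SECRET", "dev-secret")

def signature_is_valid(body: bytes, signature_header: str) -> bool:
    """Compare GitHub's X-Hub-Signature-256 header against our own HMAC."""
    expected = "sha256=" + hmac.new(WEBHOOK_SECRET.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header or "")
```

In the FastAPI backend this would sit at the top of the route handler, rejecting any delivery whose signature does not match before the push event is parsed.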

## Research & Design Output

The `output/` directory contains 18 research documents and 7 component deep-dives produced during the design phase. See `output/README.md` for the full index.
