Edge ML Control Plane + Pi Inference

Small edge-to-control-plane setup: the Pi runs an image inference service, and the laptop runs a control plane that stores telemetry and manages model deploy/rollback.

Description

The Pi serves /infer behind a bounded queue, so overload becomes a clear 503 instead of random timeouts. The control plane stores telemetry in Postgres, keeps model versions (sha256 + artifact files), and can deploy a new model to the Pi over ssh, then roll back if post-deploy metrics look worse.
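To make the bounded-queue idea concrete, here is a minimal sketch (not the repo's actual implementation): a semaphore caps in-flight requests, and anything past the cap is rejected immediately with a 503. `MAX_IN_FLIGHT` and `run_model` are placeholders.

```python
# Hypothetical sketch: bounded concurrency on the edge /infer endpoint.
# A semaphore caps in-flight requests; when full, we fail fast with 503
# instead of queueing until clients time out.
import threading

from flask import Flask, jsonify

app = Flask(__name__)
MAX_IN_FLIGHT = 4  # assumed capacity, tuned to the Pi
slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)


def run_model():
    # Placeholder for the actual model inference call.
    return {"label": "ok"}


@app.route("/infer", methods=["POST"])
def infer():
    if not slots.acquire(blocking=False):
        # Explicit reject: overload shows up as a countable 503,
        # visible to both callers and telemetry.
        return jsonify(error="overloaded"), 503
    try:
        return jsonify(result=run_model())
    finally:
        slots.release()
```

The point is that rejection is a deliberate, observable outcome rather than a timeout the client has to infer.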

Live Demo Link & Screenshots

  • TODO: add later

Tech Stack

  • Python (FastAPI)
  • Postgres (docker compose)
  • Raspberry Pi + systemd (edge service)
  • bash + small Python scripts for the demo

Features

  • telemetry ingest + simple summaries (p50/p95, error/reject rate)
  • model registry w/ sha256 + local artifact storage
  • deploy model to Pi (scp/ssh) + verify + restart
  • rollback on regression (latency/error/reject thresholds; sketched below)
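A sketch of what a regression trigger like this can look like; the threshold values and field names here are illustrative assumptions, not the repo's actual numbers:

```python
# Hypothetical rollback trigger: compare a post-deploy telemetry window
# against fixed thresholds. Deliberately simple and absolute, so the
# decision is easy to explain when it fires.
from dataclasses import dataclass


@dataclass
class WindowStats:
    p95_ms: float       # 95th-percentile latency in the window
    error_rate: float   # fraction of failed responses
    reject_rate: float  # fraction of 503 queue rejections


def should_rollback(stats: WindowStats,
                    max_p95_ms: float = 500.0,
                    max_error_rate: float = 0.05,
                    max_reject_rate: float = 0.10) -> bool:
    return (stats.p95_ms > max_p95_ms
            or stats.error_rate > max_error_rate
            or stats.reject_rate > max_reject_rate)


# e.g. should_rollback(WindowStats(720.0, 0.01, 0.02)) -> True (latency regressed)
```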

Setup

  • copy .env.example to .env if you want to change ports/tokens
  • docker compose up -d --build
  • register the Pi once (see demo scripts below)
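In shell form, the first two steps are:

```bash
cp .env.example .env          # optional: change ports/tokens
docker compose up -d --build  # starts the control plane + Postgres
```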

Architecture

```
operator scripts -> control plane (FastAPI) -> Postgres + artifacts/
                      ^
                      | telemetry batches (HTTP)
                      |
                 Pi edge service (Flask)
```
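The telemetry arrow is plain batched HTTP from the Pi to the control plane. For illustration only (the endpoint path, auth scheme, and field names below are assumptions, not the real API), a batch might look like:

```python
# Hypothetical telemetry batch from the edge service to the control plane.
# This only illustrates the batched-HTTP shape of the data flow; the real
# endpoint and schema live in the repo.
import requests

batch = {
    "device_id": "pi-01",  # assumed identifier
    "events": [
        {"ts": "2024-01-01T00:00:00Z", "latency_ms": 41.0, "status": 200},
        {"ts": "2024-01-01T00:00:01Z", "latency_ms": 388.5, "status": 503},
    ],
}
requests.post(
    "http://control-plane:8000/telemetry",  # placeholder URL
    json=batch,
    headers={"Authorization": "Bearer <device-token>"},  # placeholder token
    timeout=5,
)
```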

Demo scripts

  • ./demo/demo_up.sh
  • ./demo/demo_register_device.sh (needs SSH_HOST=...)
  • ./demo/demo_upload_models.sh
  • ./demo/demo_deploy.sh (set MODEL_VERSION=v1 / v2)
  • ./demo/demo_deploy_and_rollback.sh (runs the full story)
  • python demo/load_pi.py (load generator)
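A typical end-to-end run looks like this (the SSH_HOST value is a hypothetical stand-in for your Pi):

```bash
./demo/demo_up.sh
SSH_HOST=pi@raspberrypi.local ./demo/demo_register_device.sh  # host is hypothetical
./demo/demo_upload_models.sh
./demo/demo_deploy_and_rollback.sh  # the full deploy -> observe -> rollback story
```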

Future Improvements

  • token rotation + tighter auth
  • artifact store to S3/MinIO
  • store telemetry aggregates instead of per-event rows
  • make deployments async (right now the deploy waits inside the request; one possible shape is sketched below)
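One way the deploy could move out of the request path, sketched with FastAPI's BackgroundTasks; this is an assumption about a future shape, not current code, and `run_deploy` is a placeholder:

```python
# Hypothetical async-deploy shape: the request records intent and returns
# 202 immediately; the ssh/scp work happens after the response is sent.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()


def run_deploy(device_id: str, version: str) -> None:
    # Placeholder for the real scp/ssh + checksum verify + restart sequence.
    print(f"deploying {version} to {device_id}")


@app.post("/devices/{device_id}/deploy/{version}", status_code=202)
def deploy(device_id: str, version: str, tasks: BackgroundTasks):
    tasks.add_task(run_deploy, device_id, version)
    return {"status": "accepted", "device": device_id, "version": version}
```

BackgroundTasks runs in-process after the response is sent, which is enough to unblock the request; surviving control-plane restarts would take a real job queue.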

Challenges and What I Learned

  • Overload behavior: the “nice” thing is fast responses; the correct thing is predictable failure (bounded queue + explicit reject).
  • Deployment was mostly about safety checks and boring edges (checksum, restart, timeouts). That stuff is what breaks in real life.
  • Telemetry design: if you store everything you drown; if you store nothing you can’t debug. Keeping a small schema helped a lot.
  • Rollback logic is subtle because “numbers got worse” doesn’t always mean “bad model”, so I kept the trigger simple and obvious.

Credits

Solo project built by Zane Hensley
