An end-to-end AI/ML pipeline demonstrating edge inference, GPU benchmarking, container orchestration, and infrastructure automation — built for Low-SWaP (Size, Weight, and Power) deployment environments.
EdgeAI Sentinel is an end-to-end object detection system modeled on the AI/ML infrastructure work done in enterprise AI factories. The project covers the full lifecycle of an AI model:
- Training — Fine-tune YOLOv8 on a custom dataset using GPU-accelerated PyTorch
- Benchmarking — Profile GPU memory, throughput, latency, and power consumption
- Export — Convert trained models to ONNX and TensorRT for edge deployment
- Edge Deployment — Run optimized inference on a Raspberry Pi (Low-SWaP platform)
- Orchestration — Automate fleet deployment with Ansible and Docker
- Monitoring — Track inference metrics with Prometheus + Grafana dashboards
- CI/CD — Automated MLOps pipeline via GitHub Actions / GitLab CI
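The monitoring step above exposes inference metrics for Prometheus to scrape. The real stack would likely use `prometheus_client` or the FastAPI server in `edge/api.py`, but the text exposition format itself can be sketched with the standard library alone (the metric names and values here are illustrative, not the project's actual metrics):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical gauges an edge node might export; names are illustrative.
METRICS = {"edge_inference_latency_ms": 142.0, "edge_fps": 7.0}

class MetricsHandler(BaseHTTPRequestHandler):
    """Serve current metrics in Prometheus text exposition format."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        body = "".join(f"{k} {v}\n" for k, v in METRICS.items()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):  # silence per-request logging
        pass

def serve(port=0):
    """Start the exporter in a daemon thread; return the bound port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.server_address[1]
```

Prometheus would then scrape `http://<edge-node>:<port>/metrics` on its configured interval; Grafana reads the resulting time series.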
| Skill | Implementation |
|---|---|
| GPU / HPC Computing | CUDA profiling, memory benchmarking, PyTorch training loop |
| Low-SWaP Edge Platforms | Raspberry Pi 4/5 inference with ONNX Runtime, power profiling |
| OCI / Container Runtimes | Docker + containerd deployment manifests |
| Kubernetes | K3s manifests for edge fleet orchestration |
| Infrastructure Automation | Ansible playbooks for provisioning and deployment |
| Python / Bash Scripting | Training, benchmarking, edge inference, and utility scripts |
| Monitoring / Observability | Prometheus metrics + Grafana dashboard |
| CI/CD MLOps | GitHub Actions pipeline for train → export → deploy |
| Networking + Security | Firewall rules in Ansible, secure API endpoints |
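The K3s manifests under `orchestration/kubernetes/` could take roughly this shape; the image name, port, and resource limits below are illustrative assumptions, not the project's actual manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: edgeai-inference
  labels:
    app: edgeai-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: edgeai-inference
  template:
    metadata:
      labels:
        app: edgeai-inference
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64          # pin pods to Raspberry Pi nodes
      containers:
        - name: inference
          image: edgeai-sentinel/edge:latest   # hypothetical image name
          ports:
            - containerPort: 8000              # FastAPI inference server
          resources:
            limits:
              memory: 512Mi                    # stay within Pi 4 headroom
              cpu: "2"
```

Pinning on `kubernetes.io/arch: arm64` keeps inference pods off any x86 control-plane node in a mixed cluster.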
edgeai-sentinel/
├── training/ # Model training scripts (PyTorch / YOLOv8)
│ ├── train.py # Main training entry point
│ ├── dataset.py # Dataset loader and augmentation
│ └── export.py # ONNX and TensorRT export
├── benchmarks/ # GPU and edge hardware benchmarking
│ ├── gpu_benchmark.py # CUDA profiling (throughput, memory, power)
│ └── edge_benchmark.py # Raspberry Pi inference benchmarking
├── edge/ # Edge device inference application
│ ├── inference.py # ONNX Runtime inference engine
│ ├── camera.py # Camera stream handler (OpenCV)
│ └── api.py # FastAPI inference server
├── orchestration/
│ ├── docker/ # Dockerfile for edge and training containers
│ ├── ansible/ # Playbooks for fleet provisioning
│ └── kubernetes/ # K3s manifests for edge cluster
├── monitoring/
│ ├── prometheus/ # Prometheus scrape configs
│ └── grafana/ # Dashboard JSON exports
├── scripts/ # Utility Bash scripts
├── tests/ # Unit and integration tests
├── notebooks/ # Jupyter exploration notebooks
├── docs/ # Architecture docs and diagrams
└── .github/workflows/ # CI/CD pipeline definitions
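The ONNX Runtime engine in `edge/inference.py` has to post-process raw YOLO detections, typically with greedy non-maximum suppression. A minimal pure-Python sketch of that step (hypothetical helpers for illustration, not the repo's actual code):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS: keep the highest-scoring box, drop overlapping rivals."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

In practice this runs per class over the model's filtered output; `0.45` mirrors a common YOLO default IoU threshold.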
- Python 3.10+
- Docker + containerd
- NVIDIA GPU (optional, for training)
- Ansible 2.14+
```bash
git clone https://github.com/yourusername/edgeai-sentinel.git
cd edgeai-sentinel
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

Train the model:

```bash
python training/train.py --config configs/train_config.yaml --epochs 50
```

Run the GPU benchmark:

```bash
python benchmarks/gpu_benchmark.py --model yolov8n --batch-sizes 1 4 8 16
```

Export for edge deployment:

```bash
python training/export.py --checkpoint runs/train/exp/weights/best.pt --format onnx
```

Deploy to the edge fleet:

```bash
# Using Ansible
ansible-playbook orchestration/ansible/deploy_edge.yml -i inventory.ini

# Or manually on the Raspberry Pi
python edge/inference.py --model models/best.onnx --source 0
```

Start the monitoring stack:

```bash
docker-compose -f orchestration/docker/docker-compose.monitoring.yml up -d
# Access Grafana at http://localhost:3000
```

| Platform | Use Case | Notes |
|---|---|---|
| NVIDIA GPU (RTX 3060+) | Model training, TensorRT export | CUDA 11.8+ required |
| Raspberry Pi 4 (8GB) | Edge inference, API server | ARM64, ~3W idle |
| Raspberry Pi 5 (8GB) | Faster edge inference | ~4W idle, faster CPU |
| USB Camera / Pi Camera | Video input stream | OpenCV compatible |
Sample benchmark results (Raspberry Pi 4B, 8GB, YOLOv8n ONNX):
| Metric | Value |
|---|---|
| Inference latency (avg) | 142 ms/frame |
| Throughput | 7.0 FPS |
| Peak RAM | 380 MB |
| CPU utilization | 68% |
| Power draw (estimated) | 4.2W |
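Latency and FPS figures like these come from timing repeated inference calls after a warm-up pass. A stdlib-only sketch of the kind of harness `benchmarks/edge_benchmark.py` might use (function names and defaults are assumptions):

```python
import statistics
import time

def benchmark(infer, n_warmup=10, n_iters=100):
    """Time an inference callable; report avg/p95 latency (ms) and FPS."""
    for _ in range(n_warmup):          # warm caches and lazy init first
        infer()
    latencies = []
    for _ in range(n_iters):
        t0 = time.perf_counter()
        infer()
        latencies.append((time.perf_counter() - t0) * 1000.0)
    avg_ms = statistics.mean(latencies)
    return {
        "avg_ms": avg_ms,
        "p95_ms": sorted(latencies)[int(0.95 * n_iters) - 1],
        "fps": 1000.0 / avg_ms,        # FPS is the reciprocal of avg latency
    }
```

On the device this would wrap the ONNX Runtime call, e.g. `benchmark(lambda: session.run(None, {"images": frame}))`.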
GPU training results (RTX 3060, batch=16):
| Metric | Value |
|---|---|
| Training throughput | 82 images/sec |
| GPU memory (peak) | 6.4 GB |
| mAP@0.5 (COCO val) | 0.532 |
| Export (ONNX) | ✅ |
MIT — see LICENSE