I like building systems that don't just work: they scale, reason, and recover. I thrive when designing complex backend architectures that transform raw telemetry into actionable intelligence. I enjoy UI work as well, so you will see a consistent UI theme across my projects :)
- Distributed Systems: Designing for resilience, consistency, and low latency.
- Observability Stack: Deep integration with the LGTM stack (Loki, Grafana, Tempo, Mimir).
- AI-Native Engineering: Building reasoning engines for automated Root Cause Analysis (RCA).
- Infrastructure as Code: Orchestrating scalable, cloud-native environments.
I am currently developing a suite of interconnected observability tools designed to eliminate the friction in modern SRE workflows.
BeObservant
The Control Plane. A unified platform for metrics, logs, traces, and alerts. It acts as the "Single Pane of Glass" for distributed systems, enforcing RBAC and multi-tenancy across the entire LGTM stack.
BeCertain
The Analyst. A Python-based reasoning engine that processes telemetry data to provide automated Root Cause Analysis (RCA), anomaly detection, and predictive forecasting.
BeNotified
The Messenger. An intelligent alerting and incident orchestration service. It manages the lifecycle of an alert, from the moment a threshold is crossed in Mimir to the final resolution note in Jira.
- CodeMasterPro: A specialized developer tooling platform designed to streamline the local development environment and improve engineering velocity. I explored the ralph-wiggum principle a while ago; have a look at the code.