
KERNO

The Production Incident Diagnosis Engine for Kubernetes

Your cluster broke. Your dashboards are green. Your pager is firing. One command. Root cause. 30 seconds.


Quick Start · How It Works · Features · Kubernetes · Docs


Why Kerno?

It's 3am. PagerDuty fires. Latency is up, error budget is burning, and every dashboard you own is green.

  • Prometheus says CPU and memory look fine.
  • Datadog APM says your app is healthy.
  • The Grafana panels your SRE spent a weekend building - all green.

That's because every tool you have watches your application. Nothing is watching the kernel.

flowchart TB
    subgraph Stack["YOUR K8S STACK"]
        App["Workload Pods<br/>(Node, Python, Go, Java)"]
        Runtime["Container Runtime<br/>(containerd, CRI-O)"]
        OS["Node Kernel<br/>(Linux)"]
        HW["Nodes / EC2 / GCE"]
    end

    subgraph Tools["WHO WATCHES WHAT"]
        APM["Datadog · New Relic<br/>Prometheus · Grafana"]
        CRun["Pixie · Tetragon<br/>Inspektor Gadget"]
        Kerno["<b>KERNO</b><br/><i>eBPF kernel tracing</i>"]
        Bare["(nobody)"]
    end

    App -.watched by.-> APM
    Runtime -.watched by.-> CRun
    OS -.watched by.-> Kerno
    HW -.watched by.-> Bare

    style Kerno fill:#e94560,stroke:#fff,color:#fff,stroke-width:3px
    style App fill:#0f3460,stroke:#16213e,color:#fff
    style Runtime fill:#16213e,stroke:#533483,color:#fff
    style OS fill:#1a1a2e,stroke:#e94560,color:#fff
    style HW fill:#533483,stroke:#16213e,color:#fff
    style APM fill:#16213e,stroke:#0f3460,color:#ccc
    style CRun fill:#16213e,stroke:#0f3460,color:#ccc
    style Bare fill:#16213e,stroke:#0f3460,color:#888

The kernel is where the pain actually lives - disk throttling, TCP retransmits, OOM kills, scheduler contention, FD leaks. The kernel knows minutes before your dashboards. Hours before your users.

Kerno runs as a DaemonSet on every node, streams kernel signals through eBPF with microsecond overhead, and turns them into a diagnostic report that reads like a doctor's note.

kubectl -n kerno-system exec ds/kerno -- kerno doctor

One command. 30 seconds later:

$ kubectl -n kerno-system exec ds/kerno -- kerno doctor

  ██╗  ██╗███████╗██████╗ ███╗   ██╗ ██████╗
  ██║ ██╔╝██╔════╝██╔══██╗████╗  ██║██╔═══██╗   Production Incident Report
  █████╔╝ █████╗  ██████╔╝██╔██╗ ██║██║   ██║   ──────────────────────────
  ██╔═██╗ ██╔══╝  ██╔══██╗██║╚██╗██║██║   ██║   Cluster  prod-us-east-1
  ██║  ██╗███████╗██║  ██║██║ ╚████║╚██████╔╝   Node     ip-10-0-3-47
  ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚═╝  ╚═══╝ ╚═════╝    Kernel   6.8.0-aws · x86_64
                                                 Window   30.0s · 12,481 events


  ❚❚  TRIAGE ────────────────────────────────────────────────────────── 30s
      ● ● ●   2 critical  ·  1 warning  ·  0 info
  ─────────────────────────────────────────────────────────────────────────


  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
  ┃ ●  CRITICAL  ·  Disk I/O Bottleneck                          SLO 🔥   ┃
  ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
     Pod       payments/checkout-api-7f4b8c-x9k2p
     Volume    /var/lib/postgres  (EBS gp3, 3,000 IOPS)
     Signal    fsync p99  ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇  280 ms
               threshold  ▇▇▇▇▇▇▇▇▇▇▇▇▇                 200 ms
     Cause     EBS volume saturated · fsync blocking commit log writes
     Impact    checkout p99 ↑ 8.2×  ·  connection pool 94% saturated
     Fix       → bump EBS IOPS  3000 → 6000
               → or move commit log to local NVMe

  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
  ┃ ●  CRITICAL  ·  TCP Retransmit Storm                         SLO 🔥   ┃
  ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
     Namespace ingress-nginx
     Pod       nginx-ingress-controller-qz8hv
     Signal    retransmits  ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇    12.3 %
               threshold    ▇▇▇▇                          2.0 %
     Cause     packet loss on eth0 · likely upstream NLB path
     Impact    every external request at risk of latency spike
     Fix       → kubectl exec ... -- ethtool -S eth0 | grep error
               → check AWS NLB target group health

  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
  ┃ ▲  WARNING   ·  Scheduler Contention                                  ┃
  ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
     Node      ip-10-0-3-47  (c6i.2xlarge · 8 vCPU)
     Signal    runqueue p99  ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇             18 ms
               threshold     ▇▇▇▇▇                            5 ms
     Cause     noisy neighbor · 14 pods competing for 8 vCPUs
     Fix       → reduce pod density or set CPU requests
               → kubectl top pods --sort-by=cpu


  ▼  RECOMMENDED ORDER ──────────────────────────────────────────────────

     1.  [ NOW  ]   Disk I/O Bottleneck       payments/checkout-api
     2.  [ NOW  ]   TCP Retransmit Storm      ingress-nginx
     3.  [ 5min ]   Scheduler Contention      ip-10-0-3-47


  kerno doctor --ai           for root-cause analysis
  kerno doctor --output json  for runbooks / Slack bots
  kerno predict               to surface failures before they page you

That's the entire debugging loop - from page to root cause - in a single command.


How Kerno compares

| Tool | Watches | K8s-Native | Incident Report | SLO Mapping | AI Analysis | Install Time |
| --- | --- | --- | --- | --- | --- | --- |
| Prometheus + Grafana | Application | Partial | No | No | No | Hours |
| Datadog APM | Application | Partial | No | Partial | Yes | Hours |
| Cilium Tetragon | Security | Yes | No | No | No | Minutes |
| Inspektor Gadget | Container | Yes | No | No | No | Minutes |
| Pixie | Application | Yes | No | No | No | Minutes |
| Kerno | Kernel | Yes | Yes | Yes | Yes | < 1 min |

Kerno is the only eBPF tool in the Kubernetes ecosystem that produces a ranked, human-readable incident report - not a firehose of events, not another dashboard, not a query language to learn.


Quick Start

Prerequisites

  • Kubernetes 1.25+ with nodes running kernel 5.8+ (BTF enabled - all major managed offerings qualify: EKS, GKE, AKS, DOKS, Linode, Civo)
  • kubectl with cluster-admin (to create the namespace + RBAC)

Install on Kubernetes (primary)

# Helm
helm install kerno ./deploy/helm/kerno \
  -n kerno-system --create-namespace

# Or raw manifests
kubectl apply -f deploy/k8s/

That's it. Within 30 seconds Kerno is running as a DaemonSet on every node.

Run an incident report on a live cluster

# Diagnose the whole cluster - 30 seconds of real kernel data
kubectl -n kerno-system exec ds/kerno -- kerno doctor

# JSON output for on-call runbooks / Slack bots
kubectl -n kerno-system exec ds/kerno -- kerno doctor --output json

# With AI-powered root cause analysis
kubectl -n kerno-system set env ds/kerno KERNO_AI_API_KEY=sk-...
kubectl -n kerno-system exec ds/kerno -- kerno doctor --ai

Continuous metrics (for Prometheus + Alertmanager)

kubectl -n kerno-system port-forward ds/kerno 9090:9090
curl localhost:9090/metrics

ServiceMonitor support is built-in when the Prometheus Operator is installed.

Bare metal / VMs (secondary)

For standalone Linux servers outside Kubernetes:

# One-liner install
curl -sfL https://raw.githubusercontent.com/lowplane/kerno/main/scripts/install.sh | sudo bash
sudo kerno doctor

# Pin a specific version: --version v0.1.0

Or run as a long-lived systemd service with Prometheus metrics:

curl -sfL https://raw.githubusercontent.com/lowplane/kerno/main/scripts/install.sh | sudo bash -s -- --daemon
journalctl -u kerno -f

Docker (ad-hoc, any host)

docker run --rm --privileged --pid=host \
  -v /sys/kernel/debug:/sys/kernel/debug:ro \
  -v /sys/kernel/btf:/sys/kernel/btf:ro \
  -v /sys/fs/bpf:/sys/fs/bpf \
  -v /proc:/proc:ro \
  ghcr.io/lowplane/kerno:v0.1.0 doctor

Multi-arch images (linux/amd64, linux/arm64) are published to GHCR on every release.


Kubernetes Deployment

Kerno is designed from day one to run as a Kubernetes DaemonSet. One pod per node, one eBPF agent per kernel, zero API server load.

flowchart TB
    subgraph Cluster["Kubernetes Cluster"]
        direction TB
        subgraph Node1["Worker Node 1"]
            K1["Kerno Pod<br/><i>DaemonSet</i>"]
            W1["Workload Pods"]
        end
        subgraph Node2["Worker Node 2"]
            K2["Kerno Pod<br/><i>DaemonSet</i>"]
            W2["Workload Pods"]
        end
        subgraph Node3["Worker Node N"]
            K3["Kerno Pod<br/><i>DaemonSet</i>"]
            W3["Workload Pods"]
        end
    end

    K1 -->|:9090/metrics| Prom["Prometheus"]
    K2 -->|:9090/metrics| Prom
    K3 -->|:9090/metrics| Prom
    Prom --> GF["Grafana"]

    K1 -.enriches.-> W1
    K2 -.enriches.-> W2
    K3 -.enriches.-> W3

    style K1 fill:#e94560,stroke:#fff,color:#fff
    style K2 fill:#e94560,stroke:#fff,color:#fff
    style K3 fill:#e94560,stroke:#fff,color:#fff
    style Prom fill:#0f3460,stroke:#fff,color:#fff
    style GF fill:#16213e,stroke:#fff,color:#fff
    style W1 fill:#533483,stroke:#fff,color:#fff
    style W2 fill:#533483,stroke:#fff,color:#fff
    style W3 fill:#533483,stroke:#fff,color:#fff

Pod enrichment - no API server load

Kerno tags every finding with pod, namespace, node, and workload labels. No client-go informers, no watch connections - Kerno reads /var/lib/kubelet/pods directly, so even a failing API server doesn't blind the agent, exactly when you need it most.

Host mounts - the minimum necessary

| Mount | Why |
| --- | --- |
| /sys/kernel/debug | tracepoints, kprobes |
| /sys/kernel/btf | CO-RE type resolution |
| /sys/fs/bpf | BPF map pinning |
| /proc | PID → cgroup → pod resolution |
| /sys/fs/cgroup | container resource accounting |
| /sys/class/net | per-interface TCP counters |
| /sys/block | per-device disk stats |
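
The /proc mount is what powers the PID → cgroup → pod step. A minimal sketch of the idea, assuming the common cgroup-v2 kubelet layout - the regex and names here are illustrative, not Kerno's actual code:

package main

import (
    "fmt"
    "os"
    "regexp"
)

// Pod UIDs show up in kubelet cgroup paths such as
// .../kubepods-burstable-pod<uid>.slice/...; on some distros the UID's
// dashes appear as underscores, so the pattern accepts both.
var podUIDRe = regexp.MustCompile(`pod([0-9a-f][0-9a-f_-]{35})`)

// podUIDForPID maps a host PID to the UID of the pod it belongs to by
// scanning /proc/<pid>/cgroup - no API server round-trip involved.
func podUIDForPID(pid int) (string, error) {
    data, err := os.ReadFile(fmt.Sprintf("/proc/%d/cgroup", pid))
    if err != nil {
        return "", err
    }
    if m := podUIDRe.FindSubmatch(data); m != nil {
        return string(m[1]), nil
    }
    return "", fmt.Errorf("pid %d is not in a pod cgroup", pid)
}

func main() {
    uid, err := podUIDForPID(os.Getpid())
    fmt.Println(uid, err)
}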

Security posture

  • Runs with the minimum capabilities needed - CAP_BPF, CAP_PERFMON, CAP_SYS_PTRACE, CAP_NET_ADMIN, CAP_DAC_READ_SEARCH (not CAP_SYS_ADMIN for the hot path).
  • Read-only root filesystem, ProtectSystem=strict via systemd on bare metal.
  • No outbound network calls. AI integration is opt-in and goes through your configured provider only.

Helm values

image:
  repository: ghcr.io/lowplane/kerno
  tag: v0.1.0

resources:
  requests: { cpu: 100m, memory: 128Mi }
  limits:   { cpu: "1",  memory: 512Mi }

prometheus:
  enabled: true
  port: 9090

serviceMonitor:     # Prometheus Operator
  enabled: true
  interval: 15s

nodeSelector:
  monitoring: "true"

Verify

kubectl -n kerno-system get ds kerno
kubectl -n kerno-system logs -l app.kubernetes.io/name=kerno
kubectl -n kerno-system exec ds/kerno -- kerno doctor

Features

Incident Diagnosis

  • kerno doctor - 30-second cluster-wide diagnostic, ranked findings, fix suggestions
  • kerno explain - AI-powered kernel error explanation (no root needed)
  • kerno predict - surface failures before they page you

Real-Time Tracing

  • kerno trace syscall - per-pod syscall latency streaming
  • kerno trace disk - block I/O latency by device, op, process
  • kerno trace sched - CPU scheduler run queue delays

Continuous Monitoring

  • kerno watch tcp - TCP connections, RTT, retransmits
  • kerno watch oom - OOM kill alerts with pod context
  • kerno watch fd - FD leak detection via growth rate (see the sketch after this list)
  • kerno start - daemon mode with Prometheus metrics
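
The growth-rate idea behind kerno watch fd is simple enough to sketch. The window, limit, and names below are illustrative; the 10 FDs/sec default matches rule 6 in the diagnostic rules table:

package main

import (
    "fmt"
    "time"
)

type sample struct {
    at  time.Time
    fds int
}

// growthPerSec fits the average FD growth rate across a sampling window.
func growthPerSec(w []sample) float64 {
    if len(w) < 2 {
        return 0
    }
    secs := w[len(w)-1].at.Sub(w[0].at).Seconds()
    if secs <= 0 {
        return 0
    }
    return float64(w[len(w)-1].fds-w[0].fds) / secs
}

func main() {
    w := []sample{
        {time.Now().Add(-30 * time.Second), 1200},
        {time.Now(), 1530}, // +330 FDs in 30s => 11/sec
    }
    rate := growthPerSec(w)
    if rate > 10 { // default threshold: fd_growth_per_sec 10.0
        limit := 65536 // e.g. the process's RLIMIT_NOFILE
        eta := time.Duration(float64(limit-w[len(w)-1].fds)/rate) * time.Second
        fmt.Printf("WARN: fd leak at %.1f/sec, hits limit in ~%s\n", rate, eta.Round(time.Minute))
    }
}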

Integrations

  • Prometheus - 16 metrics at /metrics, ServiceMonitor support
  • Kubernetes - Helm chart + pod enrichment (no API server load)
  • AI Providers - Anthropic, OpenAI, Ollama (optional, opt-in)
  • Systemd - unit/slice enrichment on bare metal

How It Works

Kerno runs as a lightweight Go agent with six tiny eBPF programs attached to stable tracepoints. When kerno doctor runs, it collects 30 seconds of real kernel data, evaluates 11 diagnostic rules deterministically, and emits a ranked incident report. No sampling. No guesswork. No query language.

Architecture

flowchart TB
    subgraph Kernel["KERNEL SPACE · eBPF Programs"]
        direction LR
        P1["syscall<br/>latency"]
        P2["tcp<br/>monitor"]
        P3["oom<br/>track"]
        P4["disk<br/>io"]
        P5["sched<br/>delay"]
        P6["fd<br/>track"]
    end

    RB[("Ring Buffers<br/>256KB per program<br/>zero-copy mmap")]

    subgraph UserSpace["USER SPACE · Go"]
        direction TB
        Loader["BPF Loaders<br/>cilium/ebpf"]
        Collector["Collectors<br/>percentile aggregation"]
        Signals[("Signals Snapshot<br/>single source of truth")]
        Adapter["Environment Adapter<br/>k8s · systemd · bare metal"]
    end

    subgraph Outputs["OUTPUTS"]
        direction TB
        Doctor["Doctor Engine<br/>11 diagnostic rules"]
        AI["AI Layer <i>(optional)</i><br/>root cause analysis"]
        Prom["Prometheus<br/>/metrics :9090"]
        CLI["Terminal<br/>pretty · JSON"]
    end

    P1 & P2 & P3 & P4 & P5 & P6 --> RB
    RB --> Loader
    Loader --> Collector
    Collector --> Signals
    Adapter -.enriches.-> Signals
    Signals --> Doctor
    Signals --> Prom
    Doctor --> AI
    AI --> CLI
    Doctor --> CLI

    classDef kernel fill:#1a1a2e,stroke:#e94560,color:#fff,stroke-width:2px
    classDef user fill:#0f3460,stroke:#16213e,color:#fff,stroke-width:2px
    classDef output fill:#16213e,stroke:#533483,color:#fff,stroke-width:2px
    classDef buffer fill:#533483,stroke:#e94560,color:#fff,stroke-width:3px
    classDef ai fill:#e94560,stroke:#fff,color:#fff,stroke-width:2px

    class P1,P2,P3,P4,P5,P6 kernel
    class Loader,Collector,Signals,Adapter user
    class Doctor,Prom,CLI output
    class RB buffer
    class AI ai

Core principles

  1. Deterministic first. The rule engine is pure Go, testable, and runs whether AI is on or off. Every finding has a clear cause, threshold, and fix.
  2. Zero-copy hot path. Kernel events land in eBPF ring buffers and are drained via mmap - microsecond overhead, no serialization cost (see the drain-loop sketch after this list).
  3. No API server load. Pod enrichment reads the kubelet's local pod manifests. The agent survives API server outages - the moment you need it most.
  4. AI is a post-processor. Optional. Opt-in. Never touches the hot path. The deterministic engine always runs; AI enriches, it never replaces.
  5. Graceful degradation. If an eBPF program fails to load on a weird kernel, that collector is skipped with a clear warning. The rest keep working.
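
For principle 2, a minimal sketch of the drain loop using the cilium/ebpf ring buffer reader - map wiring and event decoding are omitted, and the identifiers are illustrative:

package main

import (
    "errors"
    "log"

    "github.com/cilium/ebpf"
    "github.com/cilium/ebpf/ringbuf"
)

// drain reads kernel events from a BPF ring buffer map. The reader
// mmaps the buffer, so samples are handed over without extra copies.
func drain(events *ebpf.Map) {
    rd, err := ringbuf.NewReader(events)
    if err != nil {
        log.Fatalf("open ring buffer: %v", err)
    }
    defer rd.Close()

    for {
        rec, err := rd.Read()
        if errors.Is(err, ringbuf.ErrClosed) {
            return // collector shut down
        }
        if err != nil {
            continue // transient error; keep draining
        }
        // rec.RawSample is one kernel event; decode it and feed the
        // percentile aggregators here.
        _ = rec.RawSample
    }
}

func main() {} // loading of the eBPF program and map is omitted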

Data flow

sequenceDiagram
    participant K as Kernel<br/>(eBPF)
    participant R as Ring Buffer
    participant C as Collectors
    participant D as Doctor Engine
    participant A as AI Layer
    participant U as On-call Engineer

    K->>R: syscall/tcp/oom/io events
    Note over K,R: Zero-copy, microsecond overhead
    R->>C: drain events
    C->>C: aggregate into p50/p95/p99
    C->>D: Signals snapshot
    D->>D: evaluate 11 rules
    alt AI enabled
        D->>A: findings + signals
        A->>A: correlate + explain
        A->>U: incident report + root cause
    else AI disabled
        D->>U: deterministic incident report
    end

The Diagnostic Rules

Kerno runs 11 deterministic rules against every snapshot. Every rule is explainable, configurable, and covered by tests.

| # | Rule | Triggers When | Severity |
| --- | --- | --- | --- |
| 1 | Disk I/O Bottleneck | fsync p99 > 50ms or write p99 > 200ms | WARN / CRIT |
| 2 | OOM Kill Occurred | Any OOM event in window | CRIT |
| 3 | TCP Retransmit Storm | Retransmit rate > 2% | CRIT |
| 4 | TCP RTT Degradation | RTT p99 > 10ms | WARN |
| 5 | Scheduler Contention | Runqueue delay p99 > 5ms | WARN / CRIT |
| 6 | FD Leak | FD growth > 10/sec sustained | WARN (with ETA) |
| 7 | Syscall Latency High | Any syscall p99 > 100ms | WARN / CRIT |
| 8 | OOM Imminent | Memory > 90% + positive growth | WARN / CRIT (with ETA) |
| 9 | Syscall Error Rate | Error rate > 1% per syscall | WARN / CRIT |
| 10 | Memory Pressure | RSS usage > 90% | WARN |
| 11 | Network Latency | Connection RTT > 100ms | WARN |
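
Every rule boils down to a pure threshold check over the signals snapshot. An illustrative shape for rule 1, using the default thresholds from the Configuration section below - types and names are hypothetical, not Kerno's internal API:

package main

import (
    "fmt"
    "time"
)

// Signals is a slice of the snapshot a rule sees; fields are illustrative.
type Signals struct {
    FsyncP99 time.Duration
    WriteP99 time.Duration
}

// Finding mirrors the report fields: rule, severity, cause.
type Finding struct {
    Rule, Severity, Cause string
}

// diskIOBottleneck implements rule 1 deterministically: no sampling,
// no model - just thresholds (50ms WARN / 200ms CRIT by default).
func diskIOBottleneck(s Signals) *Finding {
    switch {
    case s.FsyncP99 > 200*time.Millisecond || s.WriteP99 > 200*time.Millisecond:
        return &Finding{"Disk I/O Bottleneck", "CRIT", "p99 above critical threshold"}
    case s.FsyncP99 > 50*time.Millisecond:
        return &Finding{"Disk I/O Bottleneck", "WARN", "fsync p99 above warning threshold"}
    }
    return nil
}

func main() {
    f := diskIOBottleneck(Signals{FsyncP99: 280 * time.Millisecond})
    fmt.Printf("%+v\n", f) // matches the CRITICAL finding in the demo report
}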

Usage

Incident diagnosis - "what broke just now?"

# The golden command
kubectl -n kerno-system exec ds/kerno -- kerno doctor

# Quick 10-second check
kubectl -n kerno-system exec ds/kerno -- kerno doctor --duration 10s

# JSON for CI/CD, runbooks, Slack bots (non-zero exit on critical)
kubectl -n kerno-system exec ds/kerno -- kerno doctor --output json --exit-code

# AI-powered root cause analysis
kubectl -n kerno-system exec ds/kerno -- kerno doctor --ai

# Explain a kernel error (no root, no cluster needed)
kerno explain "BUG: kernel NULL pointer dereference"
dmesg | tail -5 | kerno explain

# Predict failures before they page you
kubectl -n kerno-system exec ds/kerno -- kerno predict --snapshots 5 --interval 15s
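
The JSON mode is built for exactly this kind of automation. A hypothetical consumer sketch - the field names below are illustrative, so check the real schema before relying on them:

package main

import (
    "encoding/json"
    "fmt"
    "os/exec"
)

func main() {
    // --exit-code makes the command fail when criticals are found, so a
    // non-nil err here can still come with a valid report on stdout.
    out, err := exec.Command(
        "kubectl", "-n", "kerno-system", "exec", "ds/kerno", "--",
        "kerno", "doctor", "--output", "json", "--exit-code",
    ).Output()
    if err != nil {
        fmt.Println("doctor exited non-zero (criticals present?):", err)
    }

    // Hypothetical schema - adjust to the actual JSON output.
    var report struct {
        Findings []struct {
            Severity string `json:"severity"`
            Rule     string `json:"rule"`
            Pod      string `json:"pod"`
            Fix      string `json:"fix"`
        } `json:"findings"`
    }
    if err := json.Unmarshal(out, &report); err != nil {
        return
    }
    for _, f := range report.Findings {
        if f.Severity == "critical" {
            fmt.Printf("CRITICAL %s · %s → %s\n", f.Rule, f.Pod, f.Fix) // e.g. post to Slack
        }
    }
}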

Real-time tracing - "watch it happen"

# Every syscall event streaming
kubectl -n kerno-system exec ds/kerno -- kerno trace syscall

# Only syscalls from a specific pod's PID
kubectl -n kerno-system exec ds/kerno -- kerno trace syscall --pid 1234

# Postgres disk writes over 5ms
kubectl -n kerno-system exec ds/kerno -- kerno trace disk --process postgres --op write --threshold 5ms

# Scheduler delays over 10ms
kubectl -n kerno-system exec ds/kerno -- kerno trace sched --threshold 10ms

Continuous monitoring - "alert me when…"

# TCP connections with retransmits
kubectl -n kerno-system exec ds/kerno -- kerno watch tcp --retransmits

# Any OOM kill, with pod context
kubectl -n kerno-system exec ds/kerno -- kerno watch oom --alert

# Processes leaking FDs
kubectl -n kerno-system exec ds/kerno -- kerno watch fd --threshold 10

Prometheus Metrics

The DaemonSet exposes 16 metrics at :9090/metrics. ServiceMonitor is included when the Prometheus Operator is installed.

| Metric | Type | What It Measures |
| --- | --- | --- |
| kerno_syscall_duration_nanoseconds | Summary | Syscall latency (p50, p95, p99) |
| kerno_syscall_total | Counter | Total syscall events |
| kerno_tcp_rtt_nanoseconds | Summary | TCP round-trip time |
| kerno_tcp_retransmits_total | Counter | TCP retransmissions |
| kerno_tcp_connections_total | Counter | TCP connection events |
| kerno_oom_kills_total | Counter | OOM kill events |
| kerno_disk_io_duration_nanoseconds | Summary | Disk I/O latency |
| kerno_disk_io_bytes_total | Counter | Disk I/O bytes |
| kerno_sched_delay_nanoseconds | Summary | CPU run queue delay |
| kerno_fd_open_total | Counter | FD open operations |
| kerno_fd_close_total | Counter | FD close operations |
| kerno_collector_events_total | Counter | Events per collector |
| kerno_collector_errors_total | Counter | Errors per collector |
| kerno_bpf_programs_loaded | Gauge | Loaded eBPF programs |
| kerno_info | Gauge | Build version |

Health endpoints: /healthz and /readyz return JSON status.


Environment & AI

Environment auto-detection. Kerno picks one of three adapters and enriches every event - no configuration required:

  • Kubernetes (in-cluster token present) → pod, namespace, node, deployment
  • Systemd (PID 1 is systemd) → unit, slice, scope
  • Bare metal → hostname, cgroup path
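
A sketch of that detection order - the paths are the standard ones, the function name is illustrative:

package main

import (
    "fmt"
    "os"
    "strings"
)

// detectAdapter mirrors the order above: in-cluster service-account
// token → Kubernetes, PID 1 named systemd → systemd, else bare metal.
func detectAdapter() string {
    if _, err := os.Stat("/var/run/secrets/kubernetes.io/serviceaccount/token"); err == nil {
        return "kubernetes"
    }
    if comm, err := os.ReadFile("/proc/1/comm"); err == nil &&
        strings.TrimSpace(string(comm)) == "systemd" {
        return "systemd"
    }
    return "baremetal"
}

func main() { fmt.Println(detectAdapter()) }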

AI (optional). The AI layer runs after the deterministic rule engine - it correlates cross-signals and explains root causes, it never replaces rules. Three providers (Anthropic, OpenAI, Ollama for air-gapped), three privacy modes (full / redacted / summary), TTL cache + token-bucket rate limiting, graceful fallback to a deterministic template on failure. No LLM SDK dependencies - pure net/http.
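
A minimal sketch of those guardrails - a token bucket gating a plain net/http call. The endpoint and headers follow Anthropic's public HTTP API; Kerno's actual request shape may differ:

package main

import (
    "bytes"
    "context"
    "net/http"
    "time"

    "golang.org/x/time/rate"
)

// One request per two seconds with a small burst - illustrative numbers.
var limiter = rate.NewLimiter(rate.Every(2*time.Second), 5)

// explain posts findings to the provider over plain net/http - no SDK.
func explain(ctx context.Context, apiKey string, body []byte) (*http.Response, error) {
    if err := limiter.Wait(ctx); err != nil { // token-bucket gate
        return nil, err
    }
    req, err := http.NewRequestWithContext(ctx, http.MethodPost,
        "https://api.anthropic.com/v1/messages", bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    req.Header.Set("x-api-key", apiKey)
    req.Header.Set("anthropic-version", "2023-06-01")
    req.Header.Set("content-type", "application/json")
    return http.DefaultClient.Do(req)
}

func main() {} // TTL cache and deterministic fallback omitted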

kubectl -n kerno-system set env ds/kerno \
  KERNO_AI_API_KEY=sk-... \
  KERNO_AI_PROVIDER=anthropic
kubectl -n kerno-system exec ds/kerno -- kerno doctor --ai

Configuration

Kerno works with zero configuration. For custom setups, mount a config.yaml or use KERNO_* env vars:

log_level: info

collectors:
  syscall_latency: true
  tcp_monitor: true
  oom_track: true
  disk_io: true
  sched_delay: true
  fd_track: true

doctor:
  duration: 30s
  thresholds:
    syscall_p99_warning_ns:  100000000   # 100ms
    syscall_p99_critical_ns: 500000000   # 500ms
    tcp_retransmit_pct:      2.0         # 2%
    oom_memory_pct:          90.0        # 90%
    disk_p99_warning_ns:     50000000    # 50ms
    disk_p99_critical_ns:    200000000   # 200ms
    sched_delay_warning_ns:  5000000     # 5ms
    sched_delay_critical_ns: 20000000    # 20ms
    fd_growth_per_sec:       10.0

prometheus:
  enabled: true
  addr: ":9090"

ai:
  enabled: false
  provider: anthropic
  privacy_mode: summary

Precedence: CLI flags > environment variables (KERNO_*) > config file > defaults.
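
The chain itself is first-non-empty-wins. An illustrative resolver, not Kerno's actual loader (the env var name is hypothetical):

package main

import (
    "fmt"
    "os"
)

// resolve walks the documented precedence: CLI flag, then KERNO_* env
// var, then config-file value, then the built-in default.
func resolve(flagVal, envKey, fileVal, def string) string {
    if flagVal != "" {
        return flagVal
    }
    if v := os.Getenv(envKey); v != "" {
        return v
    }
    if fileVal != "" {
        return fileVal
    }
    return def
}

func main() {
    // e.g. log level: no flag set, KERNO_LOG_LEVEL may be exported, file says "info"
    fmt.Println(resolve("", "KERNO_LOG_LEVEL", "info", "info"))
}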


Roadmap

See TODO.md for the full plan. Headlines:

  • v0.1 - DaemonSet, 6 collectors, 11 rules, Prometheus, AI post-processor - shipped
  • v0.2 - CRD for cluster-wide incident policies, OpenTelemetry OTLP export, Grafana dashboards
  • v0.3 - historical incident replay, SLO-linked alerts, Slack / PagerDuty integrations
  • v1.0 - multi-cluster control plane, managed offering (Lowplane Cloud)

Building from Source

# Requirements: Go 1.25+
# Optional for real eBPF: clang 14+, libbpf-dev, llvm

make build          # Build binary (uses BPF stubs - no clang needed)
make bpf            # Compile eBPF C programs
make test           # Run unit tests
make test-race      # Run with race detector
make lint           # golangci-lint
make check          # Full CI check: vet + test + lint
make docker         # Build Docker image

Contributing

Contributions welcome. See CONTRIBUTING.md for:

  • Development setup and prerequisites
  • Commit message conventions (Conventional Commits)
  • Code review process
  • DCO sign-off requirement

For security reports, see SECURITY.md.


License

Apache License 2.0 - see LICENSE.


Kerno is built by Shivam at Lowplane.

If Kerno saved your on-call shift, consider leaving a star - it helps other engineers find the project.
