This repository was archived by the owner on Mar 11, 2026. It is now read-only.

fix(ci): brain-feed circuit breaker + watchdog + DLQ #137

Open. Mikecranesync wants to merge 1 commit into main from fix/ci-watchdog.

@Mikecranesync
Owner

Summary

  • brain-feed.yml: Circuit breaker (continue-on-error: true), fast timeouts (10 s max), jq for safe JSON construction, and a dead letter queue (DLQ) stored as GitHub Actions artifacts
  • ci-watchdog.yml: Runs every 30 min, health-checks the brain-ingest endpoint, classifies severity (CRITICAL/WARN/INFO/OK), and auto-creates or auto-closes a GitHub issue containing the VPS playbook
  • replay-brain-dlq.sh: Replays stored payloads once the endpoint recovers, guarded by a health check
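The brain-feed behaviour in the first bullet can be sketched outside of Actions. This is a minimal Python illustration, not the workflow itself: it assumes the payload is a dict with a hypothetical `sha` key and that the DLQ is a local directory which a later step would upload as the `brain-dlq-{sha}` artifact. The property that matters is that every failure path ends in the queue rather than in an exception, which is what `continue-on-error: true` achieves at the step level.

```python
import http.client
import json
import urllib.request
from pathlib import Path


def feed_brain(payload: dict, url: str, dlq_dir: str, timeout: float = 10.0) -> str:
    """POST one CI event to the ingest endpoint; never raise, queue on failure."""
    # json.dumps escapes quotes and newlines, the same job jq does in the workflow.
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        # Fast timeout: bounds the connect and each socket read, so a dead endpoint fails quickly.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            if 200 <= response.status < 300:
                return "delivered"
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError and socket timeouts;
        # ValueError covers malformed URLs; HTTPException covers protocol errors.
        pass
    # Every non-delivered path lands here: write the payload to the dead letter queue.
    queue = Path(dlq_dir)
    queue.mkdir(parents=True, exist_ok=True)
    (queue / f"{payload.get('sha', 'unknown')}.json").write_text(
        json.dumps(payload), encoding="utf-8"
    )
    return "queued"
```

Serializing with `json.dumps` plays the same role as jq here: a commit message containing quotes or newlines is escaped instead of breaking the request body.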

Context

14 consecutive "Feed Open Brain" failures on FactoryLM_OS (curl exit code 28 = operation timed out). Root cause: the brain-ingest endpoint was unreachable. No monitoring or alerting existed. See docs/ops/incidents/INC-2026-03-09-001.md.
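Exit code 28 matters for the diagnosis: it means the request never completed, as opposed to the endpoint answering with an error. A small lookup over a few of curl's documented exit codes makes that distinction explicit. The table is deliberately incomplete, and treating 6, 7 and 28 as "endpoint unreachable" is an assumption of this sketch rather than something the PR defines.

```python
# A subset of the exit codes listed in curl's man page ("EXIT CODES").
CURL_EXIT_CODES = {
    0: "success",
    6: "could not resolve host",
    7: "failed to connect to host",
    22: "HTTP status >= 400 (only reported with --fail)",
    28: "operation timed out",
    35: "TLS/SSL handshake failed",
}

# Codes where no responding server was reached (an assumption of this sketch).
UNREACHABLE = {6, 7, 28}


def explain_curl_exit(code: int) -> str:
    """Return a short human-readable reason for a curl exit code."""
    return CURL_EXIT_CODES.get(code, f"other curl error ({code})")


def endpoint_unreachable(code: int) -> bool:
    """True when the failure points at the endpoint being down or unreachable."""
    return code in UNREACHABLE
```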

Companion PR: Mikecranesync/FactoryLM_OS#2

Test plan

  • Push a commit: brain-feed shows a green check even with the endpoint down
  • Check artifacts: brain-dlq-{sha} is created when the endpoint is unreachable
  • Wait 30 min: ci-watchdog creates an issue with the severity and playbook
  • Bring the endpoint back: the next watchdog run auto-closes the issue
  • Run replay-brain-dlq.sh: stored payloads are replayed
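The PR does not spell out the rules behind the four severity levels or the issue lifecycle exercised in the watchdog steps of the test plan. One plausible shape is sketched below in Python; the `HealthSnapshot` fields and both thresholds are hypothetical, chosen only to show how CRITICAL/WARN/INFO/OK could map onto opening and closing a single tracking issue.

```python
from dataclasses import dataclass

# Hypothetical thresholds for this sketch; the PR does not state its own.
SLOW_THRESHOLD_S = 5.0
CRITICAL_AFTER_FAILURES = 3


@dataclass
class HealthSnapshot:
    endpoint_up: bool          # did the brain-ingest health check succeed?
    latency_s: float           # round-trip time of that health check, in seconds
    consecutive_failures: int  # consecutive brain-feed runs that fell back to the DLQ
    dlq_pending: int           # payloads still waiting to be replayed


def classify(s: HealthSnapshot) -> str:
    """Map a health snapshot to CRITICAL / WARN / INFO / OK."""
    if not s.endpoint_up and s.consecutive_failures >= CRITICAL_AFTER_FAILURES:
        return "CRITICAL"
    if not s.endpoint_up or s.latency_s > SLOW_THRESHOLD_S:
        return "WARN"
    if s.dlq_pending > 0:
        return "INFO"
    return "OK"


def issue_action(severity: str, issue_open: bool) -> str:
    """Decide whether the watchdog should open, close, or leave its issue alone."""
    alerting = severity in ("CRITICAL", "WARN")
    if alerting and not issue_open:
        return "open"
    if not alerting and issue_open:
        return "close"
    return "noop"
```

Closing on INFO as well as OK matches the test plan (the issue closes once the endpoint is back); pending DLQ entries are left to replay-brain-dlq.sh instead of keeping the alert open.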

🤖 Generated with Claude Code

- brain-feed.yml: continue-on-error, fast timeouts, jq payload, dead letter queue
- ci-watchdog.yml: 30-min health check, severity classification, auto-issue management
- replay-brain-dlq.sh: replay failed payloads once endpoint recovers
- INC-2026-03-09-001: incident report for 14 consecutive failures
- Ops trace documenting the change
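The replay script's health-check guard can be sketched the same way. This Python version is illustrative only: `is_healthy` and `send` are injected callables standing in for whatever health probe and POST the real script performs, and the one-`*.json`-file-per-payload directory layout is an assumption carried over from the brain-feed sketch.

```python
import json
from pathlib import Path
from typing import Callable


def replay_dlq(
    dlq_dir: str,
    is_healthy: Callable[[], bool],
    send: Callable[[dict], bool],
) -> tuple[int, int]:
    """Replay queued payloads oldest first and return (replayed, remaining)."""
    # Sort by modification time, then name, so replay order is deterministic.
    files = sorted(
        Path(dlq_dir).glob("*.json"),
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    if not is_healthy():
        # Health-check guard: leave the queue untouched while the endpoint is down.
        return 0, len(files)
    replayed = 0
    for path in files:
        payload = json.loads(path.read_text(encoding="utf-8"))
        if not send(payload):
            break  # stop at the first failure; the next run resumes from here
        path.unlink()  # delete only after a confirmed delivery
        replayed += 1
    return replayed, len(files) - replayed
```

Stopping at the first failed send and deleting a file only after a confirmed delivery means a flapping endpoint cannot lose payloads; the next run simply resumes where this one stopped.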

Fixes brain-feed blocking all pushes when brain-ingest endpoint is down.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>