All notable changes to this project are documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.3.0 — 2026-05-27
- Operator worker ON/OFF control plane: a
worker_controlstable + a per-(host, queue)WorkerControlWatcherthat hard-stops (kill in-flight + free RAM/VRAM) or parks a worker on command (migration 0012). Seedocs/worker_control.md. - Durable
workflow_node_eventsstore with lifecycle/trip/terminal event writers emitted from the worker, plus a NodePool retention sweep (migration 0011). - Orchestrator-side dead-worker sweep that flags a worker whose heartbeat is
stale while it still holds a
runningjob; health-gated stall trip with re-queue-and-retry (migrations 0009/0010). - No-progress
StallWatchdogfor GPU nodes; re-queue all running jobs on restart and worker self-kill when reassigned; always re-queue orphaned runs. invoke_contexthost hook — a per-node setup/teardown context manager whosefinalize(context_delta)is applied only on success.- MIT
LICENSEand PEP 639 license metadata, keywords, and trove classifiers inpyproject.toml. README.md: a "local-cluster swift management" purpose note, an Architecture diagram (Mermaid), an "Example dashboard" section, a "Turning workers on/off" section, and a Docs index.AGENTS.mdas a byte-identical mirror ofCLAUDE.md, and a documented changelog convention.- This
CHANGELOG.md.
- GPU guard is now health-driven (GPU-idle and static-RAM) instead of a fixed wall-clock cap.
- GPU claim: restored the capability-routing gate.
__input__nodes are parked via the dispatch outbox instead of importing the sentinel.
uv.lockis no longer tracked — this is a library; consumers resolve againstpyproject.toml. The lockfile stays local for dev/CI reproducibility.
0.2.0 — 2026-05-25
- Multi-tenant ingest path: host-defined ingest queues and per-job
args, with host-sidetask_name/queue validation (migration 0008). A second consumer (e.g.lm_flood) can route its own queue names and carry per-job arguments without forking the schema.
0.1.0 — 2026-05-25
- Initial standalone release — a Postgres-as-queue workflow engine extracted
from the
ai_leadsstack (Phase 6): aSELECT … FOR UPDATE SKIP LOCKEDclaim loop woken byLISTEN/NOTIFY, lease reclaim, a DAG dispatcher with a durable dispatch-event outbox, a GPU warm-model cache, periodic ingest work + a PG-native scheduler, and per-host hw-metrics telemetry. Migrations 0001–0007.