diff --git a/CHANGELOG.md b/CHANGELOG.md
index 61a4348..fd528fc 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,339 @@ Changes are tracked via git tags. Each release tag corresponds to an entry here.
 
 ## [Unreleased]
 
+### Fixed — `cfcf stop` now kills the active subprocess + clears the dashboard indicator + new `cfcf agents reap` escape hatch
+
+Real gmbot dogfood: after iter 19 completed and reflection started
+at 3:41 pm, the user observed the dashboard still showing
+"reflection running" at 8 pm — 4.5 hours later. PA's investigation
+found that the loop state was `phase: stopped, outcome: stopped`,
+but the reflection's codex subprocess (PID 99729) was still alive
+with the cfcf server as its parent.
+
+**Root cause** (four layered bugs — the fourth surfaced after the
+others were fixed and the user reported the loop continuing
+post-kill-9):
+
+1. **`stopLoop()` only flips the state flag.** It set `state.phase
+   = "stopped"` and saved — but never signaled any subprocess.
+   The main while-loop's `isStopped(state)` check would exit
+   cleanly at the next iteration boundary; a mid-flight subprocess
+   (the reflection codex) kept running because nothing told it to
+   stop. The runner's `await managed.result` promise got stranded.
+2. **The per-role state-store row stayed `executing`.** When the
+   subprocess was orphaned, the reflection-runner's normal
+   completion path didn't fire, so `reflect-state.json`'s status
+   was never flipped. The dashboard's `getActiveAgent()` reads
+   that status; it kept claiming reflection was alive.
+3. **No way to kill the stranded subprocess from cfcf.** The
+   existing `cfcf server reap` filters on `PPID==1` (orphans of a
+   dead server). PID 99729 was still parented by the live cfcf
+   server, so it didn't qualify. User had to find the PID
+   manually and `kill -9` it.
+4. **External `cfcf stop` flag was overwritten by post-reflection
+   phase transitions.** The reflection block had no `isStopped`
+   guard between its `await` and the next `state.phase = "deciding"`
+   mutation. When the user `kill -9`'d the stranded subprocess,
+   the await unblocked → post-processing ran → the DECIDE block's
+   phase assignment clobbered `"stopped"` → loop continued. User
+   reported this as "the loop continued after I killed the
+   process" — confirming PA's hypothesis that it was a downstream
+   consequence of Bug A.
+
+The fix lands all four:
+
+**Fix 1 — `stopLoop()` kills the active subprocess.**
+
+`stopLoop()` now maps the loop's current phase to its in-flight
+`AgentRole` via a new pure helper `loopActivePhaseToRole(phase):
+AgentRole | null`:
+
+  - `pre_loop_reviewing` → `architect`
+  - `dev_executing` → `dev`
+  - `judging` → `judge`
+  - `reflecting` → `reflection`
+  - `documenting` → `documenter`
+  - else → `null` (no subprocess in flight; nothing to kill)
+
+For non-null roles, looks up the registered process in
+`active-processes.ts` and calls the existing `killProcessTree()`
+(SIGTERM → 1.5s grace → SIGKILL on the process group).
+
+**Hard guarantee — PA and HA are untouched.** They run interactively
+as `cfcf spec` / `cfcf help assistant` with `stdio: "inherit"`,
+outside the cfcf server entirely. They are NOT in the
+`active-processes` registry, and the registry's `AgentRole` type
+(`packages/core/src/log-storage.ts:20`) doesn't include them.
+Both layers of defence: type-level (can't be in the registry) +
+phase-mapping (the loop never enters a phase that resolves to
+"pa" or "ha").
+
+**Fix 2 — Per-role state-store flip on kill.**
+
+After killing, `stopLoop()` calls the role-appropriate
+`mark<X>StateFailed(workspaceId, reason)` helper:
+
+  - `markReflectStateFailed` (new, in `reflection-runner.ts`)
+  - `markDocumentStateFailed` (new, in `documenter-runner.ts`)
+  - `markReviewStateFailed` (new, in `architect-runner.ts`)
+
+All three are idempotent: no-op when no state exists, no-op when
+state is already terminal (so they're safe to call
+unconditionally, and won't clobber a real completion if the
+subprocess happened to finish between `stopLoop()` being called
+and the kill landing). On a flip, status → `failed`, error →
+`"Loop stopped by user via cfcf stop"`, `completedAt` is set.
+
+The dashboard's `getActiveAgent()` re-reads these state stores; the
+"X running" chip clears on the next poll (~5s).
+
+Dev / judge don't have separate state stores — they're tracked
+inside the iteration's record in `loop-state.json` itself, which
+the "stopped" flip already handles. So those two roles skip the
+state-flip step.
+
+**Fix 3 — Stop signal survives the reflection post-processing cascade.**
+
+A subtler second-order bug surfaced in the same gmbot iter-19 trace.
+After PA helped the user discover the stranded codex, they ran
+`kill -9 99729` — and observed the loop *continuing* afterwards.
+PA's hypothesis was right: it was Bug A's downstream consequence,
+but the chain is specifically:
+
+1. `cfcf stop` set `state.phase = "stopped"` (correct — Fix 1
+   would have killed the subprocess too, but the user was on a
+   pre-fix build).
+2. Reflection's `await runReflectionSync(...)` was still
+   in flight; the codex subprocess was alive.
+3. The user's external `kill -9` killed codex.
+4. The await unblocked. Post-reflection processing ran (archive
+   the analysis, commit, ingest into Clio).
+5. **`state.phase = "deciding"` (line 2283) overwrote the
+   externally-set `"stopped"` flag.**
+6. `makeDecision()` returned its normal verdict (likely
+   "continue").
+7. `case "continue":` returned.
+8. Back in the outer loop, `isLoopDone(state)` checked
+   `state.phase` — now `"deciding"`, not `"stopped"` — and
+   returned false.
+9. **The next iteration of the while-loop started.**
+
+This was a genuine missing guard. The `isStopped(state)` check
+exists after dev's await (iteration-loop.ts:1852) and after judge's
+await (line 2011), but was missing between the reflection block
+and the DECIDE block. Any phase mutation in that window clobbers
+an external stop.
+
+**Fix**: one `if (isStopped(state)) return;` guard right before the
+DECIDE block. The cascade exits cleanly back to the outer loop's
+`isLoopDone(state)` check, which sees the unmodified `"stopped"`
+and returns. Matches the pattern of the dev / judge guards.
+
+This guard also closes the cascade for users with Fix 1 enabled.
+Fix 1 makes `cfcf stop` actively kill the subprocess (which then
+unblocks the await), but the same overwrite-on-post-processing
+cascade still ran without the guard. With Fix 3, the user's
+`cfcf stop` is honoured end-to-end whether the subprocess was
+killed by our `killProcessTree()` (Fix 1) or by an external
+`kill -9` (user escape hatch).
+
+**Fix 4 — New `cfcf agents reap` command (manual escape hatch).**
+
+For the rare cases where the auto-kill in `stopLoop()` doesn't
+land (subprocess in uninterruptible sleep, user observes a stuck
+process from a pre-fix build, etc.), or for proactive inspection:
+
+```
+$ cfcf agents reap                    # all workspaces
+$ cfcf agents reap --workspace gmbot  # one workspace
+$ cfcf agents reap --yes              # non-interactive
+```
+
+Mirrors `cfcf server reap`'s UX (item 6.31) but scoped to
+LIVE-SERVER children rather than `PPID==1` orphans:
+
+  - Requires the cfcf server to be running (queries the in-memory
+    registry).
+  - Lists each process: workspace, role, PID, runtime, log file.
+  - Per-row y/N confirmation by default; `--yes` for scripted
+    use.
+  - Per-row kill via new HTTP routes:
+    - `GET /api/active-processes?workspace=<id>` (list)
+    - `POST /api/active-processes/:workspaceId/:role/kill` (kill +
+      state-flip)
+  - **Cannot kill PA/HA** — they're not in the registry, so they
+    don't show up in the list. Same hard guarantee as `stopLoop()`.
+
+**Why no wall-clock timeout (rejected design).**
+
+PA's initial proposal included a per-role timeout (e.g., kill
+reflection if it runs > 4 hours). Declined deliberately:
+
+  - The existing codex rolling-window quota recovery can take ~1
+    hour. Any wall-clock heuristic risks killing legitimate recovery.
+  - Timeouts add config knobs that are hard to tune correctly.
+  - Matches cf²'s "human on the loop, not in it" principle —
+    the user is the one who decides when something is stuck. The
+    new `cfcf agents reap` is the user-driven version of the
+    same intent.
+
+**Implementation** (~360 LoC):
+
+- `packages/core/src/process-manager.ts` — export
+  `killProcessTree` (was private).
+- `packages/core/src/reflection-runner.ts` —
+  `markReflectStateFailed(workspaceId, reason)`.
+- `packages/core/src/documenter-runner.ts` —
+  `markDocumentStateFailed(workspaceId, reason)`.
+- `packages/core/src/architect-runner.ts` —
+  `markReviewStateFailed(workspaceId, reason)`.
+- `packages/core/src/iteration-loop.ts`:
+  - `loopActivePhaseToRole(phase): AgentRole | null` (exported,
+    pure, testable).
+  - `stopLoop()` extended: capture active role before phase flip,
+    kill subprocess, flip per-role state store.
+  - **New `isStopped(state)` guard** before the DECIDE block
+    (line ~2283) — closes the Bug-D cascade that otherwise
+    clobbered the stop flag on post-reflection phase transitions.
+- `packages/server/src/app.ts` — two new endpoints
+  (`/api/active-processes` GET + POST kill).
+- `packages/cli/src/commands/agents.ts` — new file, `cfcf agents
+  reap` command.
+- `packages/cli/src/index.ts` — register the new command.
+
+**Test coverage** (8 new tests, all 1038 tests pass):
+
+- 6 new tests in `iteration-loop.test.ts` covering
+  `loopActivePhaseToRole`: each of the five active phases maps to
+  the correct role; non-active phases (preparing / deciding /
+  paused / completed / failed / stopped) return null.
+- 2 new tests in `reflection-runner.test.ts` covering
+  `markReflectStateFailed`'s idempotent no-state branch (safe to
+  call unconditionally, doesn't throw for unknown workspace IDs).
+  Document + architect runners share the same code shape; the
+  reflection test locks in the pattern.
+
+### Added — Harness-level missing-signals pause + `retry_iteration` resume action
+
+cfcf has always implicitly assumed that the dev and judge agents
+will run far enough to write their signals files
+(`cfcf-iteration-signals.json` / `cfcf-judge-signals.json`).
+Everything downstream — determination parsing, archive logic,
+decision engine — branches off those files. When an agent exits
+before writing them (hard quota cap, crash, OOM kill, killed
+process, …), cfcf silently marked the iteration "failed" and
+continued to the next one. That cost a real gmbot dogfood run: iter
+10 hit a codex usage limit, dev + judge both exited without signals,
+cfcf moved on to iter 11 which then had to re-discover the missing
+work from scratch.
+
+**Architectural framing**: this is a harness-contract violation, not
+an agent-reasoning failure. The agent never ran far enough to
+classify anything. So the fix lives at the harness layer, not in
+the agent prompts — and deliberately doesn't try to identify the
+root cause. Quota / crash / OOM / killed all collapse to "agent
+exited without signals → can't safely continue → pause." The user
+reads the log file to identify the cause and chooses how to
+recover.
+
+**Behaviour**:
+
+- After `parseSignalFile` / `parseJudgeSignals` returns `null`
+  (missing file, JSON parse failure, schema validation failure —
+  all collapsed to one signal), the loop pauses with
+  `pauseReason: "missing_signals"` rather than continuing silently.
+- The pause sets `pendingQuestions[0]` to a generic anomaly message
+  pointing at the log file: *"Anomaly detected: dev agent for
+  iteration 10 exited (exit code 1) without writing its signals
+  file (agent crashed or hit a usage limit). Check the log: …
+  Resume with `retry_iteration` to redo this iteration, `continue`
+  to skip to iteration 11, or `stop_loop_now` to abandon."*
+- Notification fires (`type: "loop.paused"`) so any notification
+  channel sees the pause immediately, not just the dashboard.
+- Working tree is left as-is — no `git reset`, no commit attempt.
+  Any dirty changes are preserved for the user to inspect.
+
+**New resume action: `retry_iteration`** (extends the structured
+pause-actions vocabulary from item 6.25):
+
+- Rolls the workspace's iteration counter back by one (new
+  `decrementIteration(workspaceId)` helper in `workspaces.ts`).
+- Pops the failed iteration's record from `state.iterations` so
+  it isn't double-counted.
+- Falls through to the regular iteration body, which calls
+  `nextIteration()` (now returns the same number that failed) and
+  proceeds to delete + re-create the existing branch off HEAD
+  (logic the loop already had for retry scenarios). Dev respawns
+  on a clean branch under the same iteration number.
+- Applicable ONLY to `missing_signals` pauses — surfaced in the
+  web UI's pause-action button row + accepted on
+  `cfcf resume <ws> --action retry_iteration`. Rejected for other
+  pause classes via the existing `pauseReasonAllowedActions`
+  validation.
+- `missing_signals` pause-action allowlist: `["retry_iteration",
+  "continue", "stop_loop_now"]`. No finish/refine/consult — those
+  all assume a meaningful iteration result to act on.
+
+**What this doesn't do** (intentional non-scope):
+
+- No root-cause classification (quota vs. crash vs. OOM). The
+  harness doesn't try — that requires reading the log file,
+  which is the user's job. Future enhancement: per-adapter
+  pattern matching that adds a *label* to the pause without
+  changing behaviour.
+- No automatic retry after a parsed reset time. v1 is pause +
+  user resumes when ready. Matches "human on the loop, not in
+  it." Revisit if dogfood demands it.
+- No partial-work recovery beyond "leave the working tree alone."
+  If the agent made dirty changes before crashing, they survive in
+  the working tree; the user can `git status` / `git diff` /
+  `git stash` as needed. On `retry_iteration` the branch is
+  re-created off HEAD; any branch-only commits from the failed
+  attempt are lost. (Working-tree changes are NOT lost.)
+
+**Implementation** (~140 LoC):
+
+- `packages/core/src/types.ts` — extend `ResumeAction` with
+  `"retry_iteration"`.
+- `packages/core/src/iteration-loop.ts`:
+  - Extend `LoopState.pauseReason` with `"missing_signals"`.
+  - New `pauseLoopOnMissingSignals(workspace, state, iter, phase,
+    exitCode, logFile)` helper. Sets phase + pauseReason +
+    pendingQuestions, saves state, flips workspace status,
+    dispatches a `loop.paused` notification.
+  - Two new call sites: after the dev signal-parse (post-exit,
+    pre-commit) and after the judge signal-parse (same shape).
+    Both return immediately — the outer loop's `isLoopDone(state)`
+    check exits the loop cleanly.
+  - New `retry_iteration` branch in the resume-action dispatch
+    chain — decrements the counter, pops the failed iter record,
+    falls through.
+  - Extend `pauseReasonAllowedActions` with the
+    `case "missing_signals"` branch.
+- `packages/core/src/workspaces.ts` — new
+  `decrementIteration(workspaceId)` helper (idempotent at floor 0).
+- `packages/cli/src/commands/resume.ts` — add `retry_iteration`
+  to `RESUME_ACTIONS` + the `--action` help text.
+- `packages/web/src/types.ts` + `packages/web/src/api.ts` — mirror
+  the new pauseReason + ResumeAction variants.
+- `packages/web/src/components/FeedbackForm.tsx` — add the
+  `missing_signals` case to the per-pauseReason action matrix,
+  add `retry_iteration` to `ACTION_LABEL` + `ACTION_HELP`. The
+  existing `pendingQuestions[0]` rendering surfaces the
+  generic-anomaly message we set in the pause helper, so no
+  additional copy was needed.
+
+**Test coverage** (8 new tests, all 1030 tests pass):
+
+- 4 new tests in `iteration-loop.test.ts` covering the
+  `missing_signals` action matrix (retry/continue/stop allowed;
+  finish/refine/consult excluded) and the inverse — `retry_iteration`
+  is rejected for every other pauseReason.
+- 4 new tests in `workspaces.test.ts` for `decrementIteration`:
+  rolls back by 1, decrement+nextIteration round-trips to the
+  failed number (load-bearing invariant for retry), floors at 0,
+  null for non-existent workspace.
+
 ### Fixed — History tab: judge row time cell while judge is pending
 
 F.21 follow-up surfaced during v0.24.2 dogfood. The judge row's
diff --git a/packages/cli/src/commands/agents.ts b/packages/cli/src/commands/agents.ts
new file mode 100644
index 0000000..b934319
--- /dev/null
+++ b/packages/cli/src/commands/agents.ts
@@ -0,0 +1,152 @@
+/**
+ * `cfcf agents` — manage loop-spawned agent processes.
+ *
+ * Today only `reap` is implemented: list + kill subprocesses
+ * registered with the cfcf server. Mirrors the UX of `cfcf server
+ * reap` (item 6.31) but scoped differently:
+ *
+ *   - `cfcf server reap` finds ORPHANS (PPID==1) — children of a
+ *     prior cfcf server that hard-crashed and got reparented to
+ *     init. Works without a running cfcf server.
+ *
+ *   - `cfcf agents reap` lists LIVE-SERVER CHILDREN — what the
+ *     current cfcf server is still tracking. Requires the server
+ *     to be running. Use case: a subprocess survived a `cfcf stop`
+ *     (rare, but the missing-signals follow-up + this command
+ *     close that gap), or the user wants to inspect what's still
+ *     attached to the server.
+ *
+ * **Safety**: this only ever lists / kills processes in the
+ * `active-processes` registry, which is scoped to loop-spawned
+ * agent roles (dev / judge / architect / documenter / reflection).
+ * PA (`cfcf spec`) and HA (`cfcf help assistant`) run interactively
+ * outside the cfcf server (`stdio: "inherit"`), are NOT in the
+ * registry, and CANNOT be killed via this command.
+ */
+
+import type { Command } from "commander";
+import { createInterface } from "node:readline";
+import { isServerReachable, get, post } from "../client.js";
+
+interface ActiveProcessSummary {
+  workspaceId: string;
+  workspaceName: string;
+  role: "dev" | "judge" | "architect" | "documenter" | "reflection";
+  pid: number | undefined;
+  startedAt: string;
+  runtimeMs: number;
+  logFileName: string | null;
+}
+
+function readLine(question: string): Promise<string> {
+  const rl = createInterface({ input: process.stdin, output: process.stdout });
+  return new Promise((resolve) => {
+    rl.question(question, (answer) => {
+      rl.close();
+      resolve(answer.trim());
+    });
+  });
+}
+
+function formatRuntime(ms: number): string {
+  const seconds = Math.floor(ms / 1000);
+  const minutes = Math.floor(seconds / 60);
+  const hours = Math.floor(minutes / 60);
+  if (hours > 0) return `${hours}h${minutes % 60}m`;
+  if (minutes > 0) return `${minutes}m${seconds % 60}s`;
+  return `${seconds}s`;
+}
+
+function formatRow(p: ActiveProcessSummary): string {
+  const pidStr = p.pid !== undefined ? `pid ${p.pid}` : "pid ?";
+  const runtime = formatRuntime(p.runtimeMs);
+  const log = p.logFileName ? ` log=${p.logFileName}` : "";
+  return `  ${p.workspaceName} / ${p.role}: ${pidStr}, running for ${runtime}${log}`;
+}
+
+export function registerAgentsCommands(program: Command): void {
+  const agents = program
+    .command("agents")
+    .description("Manage running agent subprocesses");
+
+  agents
+    .command("reap")
+    .description("List + interactively kill active agent processes (loop-spawned only — PA/HA are untouched)")
+    .option("--workspace <name>", "Limit to a single workspace (by name or ID)")
+    .option("-y, --yes", "Kill without prompting (non-interactive use)")
+    .action(async (opts: { workspace?: string; yes?: boolean }) => {
+      if (!(await isServerReachable())) {
+        console.error(
+          "cfcf server is not running. Start it with: cfcf server start",
+        );
+        console.error(
+          "(For orphans from a previously-crashed server, use `cfcf server reap` instead.)",
+        );
+        process.exit(1);
+      }
+
+      const query = opts.workspace
+        ? `?workspace=${encodeURIComponent(opts.workspace)}`
+        : "";
+      const res = await get<{ active: ActiveProcessSummary[] }>(
+        `/api/active-processes${query}`,
+      );
+      if (!res.ok) {
+        console.error(`Failed to list active processes: ${res.error}`);
+        process.exit(1);
+      }
+
+      const procs = res.data!.active;
+      if (procs.length === 0) {
+        console.log(
+          opts.workspace
+            ? `No active agent processes for workspace "${opts.workspace}".`
+            : "No active agent processes.",
+        );
+        return;
+      }
+
+      console.log(
+        `Found ${procs.length} active agent process${procs.length === 1 ? "" : "es"}:`,
+      );
+      for (const p of procs) {
+        console.log(formatRow(p));
+      }
+      console.log();
+
+      let proceed = false;
+      if (opts.yes) {
+        proceed = true;
+      } else {
+        const answer = await readLine(
+          `Kill ${procs.length === 1 ? "this process" : `these ${procs.length} processes`}? [y/N]: `,
+        );
+        proceed = /^y(es)?$/i.test(answer);
+      }
+      if (!proceed) {
+        console.log("Aborted. No processes killed.");
+        return;
+      }
+
+      let killed = 0;
+      let failed = 0;
+      for (const p of procs) {
+        const r = await post<{ ok: boolean; error?: string }>(
+          `/api/active-processes/${encodeURIComponent(p.workspaceId)}/${p.role}/kill`,
+          {},
+        );
+        if (r.ok && r.data?.ok) {
+          console.log(`  ✓ ${p.workspaceName} / ${p.role} (pid ${p.pid})`);
+          killed++;
+        } else {
+          console.log(
+            `  ✗ ${p.workspaceName} / ${p.role} (pid ${p.pid}): ${r.data?.error ?? r.error ?? "unknown error"}`,
+          );
+          failed++;
+        }
+      }
+      console.log();
+      console.log(`Reap complete: ${killed} signaled, ${failed} failed.`);
+      if (failed > 0) process.exit(1);
+    });
+}
diff --git a/packages/cli/src/commands/resume.ts b/packages/cli/src/commands/resume.ts
index dc10ad2..0f45f6d 100644
--- a/packages/cli/src/commands/resume.ts
+++ b/packages/cli/src/commands/resume.ts
@@ -26,6 +26,11 @@ const RESUME_ACTIONS = [
   "stop_loop_now",
   "refine_plan",
   "consult_reflection",
+  // Re-spawn dev on the same iteration. Typical use: a
+  // `missing_signals` pause (quota cap, crash) is being unblocked
+  // after the underlying cause cleared. The harness rolls back the
+  // iteration counter and re-creates the branch.
+  "retry_iteration",
 ] as const;
 
 type ResumeAction = (typeof RESUME_ACTIONS)[number];
@@ -81,7 +86,7 @@ function buildActionOption(program: Command): Option {
   // eslint-disable-next-line @typescript-eslint/no-explicit-any
   const opt = (program as any).createOption(
     "--action <name>",
-    "Resume action: continue | finish_loop | stop_loop_now | refine_plan | consult_reflection",
+    "Resume action: continue | finish_loop | stop_loop_now | refine_plan | consult_reflection | retry_iteration",
   );
   opt.choices(RESUME_ACTIONS as readonly string[]);
   opt.default("continue");
diff --git a/packages/cli/src/index.ts b/packages/cli/src/index.ts
index 1491518..ef9a815 100644
--- a/packages/cli/src/index.ts
+++ b/packages/cli/src/index.ts
@@ -42,6 +42,7 @@ import { registerSelfUpdateCommand } from "./commands/self-update.js";
 import { registerCompletionCommand } from "./commands/completion.js";
 import { registerHelpCommand } from "./commands/help.js";
 import { registerSpecCommand } from "./commands/spec.js";
+import { registerAgentsCommands } from "./commands/agents.js";
 import { maybePrintUpdateBanner } from "./update-banner.js";
 
 // --- Internal: run the server in-process ---
@@ -114,6 +115,7 @@ registerRunCommand(program);
 registerResumeCommand(program);
 registerStopCommand(program);
 registerSpecCommand(program);
+registerAgentsCommands(program);
 registerReviewCommand(program);
 registerDocumentCommand(program);
 registerReflectCommand(program);
diff --git a/packages/core/src/architect-runner.ts b/packages/core/src/architect-runner.ts
index 187d89f..0c64d9b 100644
--- a/packages/core/src/architect-runner.ts
+++ b/packages/core/src/architect-runner.ts
@@ -83,6 +83,31 @@ export function getReviewState(workspaceId: string): ReviewState | undefined {
   return reviewStore.get(workspaceId);
 }
 
+/**
+ * Force-mark this workspace's architect/review state as `failed`
+ * with the given reason. Used by `stopLoop()` + `cfcf agents reap`
+ * to flip the dashboard's "review running" indicator off
+ * immediately after killing the subprocess. Mirrors
+ * `markReflectStateFailed` / `markDocumentStateFailed`.
+ *
+ * Idempotent: no-op when no state exists, or when the state is
+ * already terminal (completed / failed). Returns true if a flip
+ * happened.
+ */
+export async function markReviewStateFailed(
+  workspaceId: string,
+  reason: string,
+): Promise<boolean> {
+  const current = reviewStore.get(workspaceId);
+  if (!current) return false;
+  if (!REVIEW_ACTIVE_STATUSES.has(current.status)) return false;
+  current.status = "failed";
+  current.error = reason;
+  current.completedAt = new Date().toISOString();
+  await setReviewState(current);
+  return true;
+}
+
 /**
  * Server-boot hook (item F.23): load every workspace's persisted
  * review state into the in-memory cache. Any state still in an active
diff --git a/packages/core/src/documenter-runner.ts b/packages/core/src/documenter-runner.ts
index cf7ae55..d134896 100644
--- a/packages/core/src/documenter-runner.ts
+++ b/packages/core/src/documenter-runner.ts
@@ -82,6 +82,31 @@ export function getDocumentState(workspaceId: string): DocumentState | undefined
   return documentStore.get(workspaceId);
 }
 
+/**
+ * Force-mark this workspace's documenter state as `failed` with the
+ * given reason. Used by `stopLoop()` + `cfcf agents reap` to flip
+ * the dashboard's "documenting" indicator off immediately after
+ * killing the subprocess. Mirrors `markReflectStateFailed` /
+ * `markReviewStateFailed`.
+ *
+ * Idempotent: no-op when no state exists, or when the state is
+ * already terminal (completed / failed). Returns true if a flip
+ * happened.
+ */
+export async function markDocumentStateFailed(
+  workspaceId: string,
+  reason: string,
+): Promise<boolean> {
+  const current = documentStore.get(workspaceId);
+  if (!current) return false;
+  if (!DOCUMENT_ACTIVE_STATUSES.has(current.status)) return false;
+  current.status = "failed";
+  current.error = reason;
+  current.completedAt = new Date().toISOString();
+  await setDocumentState(current);
+  return true;
+}
+
 /**
  * Server-boot hook (item F.23): hydrate the in-memory cache from disk
  * and clean up any state still claiming to be active. Returns the
diff --git a/packages/core/src/iteration-loop.test.ts b/packages/core/src/iteration-loop.test.ts
index 0651843..632b096 100644
--- a/packages/core/src/iteration-loop.test.ts
+++ b/packages/core/src/iteration-loop.test.ts
@@ -13,6 +13,7 @@ import {
   stopLoop,
   shouldRunReflection,
   resolveEffectiveDetermination,
+  loopActivePhaseToRole,
   type LoopState,
 } from "./iteration-loop.js";
 import type { WorkspaceConfig, DevSignals, JudgeSignals, ReflectionSignals } from "./types.js";
@@ -947,6 +948,47 @@ describe("pauseReasonAllowedActions (item 6.25)", () => {
   test("scope_complete excludes consult_reflection (no iterations to reflect on)", () => {
     expect(pauseReasonAllowedActions("scope_complete")).not.toContain("consult_reflection");
   });
+
+  // harness-missing-signals: when dev or judge exit without writing
+  // their signals file, the iteration is in an unknown state. The
+  // user's three meaningful options are: redo the iter (retry),
+  // skip it (continue), or abandon (stop). finish_loop /
+  // refine_plan / consult_reflection all assume we have a
+  // meaningful iter result to act on; we don't.
+  test("missing_signals: retry_iteration + continue + stop_loop_now", () => {
+    expectActions(pauseReasonAllowedActions("missing_signals"), [
+      "retry_iteration",
+      "continue",
+      "stop_loop_now",
+    ]);
+  });
+
+  test("missing_signals excludes finish_loop (no iter result to finish on)", () => {
+    expect(pauseReasonAllowedActions("missing_signals")).not.toContain("finish_loop");
+  });
+
+  test("missing_signals excludes refine_plan + consult_reflection (no iter data to refine/reflect on)", () => {
+    expect(pauseReasonAllowedActions("missing_signals")).not.toContain("refine_plan");
+    expect(pauseReasonAllowedActions("missing_signals")).not.toContain("consult_reflection");
+  });
+
+  test("retry_iteration is ONLY applicable to missing_signals (not offered for other pauseReasons)", () => {
+    // The retry action is specifically the "agent exited without
+    // signals, redo the iter" path. It doesn't make sense for the
+    // other pause classes (cadence, anomaly with signals, etc.
+    // — those have a known iter result and use other actions).
+    const otherReasons = [
+      undefined,
+      "cadence",
+      "anomaly",
+      "user_input_needed",
+      "max_iterations",
+      "scope_complete",
+    ] as const;
+    for (const reason of otherReasons) {
+      expect(pauseReasonAllowedActions(reason)).not.toContain("retry_iteration");
+    }
+  });
 });
 
 // 2026-05-02: Architect SCOPE_COMPLETE readiness verdict (item 6.25 follow-up).
@@ -1007,3 +1049,41 @@ describe("buildPreLoopBlockReason: SCOPE_COMPLETE message", () => {
     expect(msg).not.toContain("already implemented");
   });
 });
+
+
+// --- loopActivePhaseToRole (kill-on-stop feature) ---
+//
+// Maps the loop's `state.phase` to the AgentRole currently being
+// awaited, so stopLoop / `cfcf agents reap` know which subprocess
+// to signal. PA / HA are deliberately omitted — they run outside
+// the cfcf server (`stdio: "inherit"`) and are never in the
+// active-processes registry.
+
+describe("loopActivePhaseToRole", () => {
+  test("pre_loop_reviewing → architect", () => {
+    expect(loopActivePhaseToRole("pre_loop_reviewing")).toBe("architect");
+  });
+  test("dev_executing → dev", () => {
+    expect(loopActivePhaseToRole("dev_executing")).toBe("dev");
+  });
+  test("judging → judge", () => {
+    expect(loopActivePhaseToRole("judging")).toBe("judge");
+  });
+  test("reflecting → reflection (the iter-19 gmbot case)", () => {
+    expect(loopActivePhaseToRole("reflecting")).toBe("reflection");
+  });
+  test("documenting → documenter", () => {
+    expect(loopActivePhaseToRole("documenting")).toBe("documenter");
+  });
+  test("non-active phases return null (nothing to kill)", () => {
+    // preparing / deciding / paused / completed / failed / stopped:
+    // no subprocess is in flight, so stopLoop's kill loop should
+    // skip cleanly.
+    expect(loopActivePhaseToRole("preparing")).toBeNull();
+    expect(loopActivePhaseToRole("deciding")).toBeNull();
+    expect(loopActivePhaseToRole("paused")).toBeNull();
+    expect(loopActivePhaseToRole("completed")).toBeNull();
+    expect(loopActivePhaseToRole("failed")).toBeNull();
+    expect(loopActivePhaseToRole("stopped")).toBeNull();
+  });
+});
diff --git a/packages/core/src/iteration-loop.ts b/packages/core/src/iteration-loop.ts
index 2764fca..4944ed2 100644
--- a/packages/core/src/iteration-loop.ts
+++ b/packages/core/src/iteration-loop.ts
@@ -33,7 +33,7 @@ import { getAdapter } from "./adapters/index.js";
 import { spawnProcess } from "./process-manager.js";
 import { getIterationLogPath, ensureWorkspaceLogDir } from "./log-storage.js";
 import * as gitManager from "./git-manager.js";
-import { nextIteration, updateWorkspace, getWorkspace } from "./workspaces.js";
+import { nextIteration, decrementIteration, updateWorkspace, getWorkspace } from "./workspaces.js";
 import { runDocumentSync } from "./documenter-runner.js";
 import { runReflectionSync, parseReflectionSignals } from "./reflection-runner.js";
 import { runReviewSync, readinessGateBlocks } from "./architect-runner.js";
@@ -88,7 +88,14 @@ export interface LoopState {
     | "anomaly"
     | "user_input_needed"
     | "max_iterations"
-    | "scope_complete";  // item 6.25 follow-up: architect SCOPE_COMPLETE
+    | "scope_complete"  // item 6.25 follow-up: architect SCOPE_COMPLETE
+    // Harness contract violation: dev or judge agent exited without
+    // writing its signals file. Inclusive of all root causes (crash,
+    // quota cap, OOM, etc.) — the harness doesn't classify, just
+    // pauses + surfaces. User checks the log to identify cause and
+    // resumes with `retry_iteration` (redo the iter) or `continue`
+    // (skip + move on) or `stop_loop_now`.
+    | "missing_signals";
   /** Questions from dev/judge that need user answers */
   pendingQuestions?: string[];
   /** User feedback to inject into the next iteration */
@@ -753,6 +760,21 @@ export function pauseReasonAllowedActions(
       //   - refine_plan  → user adds new requirements, re-runs review
       return ["finish_loop", "stop_loop_now", "refine_plan"];
 
+    case "missing_signals":
+      // Dev or judge agent exited without writing signals (crash,
+      // quota cap, OOM, …). The iteration is in an unknown state;
+      // the harness can't safely judge, reflect, refine, or finish
+      // on no data. Allowed:
+      //   - retry_iteration → redo the failed iter (counter rolled
+      //     back; branch re-created). Natural choice after a quota
+      //     cap resets.
+      //   - continue        → skip the failed iter, start iter N+1
+      //     fresh. Use when the failed work isn't worth recovering.
+      //   - stop_loop_now   → abandon, accept partial progress.
+      // No finish_loop / refine_plan / consult_reflection — those
+      // all assume we have a meaningful iteration result to act on.
+      return ["retry_iteration", "continue", "stop_loop_now"];
+
     default:
       // A1 pre-loop review blocked. continue (after user edited the
       // Problem Pack), stop_loop_now, refine_plan (re-run architect
@@ -827,6 +849,39 @@ export async function resumeLoop(
 /**
  * Stop a running or paused loop.
  */
+/**
+ * Map the loop's current phase to the AgentRole of the subprocess
+ * the harness is awaiting RIGHT NOW. Returns `null` when no
+ * subprocess is in flight (preparing / deciding / paused / etc.).
+ *
+ * Used by `stopLoop()` (and any future "kill the currently-running
+ * thing" callers) to decide what to signal. The loop awaits exactly
+ * one subprocess at a time, so a single role is unambiguous.
+ *
+ * Deliberately omits PA/HA — those run outside the cfcf server
+ * entirely (interactive `cfcf spec` / `cfcf help assistant` with
+ * `stdio: "inherit"`), are NOT in the `active-processes` registry,
+ * and must NOT be touched by a loop-level stop.
+ */
+export function loopActivePhaseToRole(
+  phase: LoopState["phase"],
+): import("./log-storage.js").AgentRole | null {
+  switch (phase) {
+    case "pre_loop_reviewing":
+      return "architect";
+    case "dev_executing":
+      return "dev";
+    case "judging":
+      return "judge";
+    case "reflecting":
+      return "reflection";
+    case "documenting":
+      return "documenter";
+    default:
+      return null;
+  }
+}
+
 export async function stopLoop(workspaceId: string): Promise<LoopState> {
   const state = await getLoopState(workspaceId);
   if (!state) {
@@ -836,12 +891,69 @@ export async function stopLoop(workspaceId: string): Promise<LoopState> {
     throw new Error(`Loop already ended (phase: ${state.phase})`);
   }
 
+  // Capture the active role BEFORE flipping phase to "stopped" —
+  // the flip would otherwise hide which subprocess we should signal.
+  const activeRole = loopActivePhaseToRole(state.phase);
+
   state.phase = "stopped";
   state.outcome = "stopped";
   state.completedAt = new Date().toISOString();
   await saveLoopState(state);
   await updateWorkspace(workspaceId, { status: "stopped" });
 
+  // Kill the subprocess the loop was awaiting + flip its per-role
+  // state-store row so the dashboard's "X running" indicator clears
+  // immediately rather than staying stuck until next server boot.
+  //
+  // Previously `stopLoop()` only flipped the state flag; mid-flight
+  // subprocesses (notably reflection codex on a multi-hour stall)
+  // kept running indefinitely because nothing signaled them. The
+  // loop's `isStopped(state)` check exits the while-loop at the next
+  // iteration boundary, but a subprocess that started before stop
+  // is still being awaited at that point — the await promise gets
+  // stranded and the subprocess outlives the server's interest in
+  // it. Real dogfood: gmbot iter 19 reflection ran ~5 hours past
+  // a `cfcf stop` because of this.
+  if (activeRole) {
+    const { getActiveProcess } = await import("./active-processes.js");
+    const entry = getActiveProcess(workspaceId, activeRole);
+    if (entry) {
+      try {
+        const { killProcessTree } = await import("./process-manager.js");
+        killProcessTree(entry.process.proc.pid);
+      } catch (err) {
+        console.warn(
+          `[stopLoop] killProcessTree failed for ${activeRole} (pid ${entry.process.proc.pid}): ${err instanceof Error ? err.message : String(err)}`,
+        );
+      }
+    }
+    // Flip the per-role state-store. Each runner exposes a
+    // `mark<X>StateFailed` helper that's idempotent — no-op when
+    // there's no state (dev/judge don't have one) or when the
+    // state is already terminal (e.g., the subprocess died on its
+    // own between stop being called and the kill landing).
+    const reason = "Loop stopped by user via cfcf stop";
+    try {
+      if (activeRole === "reflection") {
+        const { markReflectStateFailed } = await import("./reflection-runner.js");
+        await markReflectStateFailed(workspaceId, reason);
+      } else if (activeRole === "documenter") {
+        const { markDocumentStateFailed } = await import("./documenter-runner.js");
+        await markDocumentStateFailed(workspaceId, reason);
+      } else if (activeRole === "architect") {
+        const { markReviewStateFailed } = await import("./architect-runner.js");
+        await markReviewStateFailed(workspaceId, reason);
+      }
+      // dev / judge: no separate state-store. The loop-state itself
+      // (now "stopped") is the audit trail; their iterRecord stays
+      // in state.iterations as a partial entry.
+    } catch (err) {
+      console.warn(
+        `[stopLoop] state-store flip failed for ${activeRole}: ${err instanceof Error ? err.message : String(err)}`,
+      );
+    }
+  }
+
   return state;
 }
 
@@ -1052,6 +1164,70 @@ async function handleStopLoopNow(
   );
 }
 
+/**
+ * Pause the loop because the dev or judge agent exited without
+ * writing its signals file. Harness contract: agents must produce
+ * `cfcf-iteration-signals.json` / `cfcf-judge-signals.json`. When
+ * the file is missing, unreadable, or fails schema validation, the
+ * harness can't safely judge / reflect / decide on no data — pause
+ * and surface to the user.
+ *
+ * Inclusive of all root causes (quota cap, agent crash, OOM, killed
+ * process, agent CLI bug). The harness deliberately doesn't try to
+ * classify — the user reads the log file to identify cause, then
+ * resumes with `retry_iteration` (most common after a quota cap
+ * resets) / `continue` (skip the failed iter) / `stop_loop_now`.
+ *
+ * The pause-state writes mirror the existing
+ * `consult_reflection`/`scope_complete` pause paths: set phase +
+ * pauseReason + pendingQuestions, saveLoopState, flip workspace
+ * status to "paused", dispatch a notification. No git surgery —
+ * any dirty working-tree state is preserved for the user to
+ * inspect; on `retry_iteration`, the iteration counter is rolled
+ * back and the existing branch is re-created off HEAD (the normal
+ * iteration body already deletes-and-recreates an existing branch).
+ */
+async function pauseLoopOnMissingSignals(
+  workspace: WorkspaceConfig,
+  state: LoopState,
+  iterationNum: number,
+  phase: "dev" | "judge",
+  exitCode: number | undefined,
+  logFileName: string,
+): Promise<void> {
+  state.phase = "paused";
+  state.pauseReason = "missing_signals";
+  const exitDetail =
+    exitCode === undefined ? "" : ` (exit code ${exitCode})`;
+  const message =
+    `Anomaly detected: ${phase} agent for iteration ${iterationNum} ` +
+    `exited${exitDetail} without writing its signals file (agent crashed ` +
+    `or hit a usage limit). Check the log: ${logFileName}. Resume with ` +
+    `\`retry_iteration\` to redo this iteration, \`continue\` to skip ` +
+    `to iteration ${iterationNum + 1}, or \`stop_loop_now\` to abandon.`;
+  state.pendingQuestions = [message];
+  await saveLoopState(state);
+  await updateWorkspace(workspace.id, { status: "paused" });
+
+  dispatchForWorkspace(
+    makeEvent({
+      type: "loop.paused",
+      title: "Loop paused: agent exited without signals",
+      message: `${workspace.name}: ${message}`,
+      workspaceId: workspace.id,
+      workspaceName: workspace.name,
+      details: {
+        pauseReason: "missing_signals",
+        phase,
+        iteration: iterationNum,
+        exitCode: exitCode ?? null,
+        logFile: logFileName,
+      },
+    }),
+    workspace.notifications,
+  );
+}
+
 /**
  * Handle the `finish_loop` resume action: skip the iteration spawn,
  * jump to the configured end-of-loop sequence (documenter when
@@ -1246,6 +1422,22 @@ async function runLoop(
     // is handled inside the consult helper, which sets the pause state
     // and returns "stop" to abort runLoop).
   }
+  if (resumeAction === "retry_iteration") {
+    // Re-spawn dev on the same iteration after a `missing_signals`
+    // pause (quota cap reset, agent crash recovery, etc.). Roll the
+    // iteration counter back by one so the loop body's next
+    // `nextIteration()` call returns the SAME number that failed —
+    // the branch-creation logic already deletes + re-creates an
+    // existing branch off HEAD, so the failed attempt's branch is
+    // replaced cleanly. Also drop the failed iteration's record
+    // from state.iterations so it isn't double-counted; the retry
+    // produces a fresh record under the same number.
+    await decrementIteration(workspace.id);
+    if (state.iterations.length > 0) {
+      state.iterations.pop();
+    }
+    // Fall through to the regular pre-loop / iteration logic.
+  }
 
   // --- PRE-LOOP REVIEW (item 5.1 autoReviewSpecs=true) ---
   // Runs before iteration 1, not on resume. When the readiness gate
@@ -1664,6 +1856,29 @@ async function runLoop(
     const devSignals = await parseSignalFile(workspace.repoPath);
     iterRecord.devSignals = devSignals ?? undefined;
 
+    // Harness contract check: dev agent must write
+    // `cfcf-iteration-signals.json`. Absence (file missing, JSON
+    // parse failure, or schema validation failure — all collapsed
+    // to `null` by `parseSignalFile`) means we have no data to
+    // judge / reflect / decide on. Pause + surface to the user;
+    // do NOT silently treat as a failed iteration and move on.
+    // Inclusive of all root causes: agent crashed mid-startup,
+    // quota cap, OOM kill, etc. Working tree (if dirty) is left
+    // as-is for the user to inspect; on `retry_iteration` the
+    // existing branch is deleted + re-created off HEAD by the
+    // normal iteration body.
+    if (!devSignals) {
+      await pauseLoopOnMissingSignals(
+        workspace,
+        state,
+        iterationNum,
+        "dev",
+        devResult.exitCode,
+        iterRecord.devLogFileName,
+      );
+      return;
+    }
+
     // Commit dev work
     if (await gitManager.hasChanges(workspace.repoPath)) {
       await gitManager.commitAll(
@@ -1799,9 +2014,27 @@ async function runJudgeAndDecide(
   const judgeSignals = await parseJudgeSignals(workspace.repoPath);
   iterRecord.judgeSignals = judgeSignals ?? undefined;
 
-  // If judge exited with non-zero and produced no signals, log it
-  if (judgeResult.exitCode !== 0 && !judgeSignals) {
-    iterRecord.judgeError = `Judge agent exited with code ${judgeResult.exitCode}. Check log: ${judgeLogFile}`;
+  // Harness contract check: judge agent must write
+  // `cfcf-judge-signals.json`. Same pattern as the dev check above
+  // — absence of signals means we have no determination to act on,
+  // so we pause and surface to the user rather than archive a
+  // stale judge-assessment.md or fall through to the decide path
+  // with no data. Inclusive of all root causes (crash, quota cap,
+  // OOM). The previous behaviour (record `iterRecord.judgeError`
+  // and continue silently) is replaced by the explicit pause; the
+  // `judgeError` field is still populated for the audit trail.
+  if (!judgeSignals) {
+    iterRecord.judgeError =
+      `Judge agent exited with code ${judgeResult.exitCode} without writing signals. Check log: ${judgeLogFile}`;
+    await pauseLoopOnMissingSignals(
+      workspace,
+      state,
+      iterationNum,
+      "judge",
+      judgeResult.exitCode,
+      iterRecord.judgeLogFileName,
+    );
+    return;
   }
 
   // Commit judge work
@@ -2046,6 +2279,22 @@ async function runJudgeAndDecide(
     );
   }
 
+  // External-stop guard. Mirrors the `isStopped(state)` checks after
+  // dev's await (line ~1852) and judge's await (line ~2011) — the
+  // reflection path was missing it, and any phase mutation here
+  // would clobber a `state.phase = "stopped"` that `stopLoop()` set
+  // while we were mid-reflection. Real dogfood: gmbot iter 19 — user
+  // called `cfcf stop`, then `kill -9`'d the stranded reflection
+  // codex; the await unblocked, post-reflection ingest ran, line
+  // 2283 below overwrote `"stopped"` with `"deciding"`, makeDecision
+  // returned `continue`, and the outer loop's `isLoopDone(state)`
+  // saw `"deciding"` → false → spawned iter N+1. With this guard,
+  // the cascade exits cleanly back to runLoop's `isLoopDone(state)`
+  // check, which sees the unmodified `"stopped"` and returns.
+  if (isStopped(state)) {
+    return;
+  }
+
   // --- DECIDE ---
   state.phase = "deciding";
   await saveLoopState(state);
diff --git a/packages/core/src/process-manager.ts b/packages/core/src/process-manager.ts
index c95c65e..705a779 100644
--- a/packages/core/src/process-manager.ts
+++ b/packages/core/src/process-manager.ts
@@ -159,7 +159,7 @@ export async function spawnProcess(opts: ProcessOptions): Promise<ManagedProcess
  *
  * Item 6.31 (2026-05-08).
  */
-function killProcessTree(pid: number | undefined): void {
+export function killProcessTree(pid: number | undefined): void {
   if (!pid) return;
   try {
     // SIGTERM the whole group. Negative pid = group target on Unix.
diff --git a/packages/core/src/reflection-runner.test.ts b/packages/core/src/reflection-runner.test.ts
index 4090b9d..1b520ca 100644
--- a/packages/core/src/reflection-runner.test.ts
+++ b/packages/core/src/reflection-runner.test.ts
@@ -8,6 +8,7 @@ import {
   writeReflectionInstructions,
   validatePlanRewrite,
   resolveReflectionAgent,
+  markReflectStateFailed,
 } from "./reflection-runner.js";
 import type { WorkspaceConfig } from "./types.js";
 
@@ -204,4 +205,25 @@ describe("reflection-runner", () => {
       expect(validatePlanRewrite(basePlan, newPlan)).toEqual({ valid: true });
     });
   });
+
+  // markReflectStateFailed: used by stopLoop + cfcf agents reap to
+  // flip the dashboard's "reflection running" indicator off after
+  // killing the subprocess. The "with-state" branch is exercised
+  // end-to-end via stopLoop integration tests; here we lock in the
+  // idempotent / safe no-state path so callers can call it
+  // unconditionally without worrying about whether state exists.
+  describe("markReflectStateFailed (no-state branch — idempotency)", () => {
+    it("returns false when the workspace has no reflect state (no-op, safe to call unconditionally)", async () => {
+      const result = await markReflectStateFailed("workspace-id-with-no-state-xyz", "test reason");
+      expect(result).toBe(false);
+    });
+
+    it("does not throw for unknown workspace ids", async () => {
+      // Callers in stopLoop wrap in try/catch but rely on this not
+      // throwing for the common case (no reflection ever spawned).
+      await expect(
+        markReflectStateFailed("totally-fake-id", "test reason"),
+      ).resolves.toBe(false);
+    });
+  });
 });
diff --git a/packages/core/src/reflection-runner.ts b/packages/core/src/reflection-runner.ts
index bee403b..9052963 100644
--- a/packages/core/src/reflection-runner.ts
+++ b/packages/core/src/reflection-runner.ts
@@ -104,6 +104,34 @@ export function getReflectState(workspaceId: string): ReflectState | undefined {
   return reflectStore.get(workspaceId);
 }
 
+/**
+ * Force-mark this workspace's reflection state as `failed` with the
+ * given reason. Used by `stopLoop()` (and the manual
+ * `cfcf agents reap` command) to flip the dashboard's "reflection
+ * running" indicator off immediately after killing the subprocess —
+ * without this, the in-memory state would stay "executing" and the
+ * `/api/activity` chip would keep claiming reflection is alive.
+ *
+ * Idempotent: no-op when the workspace has no reflect state, or
+ * when the existing state is already a terminal status (completed /
+ * failed) — won't clobber a real completion.
+ *
+ * Returns true if a flip happened, false otherwise.
+ */
+export async function markReflectStateFailed(
+  workspaceId: string,
+  reason: string,
+): Promise<boolean> {
+  const current = reflectStore.get(workspaceId);
+  if (!current) return false;
+  if (!REFLECT_ACTIVE_STATUSES.has(current.status)) return false;
+  current.status = "failed";
+  current.error = reason;
+  current.completedAt = new Date().toISOString();
+  await setReflectState(current);
+  return true;
+}
+
 /**
  * Server-boot hook (item F.23): hydrate the in-memory cache from disk
  * and clean up any state still claiming to be active.
diff --git a/packages/core/src/types.ts b/packages/core/src/types.ts
index 42c234f..9b7dfe7 100644
--- a/packages/core/src/types.ts
+++ b/packages/core/src/types.ts
@@ -585,7 +585,14 @@ export type ResumeAction =
   | "finish_loop"
   | "stop_loop_now"
   | "refine_plan"
-  | "consult_reflection";
+  | "consult_reflection"
+  // Re-spawn dev on the same iteration after a pause (typically
+  // following a `missing_signals` pause from a quota cap or agent
+  // crash). The iteration counter is rolled back so the retry runs
+  // as iteration N rather than N+1; the existing branch is
+  // discarded + re-created off HEAD. See harness-missing-signals
+  // feature for the full flow.
+  | "retry_iteration";
 
 /**
  * Architect's verdict on the Problem Pack's readiness for an iteration loop.
diff --git a/packages/core/src/workspaces.test.ts b/packages/core/src/workspaces.test.ts
index 68a8ae9..124e5f2 100644
--- a/packages/core/src/workspaces.test.ts
+++ b/packages/core/src/workspaces.test.ts
@@ -11,6 +11,7 @@ import {
   deleteWorkspace,
   validateWorkspaceRepo,
   nextIteration,
+  decrementIteration,
 } from "./workspaces.js";
 
 describe("projects", () => {
@@ -166,6 +167,49 @@ describe("projects", () => {
     });
   });
 
+  // Roll-back helper for `retry_iteration` (harness-missing-signals
+  // feature): when a failed iteration N is being retried, the
+  // counter is rolled back so the next `nextIteration()` call
+  // returns N again instead of N+1.
+  describe("decrementIteration", () => {
+    it("rolls back from 3 to 2", async () => {
+      const project = await createWorkspace({ name: "dec-test", repoPath: repoDir });
+      await nextIteration(project.id);
+      await nextIteration(project.id);
+      await nextIteration(project.id);
+      const prev = await decrementIteration(project.id);
+      expect(prev).toBe(2);
+
+      const updated = await getWorkspace(project.id);
+      expect(updated!.currentIteration).toBe(2);
+    });
+
+    it("decrement then nextIteration returns the same number that was rolled back", async () => {
+      // The retry_iteration flow's load-bearing invariant: the
+      // counter ends up back at the failed iter's number so the
+      // loop body's next `nextIteration()` call reproduces it.
+      const project = await createWorkspace({ name: "retry-test", repoPath: repoDir });
+      await nextIteration(project.id); // -> 1
+      await nextIteration(project.id); // -> 2 (this is the "failed" iter)
+
+      await decrementIteration(project.id);
+      const retried = await nextIteration(project.id);
+      expect(retried).toBe(2); // same number as the failed attempt
+    });
+
+    it("floors at 0 (calling on a 0-counter is a no-op)", async () => {
+      const project = await createWorkspace({ name: "floor-test", repoPath: repoDir });
+      const result = await decrementIteration(project.id);
+      expect(result).toBe(0);
+      const updated = await getWorkspace(project.id);
+      expect(updated!.currentIteration).toBe(0);
+    });
+
+    it("returns null for non-existent project (matches nextIteration's nullable shape)", async () => {
+      expect(await decrementIteration("nonexistent")).toBeNull();
+    });
+  });
+
   describe("validateWorkspaceRepo", () => {
     it("validates a valid git repo", async () => {
       const result = await validateWorkspaceRepo(repoDir);
diff --git a/packages/core/src/workspaces.ts b/packages/core/src/workspaces.ts
index 300de98..e4ea95e 100644
--- a/packages/core/src/workspaces.ts
+++ b/packages/core/src/workspaces.ts
@@ -243,6 +243,25 @@ export async function nextIteration(workspaceId: string): Promise<number | null>
   return next;
 }
 
+/**
+ * Roll the iteration counter back by one. Used by `retry_iteration`
+ * (the resume action that re-spawns dev on the same iteration after a
+ * `missing_signals` pause): the failed attempt was already counted via
+ * `nextIteration()`, so we roll back to let the loop body's next
+ * `nextIteration()` call return the same number. Idempotent at the
+ * floor — if the counter is already 0 (no iterations yet), this is a
+ * no-op.
+ */
+export async function decrementIteration(workspaceId: string): Promise<number | null> {
+  const workspace = await getWorkspace(workspaceId);
+  if (!workspace) return null;
+  const current = workspace.currentIteration || 0;
+  if (current <= 0) return 0;
+  const prev = current - 1;
+  await updateWorkspace(workspaceId, { currentIteration: prev });
+  return prev;
+}
+
 /**
  * Verify a workspace's repo path exists and is a git repo.
  */
diff --git a/packages/server/src/app.ts b/packages/server/src/app.ts
index 0518a62..40374b9 100644
--- a/packages/server/src/app.ts
+++ b/packages/server/src/app.ts
@@ -49,6 +49,13 @@ import {
   startReflection,
   getReflectState,
   stopReflection,
+  getActiveProcessesForWorkspace,
+  getAllActiveProcesses,
+  getActiveProcess,
+  killProcessTree,
+  markReflectStateFailed,
+  markDocumentStateFailed,
+  markReviewStateFailed,
 } from "@cfcf/core";
 
 const startedAt = Date.now();
@@ -135,6 +142,82 @@ export function createApp() {
     return c.json({ active: items });
   });
 
+  // --- Active loop-spawned agent processes (for `cfcf agents reap`) ---
+  //
+  // List + kill subprocess registry. Distinct from `/api/activity`:
+  // /activity reads state files (loop-state, reflect-state, …) and
+  // is the dashboard's source of truth for "what's running"; this
+  // endpoint exposes the in-memory PID registry (`active-processes.ts`)
+  // so the user can identify and kill a stranded subprocess when
+  // the higher-level stop didn't (or to clean up after a kill that
+  // didn't reach the state-store flip path).
+  //
+  // PA / HA are intentionally absent — they run as `cfcf spec` /
+  // `cfcf help assistant` with `stdio: "inherit"` outside the cfcf
+  // server, are not in the registry, and cannot be killed via these
+  // endpoints. Safe by construction.
+  app.get("/api/active-processes", async (c) => {
+    const workspaceFilter = c.req.query("workspace");
+    const all = workspaceFilter
+      ? getActiveProcessesForWorkspace(workspaceFilter)
+      : getAllActiveProcesses();
+    const workspaces = await listWorkspaces();
+    const wsById = new Map(workspaces.map((w) => [w.id, w]));
+    const now = Date.now();
+    const summary = all.map((e) => ({
+      workspaceId: e.workspaceId,
+      workspaceName: wsById.get(e.workspaceId)?.name ?? e.workspaceId,
+      role: e.role,
+      pid: e.process.proc.pid,
+      startedAt: e.startedAt,
+      runtimeMs: Math.max(0, now - new Date(e.startedAt).getTime()),
+      logFileName: e.logFileName ?? null,
+    }));
+    return c.json({ active: summary });
+  });
+
+  app.post("/api/active-processes/:workspaceId/:role/kill", async (c) => {
+    const workspaceId = c.req.param("workspaceId");
+    const role = c.req.param("role") as
+      | "dev" | "judge" | "architect" | "documenter" | "reflection";
+    const validRoles = new Set([
+      "dev", "judge", "architect", "documenter", "reflection",
+    ]);
+    if (!validRoles.has(role)) {
+      return c.json({ ok: false, error: `Unknown role: ${role}` }, 400);
+    }
+    const entry = getActiveProcess(workspaceId, role);
+    if (!entry) {
+      return c.json(
+        { ok: false, error: `No active ${role} process for workspace ${workspaceId}` },
+        404,
+      );
+    }
+    const pid = entry.process.proc.pid;
+    try {
+      killProcessTree(pid);
+    } catch (err) {
+      return c.json(
+        { ok: false, error: `killProcessTree failed: ${err instanceof Error ? err.message : String(err)}` },
+        500,
+      );
+    }
+    // Flip per-role state-store row so the dashboard clears
+    // immediately. Idempotent; dev/judge have no state-store so
+    // skip them.
+    const reason = "Killed by user via cfcf agents reap";
+    try {
+      if (role === "reflection") await markReflectStateFailed(workspaceId, reason);
+      else if (role === "documenter") await markDocumentStateFailed(workspaceId, reason);
+      else if (role === "architect") await markReviewStateFailed(workspaceId, reason);
+    } catch (err) {
+      console.warn(
+        `[reap] state-store flip failed for ${role}: ${err instanceof Error ? err.message : String(err)}`,
+      );
+    }
+    return c.json({ ok: true, killed: true, pid, role, workspaceId });
+  });
+
   app.get("/api/status", async (c) => {
     const hasConfig = await configExists();
     const config = hasConfig ? await readConfig() : null;
diff --git a/packages/web/src/api.ts b/packages/web/src/api.ts
index d1683e6..875cf97 100644
--- a/packages/web/src/api.ts
+++ b/packages/web/src/api.ts
@@ -302,7 +302,8 @@ export type ResumeAction =
   | "finish_loop"
   | "stop_loop_now"
   | "refine_plan"
-  | "consult_reflection";
+  | "consult_reflection"
+  | "retry_iteration";  // harness-missing-signals: redo the failed iter
 
 export function resumeLoop(
   workspaceId: string,
diff --git a/packages/web/src/components/FeedbackForm.tsx b/packages/web/src/components/FeedbackForm.tsx
index 16bd4b0..13081c3 100644
--- a/packages/web/src/components/FeedbackForm.tsx
+++ b/packages/web/src/components/FeedbackForm.tsx
@@ -32,6 +32,17 @@ function pauseReasonAllowedActions(pauseReason: LoopState["pauseReason"]): Resum
       // configured; stop_loop_now accepts "project done"; refine_plan
       // re-runs the architect after the user adds new requirements.
       return ["finish_loop", "stop_loop_now", "refine_plan"];
+    case "missing_signals":
+      // Harness-missing-signals: dev/judge exited without writing
+      // its signals file (crash, quota cap, OOM, …). No determination
+      // to act on. Allowed:
+      //   - retry_iteration → redo the iter (after a quota window
+      //     resets, typically)
+      //   - continue        → skip the failed iter
+      //   - stop_loop_now   → abandon
+      // No finish/refine/consult — they all need a meaningful iter
+      // result.
+      return ["retry_iteration", "continue", "stop_loop_now"];
     default:
       // Pre-loop review block (A1)
       return ["continue", "stop_loop_now", "refine_plan"];
@@ -44,6 +55,7 @@ const ACTION_LABEL: Record<ResumeAction, string> = {
   stop_loop_now: "Stop loop now",
   refine_plan: "Refine plan",
   consult_reflection: "Ask Reflection to decide",
+  retry_iteration: "Retry iteration",
 };
 
 const ACTION_HELP: Record<ResumeAction, string> = {
@@ -57,6 +69,8 @@ const ACTION_HELP: Record<ResumeAction, string> = {
     "Run the architect (synchronously) with your feedback to update the plan, then continue with the next iteration.",
   consult_reflection:
     "Spawn reflection with your feedback as input. Reflection decides what the harness does next (continue / finish / stop / re-pause).",
+  retry_iteration:
+    "Re-spawn dev on the same iteration that just failed (the iteration counter is rolled back, the branch is re-created). Typical use: a quota cap reset, or after fixing whatever caused the agent to crash.",
 };
 
 export function FeedbackForm({
diff --git a/packages/web/src/types.ts b/packages/web/src/types.ts
index 07eeed4..d2d29f3 100644
--- a/packages/web/src/types.ts
+++ b/packages/web/src/types.ts
@@ -131,7 +131,8 @@ export interface LoopState {
     | "anomaly"
     | "user_input_needed"
     | "max_iterations"
-    | "scope_complete";  // item 6.25 follow-up: architect SCOPE_COMPLETE
+    | "scope_complete"  // item 6.25 follow-up: architect SCOPE_COMPLETE
+    | "missing_signals";  // harness-missing-signals: dev/judge exited without writing signals file
   pendingQuestions?: string[];
   userFeedback?: string;
   iterations: LoopIterationRecord[];