Problem
When a dev PTY tracked by Tower / the Codev VS Code extension dies outside Tower's control — e.g. the user kills the terminal tab with Ctrl+C, the dev process crashes, the underlying spawned process exits abnormally — Tower's internal bookkeeping isn't cleared. The next Run Dev Server request sees the stale entry and refuses:
Codev: Dev server is already running for <target>
The user can't restart the dev server cleanly. The workaround today is to restart Tower (afx tower stop && afx tower start), which is a heavy hammer for what should be self-healing.
Current state
- commands/dev-shared.ts:101 reads terminalManager.listDevTerminals() to check liveness.
- commands/dev-shared.ts:104 shows "Dev server is already running for <target>" whenever the list has an entry for the target, with no validation that the underlying process is still alive.
- terminal-manager.ts:257 (listDevTerminals()) returns entries from the in-memory terminals map; entries are added on spawn (openDevTerminal(), line 195) but only removed via the explicit stop path (commands/dev-shared.ts:158, stopWorktreeDev, etc.).
- packages/codev/src/agent-farm/commands/dev.ts:82 has the analogous server-side log: 'Dev server already running for ${builder.id}.'.
- There's no exit/close-event subscription tying the bookkeeping to actual PTY lifecycle.
Proposed behavior
1. Eager: subscribe to PTY exit on spawn
When a dev PTY is spawned (terminal-manager.ts:openDevTerminal() and the Tower-side equivalent), wire an exit/close listener that immediately removes the entry from terminals (and the Tower-side tracking) regardless of cause: clean stop, crash, external kill, parent exit.
The status-bar item (#788), the Stop command, and the Run-Dev checks all consult the same data structure, so a clean removal keeps all three consistent without any explicit reconcile step.
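A minimal sketch of the eager path, under stated assumptions: the PTY object is assumed to expose a node-pty-style onExit(listener) hook, and DevTerminalRegistry, DevPty, Subscription, PtyExitEvent and FakePty are hypothetical names used for illustration only, not the actual types in terminal-manager.ts.

```typescript
// Assumed, node-pty-style surface; the real PTY type used by Codev is not shown in this issue.
interface PtyExitEvent {
  exitCode: number;
  signal?: number;
}

interface Subscription {
  dispose(): void;
}

interface DevPty {
  readonly pid: number;
  onExit(listener: (event: PtyExitEvent) => void): Subscription;
}

// Hypothetical stand-in for the in-memory `terminals` map.
class DevTerminalRegistry {
  private readonly terminals = new Map<string, DevPty>();

  // Record the PTY and tie the entry's lifetime to the PTY's exit event.
  track(target: string, pty: DevPty): void {
    this.terminals.set(target, pty);
    pty.onExit(() => {
      // Guard: a late exit from an old PTY must not delete a newer entry
      // that was registered for the same target in the meantime.
      if (this.terminals.get(target) === pty) {
        this.terminals.delete(target);
      }
    });
  }

  listDevTerminals(): string[] {
    return Array.from(this.terminals.keys());
  }
}

// Test double that lets a caller trigger the exit event by hand.
class FakePty implements DevPty {
  readonly pid: number;
  private listeners: Array<(event: PtyExitEvent) => void> = [];

  constructor(pid: number) {
    this.pid = pid;
  }

  onExit(listener: (event: PtyExitEvent) => void): Subscription {
    this.listeners.push(listener);
    return {
      dispose: () => {
        this.listeners = this.listeners.filter((l) => l !== listener);
      },
    };
  }

  simulateExit(exitCode: number): void {
    for (const listener of this.listeners.slice()) {
      listener({ exitCode });
    }
  }
}
```

The identity check in the exit handler matters when a dev is restarted quickly: without it, the exit event of the old PTY could arrive after the new PTY was registered and wipe the fresh entry.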
2. Lazy: liveness safety net in listDevTerminals() / "already running" check
Before returning entries from listDevTerminals() (or before the "already running" branch in dev-shared.ts:101–104), validate each entry's underlying PTY is alive. Implementation options: PID liveness (process.kill(pid, 0) throws on dead processes), or query the spawned PTY object for its current state. Drop dead entries.
This covers cases where the eager subscription was missed — most notably Tower restart while a dev was running, where the in-memory listeners are gone but the bookkeeping may persist via reload.
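The lazy check could look roughly like the sketch below, which uses the PID-liveness option. DevTerminalEntry, isProcessAlive and pruneDeadTerminals are hypothetical names, and the assumption that each entry records the PID of its spawned process is not confirmed by the issue. Signal 0 delivers nothing; Node only performs the existence and permission check.

```typescript
// Hypothetical entry shape; assumes the bookkeeping stores the spawned PID.
interface DevTerminalEntry {
  target: string;
  pid: number;
}

// Returns true if a process with this PID currently exists.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: existence check only, no signal is sent
    return true;
  } catch (err) {
    // EPERM means the process exists but we are not allowed to signal it,
    // so it still counts as alive. Any other error is treated as "gone".
    return (err as { code?: string }).code === "EPERM";
  }
}

// Removes entries whose process is gone and returns the pruned targets.
// The liveness probe is injectable so the pruning logic can be tested
// without real processes.
function pruneDeadTerminals(
  terminals: Map<string, DevTerminalEntry>,
  isAlive: (pid: number) => boolean = isProcessAlive,
): string[] {
  const pruned: string[] = [];
  terminals.forEach((entry, target) => {
    if (!isAlive(entry.pid)) {
      pruned.push(target);
    }
  });
  for (const target of pruned) {
    terminals.delete(target);
  }
  return pruned;
}
```

One caveat with the PID approach: operating systems reuse PIDs, so a stale entry can occasionally look alive because an unrelated process now holds the same number. Querying the PTY object for its state, where available, avoids that ambiguity.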
3. UX on recovery: inform after start
When the lazy check detects and clears a stale entry, the new dev launch proceeds normally (no modal, no friction). After the new PTY is up, show a one-line status-bar transient (setStatusBarMessage, 5s timeout):
Codev: Previous dev was terminated externally; restarted cleanly.
This educates the user without blocking.
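The recovery flow can be sketched as follows. Everything here is illustrative: runDevServer, DevLauncherDeps and the result strings are hypothetical, spawning is shown as synchronous for brevity (the real launch is probably asynchronous), and notify stands in for a call such as vscode.window.setStatusBarMessage(message, 5000) in the extension.

```typescript
const RECOVERY_MESSAGE =
  "Codev: Previous dev was terminated externally; restarted cleanly.";

// Hypothetical seams so the flow can be exercised without VS Code or a real PTY.
interface DevLauncherDeps {
  hasEntry(target: string): boolean;
  isAlive(target: string): boolean;
  clearEntry(target: string): void;
  spawn(target: string): void;
  notify(message: string, hideAfterMs: number): void;
}

type RunDevResult = "started" | "already-running";

function runDevServer(target: string, deps: DevLauncherDeps): RunDevResult {
  let recovered = false;

  if (deps.hasEntry(target)) {
    if (deps.isAlive(target)) {
      // Genuine duplicate: keep the existing "already running" refusal.
      return "already-running";
    }
    // Stale entry: clear it silently and fall through to a normal launch.
    deps.clearEntry(target);
    recovered = true;
  }

  deps.spawn(target);

  // Inform only after the new PTY is up, and only if a stale entry was cleared.
  if (recovered) {
    deps.notify(RECOVERY_MESSAGE, 5000);
  }
  return "started";
}
```

Emitting the note after spawn, rather than when the stale entry is detected, keeps the message truthful: if the launch itself fails, the user never sees a claim that the dev was restarted.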
Acceptance criteria
- […] kill <pid>, dev process crash) clears the bookkeeping within ~1s, verified by checking that listDevTerminals() returns an empty array.
- […] Run Dev Server for the same target succeeds and starts a fresh dev without manual intervention.
- […] commands/dev.ts:82) is reconciled equivalently: both client and server agree on liveness.
Out of scope
- Auto-restarting the dev when it crashes (the user controls when to restart; this issue only unblocks them from restarting).
- A modal confirmation before recovery (intentionally rejected in favor of silent recovery with an after-the-fact note).
- Re-architecting how PTYs are spawned or supervised.
Related