vscode/tower: dev server gets stuck when terminated externally — stale bookkeeping blocks restart #796

@amrmelsayed

Problem

When a dev PTY tracked by Tower / the Codev VS Code extension dies outside Tower's control (for example, the user kills the terminal tab with Ctrl+C, the dev process crashes, or the underlying spawned process exits abnormally), Tower's internal bookkeeping isn't cleared. The next Run Dev Server request sees the stale entry and refuses:

Codev: Dev server is already running for <target>

The user can't restart the dev server cleanly. The workaround today is to restart Tower (afx tower stop && afx tower start), which is a heavy hammer for something that should be self-healing.

Current state

  • commands/dev-shared.ts:101 reads terminalManager.listDevTerminals() to decide whether a dev server is already running for the target.
  • commands/dev-shared.ts:104 shows "Dev server is already running for <target>" whenever the list has an entry for the target — no validation that the underlying process is still alive.
  • terminal-manager.ts:257 (listDevTerminals()) returns entries from the in-memory terminals map; entries are added on spawn (openDevTerminal(), line 195) but only removed via the explicit stop path (commands/dev-shared.ts:158, stopWorktreeDev, etc.).
  • packages/codev/src/agent-farm/commands/dev.ts:82 has the analogous server-side log: 'Dev server already running for ${builder.id}.'.
  • There's no exit/close-event subscription tying the bookkeeping to actual PTY lifecycle.
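The failure mode described above can be reproduced with a very small model. This is a simplified sketch, not the actual code in commands/dev-shared.ts or terminal-manager.ts: the names modelTerminals, modelOpenDev, modelStopDev, and modelCanRunDev are hypothetical stand-ins for the real map, spawn path, stop path, and "already running" check.

```typescript
// Simplified model only. All names here are hypothetical stand-ins.
const modelTerminals = new Map<string, { target: string }>();

// Spawn path: an entry is added when a dev terminal is opened.
function modelOpenDev(target: string): void {
  modelTerminals.set(target, { target });
}

// Explicit stop path: the only place an entry is removed.
function modelStopDev(target: string): void {
  modelTerminals.delete(target);
}

// "Already running" check: looks only at presence in the map,
// never at whether the underlying process still exists.
function modelCanRunDev(target: string): boolean {
  return !modelTerminals.has(target);
}
```

If the process behind an entry dies without going through modelStopDev, nothing touches the map, so modelCanRunDev keeps returning false for that target until something deletes the entry by hand.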

Proposed behavior

1. Eager: subscribe to PTY exit on spawn

When a dev PTY is spawned (terminal-manager.ts:openDevTerminal() and the Tower-side equivalent), wire an exit/close listener that immediately removes the entry from terminals (and the Tower-side tracking) regardless of cause: clean stop, crash, external kill, parent exit.

Status-bar item (#788), Stop command, and Run-Dev checks all consult the same data structure, so a clean removal makes all three consistent without any explicit reconcile step.
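A minimal sketch of the eager path, under stated assumptions: PtyLike is a hypothetical interface standing in for whatever PTY object the spawn path actually returns (node-pty exposes a similar pid and onExit, but that match is assumed, not confirmed by this issue), and DevTerminalRegistry is an illustrative stand-in for the terminals map in terminal-manager.ts.

```typescript
// Hypothetical minimal shape of a spawned PTY (an assumption for this sketch).
interface PtyLike {
  pid: number;
  onExit(listener: (event: { exitCode: number }) => void): void;
}

interface DevTerminalEntry {
  target: string;
  pty: PtyLike;
}

// Illustrative stand-in for the in-memory terminals map.
class DevTerminalRegistry {
  private readonly terminals = new Map<string, DevTerminalEntry>();

  // Register the entry and tie its lifetime to the PTY's exit event.
  track(target: string, pty: PtyLike): void {
    const entry: DevTerminalEntry = { target, pty };
    this.terminals.set(target, entry);
    pty.onExit(() => {
      // Delete only if the map still holds this exact entry, so a late exit
      // from an old PTY cannot evict a newer one tracked for the same target.
      if (this.terminals.get(target) === entry) {
        this.terminals.delete(target);
      }
    });
  }

  listDevTerminals(): DevTerminalEntry[] {
    return Array.from(this.terminals.values());
  }
}
```

The identity check in the exit handler matters once restart works: a user may start a fresh dev for the same target before the old PTY's exit event is delivered.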

2. Lazy: liveness safety net in listDevTerminals() / "already running" check

Before returning entries from listDevTerminals() (or before the "already running" branch in dev-shared.ts:101–104), validate that each entry's underlying PTY is alive, and drop dead entries. Implementation options: a PID liveness probe (process.kill(pid, 0) delivers no signal and throws ESRCH when no such process exists), or querying the spawned PTY object for its current state.

This covers cases where the eager subscription was missed — most notably Tower restart while a dev was running, where the in-memory listeners are gone but the bookkeeping may persist via reload.
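The PID-probe option could look roughly like the sketch below. It assumes each tracked entry records the PID of its spawned process; TrackedDev, pidIsAlive, and pruneDeadEntries are hypothetical names, not existing Tower or extension APIs. Only process.kill with signal 0 is a real Node.js call.

```typescript
// Returns true if a process with this PID currently exists.
function pidIsAlive(pid: number): boolean {
  // 0 and negative values address process groups, so reject them up front.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0); // signal 0: existence check only, nothing is delivered
    return true;
  } catch (err) {
    // EPERM: the process exists but we are not allowed to signal it.
    // ESRCH (and anything else): treat as not alive.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Hypothetical entry shape: assumes the PID is stored at spawn time.
interface TrackedDev {
  target: string;
  pid: number;
}

// Removes entries whose process is gone and returns the cleared targets,
// so the caller can tell that a stale entry was recovered.
function pruneDeadEntries(
  terminals: Map<string, TrackedDev>,
  isAlive: (pid: number) => boolean = pidIsAlive,
): string[] {
  const dead: string[] = [];
  terminals.forEach((entry, target) => {
    if (!isAlive(entry.pid)) dead.push(target);
  });
  dead.forEach((target) => terminals.delete(target));
  return dead;
}
```

One caveat with a bare PID probe: the operating system can reuse PIDs, so a long-dead dev's PID may later belong to an unrelated process. That is a reason to keep the probe as a safety net behind the eager exit subscription rather than as the only mechanism.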

3. UX on recovery: inform after start

When the lazy check detects and clears a stale entry, the new dev launch proceeds normally (no modal, no friction). After the new PTY is up, show a one-line status-bar transient (setStatusBarMessage, 5s timeout):

Codev: Previous dev was terminated externally; restarted cleanly.

This educates the user without blocking.
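The ordering (launch first, inform afterwards, and only on the lazy-recovery path) can be sketched as follows. startDevWithRecovery, TransientNotifier, and the options object are hypothetical. The notifier is injected so the sketch stays self-contained; in the extension it would presumably wrap vscode.window.setStatusBarMessage(message, hideAfterMs). The launch is synchronous here for brevity, whereas the real spawn path may well be asynchronous.

```typescript
// Injected so the sketch does not depend on the vscode module.
// Assumption: the real notifier wraps vscode.window.setStatusBarMessage.
type TransientNotifier = (message: string, hideAfterMs: number) => void;

const RECOVERY_MESSAGE =
  "Codev: Previous dev was terminated externally; restarted cleanly.";
const RECOVERY_TIMEOUT_MS = 5000;

interface StartOptions {
  // True when the lazy check found and cleared a stale entry for this target.
  staleEntryCleared: boolean;
  // Spawns the new dev PTY (hypothetical; synchronous in this sketch).
  launch: () => void;
  notify: TransientNotifier;
}

function startDevWithRecovery(opts: StartOptions): void {
  // No modal and no prompt: the launch always proceeds.
  opts.launch();
  // Inform only after the new PTY is up, and only if recovery happened.
  if (opts.staleEntryCleared) {
    opts.notify(RECOVERY_MESSAGE, RECOVERY_TIMEOUT_MS);
  }
}
```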

Acceptance criteria

  • Killing a dev PTY externally (close terminal tab, kill <pid>, dev process crash) clears the bookkeeping within ~1s — verified by checking listDevTerminals() returns an empty array.
  • After external termination, invoking Run Dev Server for the same target succeeds and starts a fresh dev without manual intervention.
  • The "Codev: Dev server is already running" toast no longer fires after an external kill.
  • On recovery (lazy check fired, not the eager path), a transient status-bar message confirms the self-heal.
  • Tower's server-side dev tracking (commands/dev.ts:82) is reconciled equivalently — both client and server agree on liveness.
  • No regression to the "already running" check when a dev genuinely IS running.

Out of scope

  • Auto-restarting the dev when it crashes (user is in control of when to restart — this issue only unblocks them from restarting).
  • A modal confirmation before recovery (intentionally rejected — silent recovery with after-the-fact note).
  • Re-architecting how PTYs are spawned or supervised.

Labels

bug: Something isn't working
