bug: with_connection global write lock + unclosed notify channel freezes bot when ACP process goes stale #295

### Description

When an ACP child process becomes unresponsive (stale session, auth expired, CLI hung), the bot freezes completely — it continues posting `...` placeholders for every incoming mention but never processes any prompt. This affects **all threads**, not just the one with the stale session. Confirmed on v0.7.2 (latest stable) — the affected code paths are unchanged.

**Two bugs combine:**

**Bug 1: `with_connection` holds global write lock during streaming (`pool.rs:72-80`)**

```rust
pub async fn with_connection<F, R>(&self, thread_id: &str, f: F) -> Result<R> {
    let mut conns = self.connections.write().await;  // global write lock
    let conn = conns.get_mut(thread_id)...;
    f(conn).await  // held for ENTIRE duration of stream_prompt (minutes/hours)
}
```

`stream_prompt` runs inside `f`, so the `RwLock` write guard on `self.connections` is held for the entire streaming duration. While it is held, **every** other `get_or_create` (read lock), `with_connection` (write lock), and `cleanup_idle` (write lock) call blocks. A held write guard excludes readers as well as writers, so even the read-lock fast path in `get_or_create` cannot proceed. Tokio's `RwLock` is also write-preferring, so once another writer is queued, new readers wait behind it too.
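A minimal sketch of one way to keep the map lock out of the streaming path: store each connection behind its own lock and clone the handle out of the map before running `f`. To stay self-contained it uses `std::sync` primitives and a synchronous closure; `Pool`, `Conn`, `insert`, and `prompts_handled` are hypothetical stand-ins rather than openab's actual types, and a real change would presumably use Tokio's async `RwLock` and `Mutex` instead.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

// Hypothetical stand-in for an ACP connection (not openab's real type).
pub struct Conn {
    pub prompts_handled: u32,
}

// Hypothetical pool: the map lock guards only the map itself,
// and every connection carries its own Mutex.
pub struct Pool {
    pub connections: RwLock<HashMap<String, Arc<Mutex<Conn>>>>,
}

impl Pool {
    pub fn new() -> Self {
        Pool { connections: RwLock::new(HashMap::new()) }
    }

    pub fn insert(&self, thread_id: &str) {
        let mut conns = self.connections.write().unwrap();
        let conn = Conn { prompts_handled: 0 };
        conns.insert(thread_id.to_string(), Arc::new(Mutex::new(conn)));
    }

    pub fn with_connection<F, R>(&self, thread_id: &str, f: F) -> Option<R>
    where
        F: FnOnce(&mut Conn) -> R,
    {
        // Hold the map lock only long enough to clone the Arc handle.
        let handle = {
            let conns = self.connections.read().unwrap();
            conns.get(thread_id).cloned()
        };
        // The map read guard is already released at this point.
        let handle = handle?;
        // Only this one connection stays locked while `f` runs.
        let mut conn = handle.lock().unwrap();
        Some(f(&mut *conn))
    }
}
```

With this shape, a long-running `f` on one thread's connection blocks only other callers that want the same connection; map-level operations such as lookup, insertion, and idle cleanup can still acquire the map lock.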

**Bug 2: `rx.recv()` hangs forever when ACP process dies (`connection.rs:152-153`)**

When the reader task hits EOF (child process died), it does not close the notify channel:

```rust
let sub = notify_tx.lock().await;
drop(sub);  // drops MutexGuard, NOT the Option<Sender> inside
```

The `mpsc::UnboundedSender` remains alive in `Arc<Mutex<Option<Sender>>>`, so `rx.recv()` in `stream_prompt` (`discord.rs:269`) **never returns `None`** — it waits forever.
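The difference between dropping the guard and clearing the `Option` can be reproduced in isolation. The sketch below uses `std::sync::mpsc` as a stand-in for Tokio's channel (in `std`, a closed channel shows up as `Err(Disconnected)` rather than `None`); `NotifySlot` and all function names are hypothetical and exist only for illustration.

```rust
use std::sync::mpsc::{channel, Receiver, Sender, TryRecvError};
use std::sync::{Arc, Mutex};

// Shared slot holding the sender, mirroring the Arc<Mutex<Option<Sender>>> shape.
pub type NotifySlot = Arc<Mutex<Option<Sender<String>>>>;

// Hypothetical helper: build a slot plus its receiver.
pub fn make_slot() -> (NotifySlot, Receiver<String>) {
    let (tx, rx) = channel();
    (Arc::new(Mutex::new(Some(tx))), rx)
}

// What the reported EOF path does: lock, then drop only the guard.
pub fn eof_drop_guard(slot: &NotifySlot) {
    let sub = slot.lock().unwrap();
    drop(sub); // releases the mutex; the Sender inside the Option stays alive
}

// Suggested change: clear the Option so the last Sender is dropped.
pub fn eof_close_channel(slot: &NotifySlot) {
    let mut sub = slot.lock().unwrap();
    *sub = None; // drops the Sender, which disconnects the receiver
}

// Reports whether the receiver observes a disconnected channel.
// Note: try_recv would consume a pending message if one were queued.
pub fn is_closed(rx: &Receiver<String>) -> bool {
    matches!(rx.try_recv(), Err(TryRecvError::Disconnected))
}
```

After `eof_drop_guard` the receiver still sees an open, empty channel, which is the state in which a blocking `recv` waits indefinitely. Only `eof_close_channel` lets the receiver observe the end of the stream.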

**Combined effect:**
1. ACP process becomes stale (CLI session expired, API auth timeout, OOM, etc.)
2. `stream_prompt` sends prompt via `session_prompt` → `rx.recv()` hangs forever (sender not closed)
3. Write lock on `pool.connections` held forever
4. All subsequent `get_or_create` calls block (read lock blocked by held write lock)
5. `cleanup_idle` also blocks (needs write lock) — can't clean up the stale session
6. Bot posts `...` for every incoming mention (before pool access) but never updates them
7. **Entire bot is frozen** — not just one thread, ALL threads

**Suggested fixes:**
- **Fix 1 (critical):** Close notify channel on EOF: `*sub = None;` instead of `drop(sub);` (the guard must then be bound mutably, i.e. `let mut sub = notify_tx.lock().await;`)
- **Fix 2 (critical):** Don't hold global lock during streaming — use per-connection locks or extract connection from pool during streaming
- **Fix 3 (recommended):** Add timeout to `rx.recv()` in streaming loop
- **Fix 4 (defense-in-depth):** Improve `alive()` to also check child process status
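As an illustration of Fix 3, the sketch below bounds each wait in a receive loop so that a silent channel ends the stream instead of hanging. It uses `std::sync::mpsc::Receiver::recv_timeout` to stay self-contained; in async Tokio code the analogous tool would be wrapping `rx.recv()` in `tokio::time::timeout`. `StreamEnd`, `drain_updates`, and `idle_limit` are hypothetical names, and the right timeout value is an assumption that would need tuning for long-running prompts.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

// Why the streaming loop stopped (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum StreamEnd {
    Closed,   // all senders dropped: the agent side went away
    TimedOut, // no update within the idle limit: treat the session as stale
}

// Hypothetical streaming loop: collect updates until the channel closes
// or stays silent for longer than `idle_limit`.
pub fn drain_updates(rx: &Receiver<String>, idle_limit: Duration) -> (Vec<String>, StreamEnd) {
    let mut updates = Vec::new();
    loop {
        match rx.recv_timeout(idle_limit) {
            Ok(update) => updates.push(update),
            Err(RecvTimeoutError::Disconnected) => return (updates, StreamEnd::Closed),
            Err(RecvTimeoutError::Timeout) => return (updates, StreamEnd::TimedOut),
        }
    }
}
```

The timeout here is per update rather than for the whole stream, so a prompt that keeps producing output is not cut off. Either outcome gives the caller a point at which to surface an error and release the connection.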

Analysis validated independently by Kiro (kiro-cli) and Codex (codex-acp).

### Steps to Reproduce

1. Deploy openab with any ACP agent (claude-agent-acp, gemini --acp, kiro-cli acp, codex-acp)
2. Send a mention that triggers an ACP session — bot processes it normally
3. Wait for the ACP child process to become stale (e.g., CLI session expires overnight, API auth times out, or the process hangs)
4. Send another mention to the same bot
5. Bot posts `...` placeholder but never updates it — `stream_prompt` hangs on `rx.recv()` holding the pool write lock
6. Send mentions targeting different threads — ALL are also frozen because the global write lock is held

### Expected Behavior

- When an ACP process dies or becomes unresponsive, the notify channel should close (`rx.recv()` returns `None`), the error should be surfaced to the user (e.g., `⚠️ Failed`), and the stale session should be cleaned up
- Other threads should not be affected by one stale session — the pool lock should not be held during streaming

### Environment

- openab version: 0.6.0 on our VPS, but **confirmed unchanged in 0.7.2** (`pool.rs` never modified since initial commit, `connection.rs` EOF cleanup unchanged)
- Helm chart: 0.6.0 (latest stable: 0.7.3-beta.56)
- ACP agents: claude-agent-acp, gemini --acp, kiro-cli acp, codex-acp
- K3s on Zeabur VPS (2 vCPU, 4GB RAM)

### Screenshots / Logs

**Logs** — after freeze, only `accepted bot message` appears. No `spawning agent`, no `pool error`, no errors at all:
```
[dispatcher] Dispatched #192 (code-review) → itachi (prompt delivered)
# itachi log:
INFO openab::discord: accepted bot message (in allowed_bots_from) bot_id=1490975142803669113
# ... nothing else. No spawning, no errors. Repeated every 5 minutes for 10+ hours.
```

**Discord thread** — 25 consecutive `...` messages from bots, none ever updated:
```
[11:24:55] 千手扉間 (Dispatcher): <@itachi> (phase: code-review) — please pick up #192
[11:24:56] 宇智波鼬: ...
[11:29:39] 千手扉間 (Dispatcher): <@itachi> (phase: code-review) — please pick up #192
[11:29:40] 宇智波鼬: ...
# ... repeated 25 times, none updated beyond "..."
```

**Scale**: 4 agents frozen, 432+ mentions over ~10 hours, zero processed.
