
feat: broadcast shutdown notification to active threads (RFC #78 §1d Phase 1)#182

Open
ruan330 wants to merge 1 commit into openabdev:main from ruan330:feat/shutdown-broadcast

Conversation

@ruan330

@ruan330 ruan330 commented Apr 10, 2026

Summary

Implements the Phase 1 portion of RFC #78 §1d (graceful shutdown). When the broker receives SIGINT or SIGTERM, it now iterates active sessions and posts a short notification to each Discord thread before clearing the pool.

Relates to: #78, #75

Problem

Today, when the broker is restarted (systemctl restart, docker stop, Ctrl+C), conversations just die silently mid-reply. Users have no signal that anything is happening until the broker comes back, and in-flight streams leave partial messages with no explanation. The RFC already calls this out as §1d, so this PR focuses purely on the Phase 1 portion — the notification itself, without any state persistence.

Changes

src/acp/pool.rs — new helper

pub async fn active_thread_ids(&self) -> Vec<String> {
    self.connections.read().await.keys().cloned().collect()
}

A read-only snapshot of the currently tracked thread IDs. Kept tiny on purpose so that the shutdown path only needs a read lock while it walks the map, and the subsequent shutdown() call still has exclusive write access.

src/main.rs — shutdown task

Two changes in one place:

  1. Listen for SIGTERM in addition to SIGINT. Previously only tokio::signal::ctrl_c() was awaited, so systemctl stop / docker stop / kill bypassed the graceful path entirely. Now uses tokio::signal::unix::signal + tokio::select! to handle both.
  2. Broadcast the notification before shard_manager.shutdown_all(). Iterates active_thread_ids(), parses each to a ChannelId, and posts a fixed message. Post failures are logged as warnings but do not block pool cleanup — a dead Discord connection shouldn't leave agent processes hanging.
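The broadcast step can be isolated as a small pure function. This is a hypothetical sketch, not the PR's actual code: `send` stands in for serenity's `ChannelId`-based post call, and the counters stand in for the real logging; the point is just that a parse or send failure is counted and skipped, never allowed to abort the loop.

```rust
// Hypothetical sketch of the broadcast loop in main.rs. `send` abstracts the
// real Discord call (ChannelId + Http in the actual code); failures are
// tallied, not propagated, mirroring the warn-and-continue behavior.
fn broadcast_shutdown<E>(
    thread_ids: &[String],
    mut send: impl FnMut(u64, &str) -> Result<(), E>,
) -> (usize, usize) {
    const NOTICE: &str =
        "🔄 Broker restarting. You can continue the conversation when the broker is back.";
    let (mut sent, mut failed) = (0, 0);
    for id in thread_ids {
        match id.parse::<u64>() {
            // A thread ID that parses is posted to; a send error is a warning
            // in the real code, never a reason to skip pool cleanup.
            Ok(channel) => match send(channel, NOTICE) {
                Ok(()) => sent += 1,
                Err(_) => failed += 1,
            },
            // Unparseable IDs are also non-fatal.
            Err(_) => failed += 1,
        }
    }
    (sent, failed)
}
```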

Message text:

🔄 Broker restarting. You can continue the conversation when the broker is back.

Design decisions

A few small calls I'd like reviewers to weigh in on:

Wording is intentionally neutral. The RFC draft uses "🔄 Broker restarting...", which I kept. I chose "you can continue the conversation when the broker is back" over "your session will resume shortly" because the latter implies state restoration, and in-memory session state (per-thread cwd, conversation context) is still lost on restart until Phase 2 of §1d lands. Happy to switch if you'd prefer different wording.

No grace period for in-flight streams. Kept the PR minimal. A configurable drain window (wait N seconds for running prompts to finish, then broadcast, then shutdown) would be a reasonable follow-up, but it adds a config knob and some coordination that I'd rather keep separate from the notification itself.
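For scoping that follow-up, the drain window could be as simple as polling an in-flight counter against a deadline. A hypothetical sketch, deliberately not part of this PR; `in_flight` is an assumed accessor the pool does not currently expose:

```rust
use std::time::{Duration, Instant};

// Hypothetical drain-window sketch: wait up to `window` for running prompts
// to finish, polling every `poll`. Returns true if everything drained.
fn wait_for_drain(
    mut in_flight: impl FnMut() -> usize, // assumed counter, not in the pool today
    window: Duration,
    poll: Duration,
) -> bool {
    let deadline = Instant::now() + window;
    while Instant::now() < deadline {
        if in_flight() == 0 {
            return true;
        }
        std::thread::sleep(poll);
    }
    // One last check at the deadline before giving up and broadcasting anyway.
    in_flight() == 0
}
```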

Rate limiting. For pools with many active sessions, rapid channel.say() calls could hit Discord thread rate limits. I deliberately did not add throttling — in practice max_sessions defaults to 10, and on a real incident the notification arriving fast is more valuable than being pretty. If this becomes a problem at higher pool sizes we can revisit.
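If throttling ever does become necessary at larger pool sizes, fixed pacing between posts is probably the smallest viable shape. A hypothetical sketch only, again not in this PR; a real implementation would honor Discord's rate-limit headers rather than a fixed gap:

```rust
use std::thread::sleep;
use std::time::Duration;

// Hypothetical pacing wrapper: space broadcasts ~`gap` apart so a large pool
// can't burst past a per-route rate limit. Crude fixed spacing; serenity
// already queues on rate-limit responses, so this may be redundant in practice.
fn broadcast_paced(thread_ids: &[String], gap: Duration, mut send: impl FnMut(&str)) {
    for (i, id) in thread_ids.iter().enumerate() {
        if i > 0 {
            sleep(gap); // no delay before the first post
        }
        send(id);
    }
}
```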

Why active_thread_ids() instead of iterating inline. The pool's connections field is private, and I wanted to avoid exposing it. A small accessor that returns an owned Vec<String> is the narrowest surface I could think of; it also keeps the read lock scoped to one line so it can't accidentally deadlock with shutdown().

Alternatives considered

  • Put the broadcast inside SessionPool::shutdown() itself. Considered this first, but then pool.rs would need to know about serenity::http::Http and message formatting, which felt like the wrong direction — the pool has no other Discord dependency today.
  • Pass a callback closure into shutdown(). Technically cleaner but adds generics/boxing noise for a path that runs once per process. The split between main.rs (Discord) and pool.rs (lifecycle) felt clearer with a plain accessor.
  • Broadcast on a background task so pool shutdown isn't blocked. The whole shutdown sequence is already in a spawned task awaiting the signal, so there's no main-thread impact to worry about. Keeping it sequential means the notification is actually sent before the process exits.

Out of scope — Phase 2 follow-up

This PR is deliberately only Phase 1 of §1d. Phase 2 (persisting session metadata to disk so cwds actually survive the restart) is a separate concern with its own design tradeoffs (file format, location, atomicity, schema evolution) and I think it deserves its own PR. Happy to open that next if this approach lands.

Test plan

Tested locally against a running broker with one active session:

[11:20:18] INFO openab: shutdown signal received
[11:20:18] INFO openab: broadcasting shutdown notification count=1
[11:20:19] INFO openab::acp::pool: pool shutdown complete count=1

The notification arrived in the Discord thread immediately before the process exited. cargo build --release from a clean checkout of this branch succeeds with no new warnings.

  • cargo build --release on clean upstream/main baseline
  • End-to-end test: SIGTERM on running broker → notification posted → pool shutdown → process exits
  • Reviewer sanity check on wording / scope

Implements the Phase 1 portion of RFC openabdev#78 §1d (graceful shutdown).

On SIGINT or SIGTERM, the broker now iterates active sessions and
posts a neutral notification to each Discord thread before clearing
the pool. Users get a clear signal that the broker is restarting
instead of conversations dying silently.

Changes:

- Add `SessionPool::active_thread_ids()` returning a snapshot of
  tracked thread IDs (read-only lookup, no write lock held).
- Main shutdown task now listens for SIGTERM in addition to SIGINT
  via `tokio::signal::unix`. Previously only Ctrl+C triggered the
  graceful path; `systemctl stop` / `docker stop` / `kill` all bypassed
  it entirely, so without this change the broadcast would fire in dev
  but never in production.
- Before `shard_manager.shutdown_all()`, iterate the snapshot and
  post "🔄 Broker restarting. You can continue the conversation when
  the broker is back." to each thread. Failures are warned, not
  fatal — a dead Discord connection shouldn't block pool cleanup.

Design notes:

- Wording is intentionally neutral. "Will resume" would imply
  state restoration, which is Phase 2 of §1d (persistence) and out
  of scope here.
- No grace period for in-flight streams. Keeping the PR minimal;
  a configurable drain window can be a follow-up if needed.
- `active_thread_ids()` takes a snapshot via `.read()` so the
  shutdown loop does not contend with the `shutdown()` write lock
  on the same map.

Tested locally against `config.toml` with 1 active session: SIGTERM
produces the expected log sequence (`shutdown signal received` →
`broadcasting shutdown notification count=1` → `pool shutdown
complete count=1`) and the notification arrives in the thread before
the process exits.
@ruan330 ruan330 requested a review from thepagent as a code owner April 10, 2026 11:25
@thepagent thepagent added the p2 Medium — planned work label Apr 15, 2026