
feat: broadcast shutdown notification to active threads (RFC #78 §1d Phase 1)#182

Open
ruan330 wants to merge 1 commit into openabdev:main from ruan330:feat/shutdown-broadcast

Conversation

@ruan330

@ruan330 ruan330 commented Apr 10, 2026

Summary

Implements the Phase 1 portion of RFC #78 §1d (graceful shutdown). When the broker receives SIGINT or SIGTERM, it now iterates active sessions and posts a short notification to each Discord thread before clearing the pool.

Relates to: #78, #75

Problem

Today, when the broker is restarted (systemctl restart, docker stop, Ctrl+C), conversations just die silently mid-reply. Users have no signal that anything is happening until the broker comes back, and in-flight streams leave partial messages with no explanation. The RFC already calls this out as §1d, so this PR focuses purely on the Phase 1 portion — the notification itself, without any state persistence.

Changes

src/acp/pool.rs — new helper

pub async fn active_thread_ids(&self) -> Vec<String> {
    self.connections.read().await.keys().cloned().collect()
}

A read-only snapshot of the currently tracked thread IDs. Kept tiny on purpose so that the shutdown path only needs a read lock while it walks the map, and the subsequent shutdown() call still has exclusive write access.

src/main.rs — shutdown task

Two changes in one place:

  1. Listen for SIGTERM in addition to SIGINT. Previously only tokio::signal::ctrl_c() was awaited, so systemctl stop / docker stop / kill bypassed the graceful path entirely. Now uses tokio::signal::unix::signal + tokio::select! to handle both.
  2. Broadcast the notification before shard_manager.shutdown_all(). Iterates active_thread_ids(), parses each to a ChannelId, and posts a fixed message. Post failures are logged as warnings but do not block pool cleanup — a dead Discord connection shouldn't leave agent processes hanging.
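The broadcast step can be isolated as a small pure function. This is a hypothetical sketch, not the PR's actual code: `send` stands in for serenity's `ChannelId`-based post call, and the counters stand in for the real logging; the point is just that a parse or send failure is counted and skipped, never allowed to abort the loop.

```rust
// Hypothetical sketch of the broadcast loop in main.rs. `send` abstracts the
// real Discord call (ChannelId + Http in the actual code); failures are
// tallied, not propagated, mirroring the warn-and-continue behavior.
fn broadcast_shutdown<E>(
    thread_ids: &[String],
    mut send: impl FnMut(u64, &str) -> Result<(), E>,
) -> (usize, usize) {
    const NOTICE: &str =
        "🔄 Broker restarting. You can continue the conversation when the broker is back.";
    let (mut sent, mut failed) = (0, 0);
    for id in thread_ids {
        match id.parse::<u64>() {
            // A thread ID that parses is posted to; a send error is a warning
            // in the real code, never a reason to skip pool cleanup.
            Ok(channel) => match send(channel, NOTICE) {
                Ok(()) => sent += 1,
                Err(_) => failed += 1,
            },
            // Unparseable IDs are also non-fatal.
            Err(_) => failed += 1,
        }
    }
    (sent, failed)
}
```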

Message text:

🔄 Broker restarting. You can continue the conversation when the broker is back.

Design decisions

A few small calls I'd like reviewers to weigh in on:

Wording is intentionally neutral. The RFC draft uses "🔄 Broker restarting...", which I kept. I chose "you can continue the conversation when the broker is back" over "your session will resume shortly" because the latter implies state restoration, and in-memory session state (per-thread cwd, conversation context) is still lost on restart until Phase 2 of §1d lands. Happy to switch if you'd prefer different wording.

No grace period for in-flight streams. Kept the PR minimal. A configurable drain window (wait N seconds for running prompts to finish, then broadcast, then shutdown) would be a reasonable follow-up, but it adds a config knob and some coordination that I'd rather keep separate from the notification itself.
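For scoping that follow-up, the drain window could be as simple as polling an in-flight counter against a deadline. A hypothetical sketch, deliberately not part of this PR; `in_flight` is an assumed accessor the pool does not currently expose:

```rust
use std::time::{Duration, Instant};

// Hypothetical drain-window sketch: wait up to `window` for running prompts
// to finish, polling every `poll`. Returns true if everything drained.
fn wait_for_drain(
    mut in_flight: impl FnMut() -> usize, // assumed counter, not in the pool today
    window: Duration,
    poll: Duration,
) -> bool {
    let deadline = Instant::now() + window;
    while Instant::now() < deadline {
        if in_flight() == 0 {
            return true;
        }
        std::thread::sleep(poll);
    }
    // One last check at the deadline before giving up and broadcasting anyway.
    in_flight() == 0
}
```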

Rate limiting. For pools with many active sessions, rapid channel.say() calls could hit Discord thread rate limits. I deliberately did not add throttling — in practice max_sessions defaults to 10, and on a real incident the notification arriving fast is more valuable than being pretty. If this becomes a problem at higher pool sizes we can revisit.
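If throttling ever does become necessary at larger pool sizes, fixed pacing between posts is probably the smallest viable shape. A hypothetical sketch only, again not in this PR; a real implementation would honor Discord's rate-limit headers rather than a fixed gap:

```rust
use std::thread::sleep;
use std::time::Duration;

// Hypothetical pacing wrapper: space broadcasts ~`gap` apart so a large pool
// can't burst past a per-route rate limit. Crude fixed spacing; serenity
// already queues on rate-limit responses, so this may be redundant in practice.
fn broadcast_paced(thread_ids: &[String], gap: Duration, mut send: impl FnMut(&str)) {
    for (i, id) in thread_ids.iter().enumerate() {
        if i > 0 {
            sleep(gap); // no delay before the first post
        }
        send(id);
    }
}
```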

Why active_thread_ids() instead of iterating inline. The pool's connections field is private, and I wanted to avoid exposing it. A small accessor that returns an owned Vec<String> is the narrowest surface I could think of; it also keeps the read lock scoped to one line so it can't accidentally deadlock with shutdown().

Alternatives considered

  • Put the broadcast inside SessionPool::shutdown() itself. Considered this first, but then pool.rs would need to know about serenity::http::Http and message formatting, which felt like the wrong direction — the pool has no other Discord dependency today.
  • Pass a callback closure into shutdown(). Technically cleaner but adds generics/boxing noise for a path that runs once per process. The split between main.rs (Discord) and pool.rs (lifecycle) felt clearer with a plain accessor.
  • Broadcast on a background task so pool shutdown isn't blocked. The whole shutdown sequence is already in a spawned task awaiting the signal, so there's no main-thread impact to worry about. Keeping it sequential means the notification is actually sent before the process exits.

Out of scope — Phase 2 follow-up

This PR is deliberately only Phase 1 of §1d. Phase 2 (persisting session metadata to disk so cwds actually survive the restart) is a separate concern with its own design tradeoffs (file format, location, atomicity, schema evolution) and I think it deserves its own PR. Happy to open that next if this approach lands.

Test plan

Tested locally against a running broker with one active session:

[11:20:18] INFO openab: shutdown signal received
[11:20:18] INFO openab: broadcasting shutdown notification count=1
[11:20:19] INFO openab::acp::pool: pool shutdown complete count=1

The notification arrived in the Discord thread immediately before the process exited. cargo build --release from a clean checkout of this branch succeeds with no new warnings.

  • cargo build --release on clean upstream/main baseline
  • End-to-end test: SIGTERM on running broker → notification posted → pool shutdown → process exits
  • Reviewer sanity check on wording / scope

Implements the Phase 1 portion of RFC openabdev#78 §1d (graceful shutdown).

On SIGINT or SIGTERM, the broker now iterates active sessions and
posts a neutral notification to each Discord thread before clearing
the pool. Users get a clear signal that the broker is restarting
instead of conversations dying silently.

Changes:

- Add `SessionPool::active_thread_ids()` returning a snapshot of
  tracked thread IDs (read-only lookup, no write lock held).
- Main shutdown task now listens for SIGTERM in addition to SIGINT
  via `tokio::signal::unix`. Previously only Ctrl+C triggered the
  graceful path; `systemctl stop` / `docker stop` / `kill` all bypassed
  it entirely, so without this change the broadcast would fire in dev
  but never in production.
- Before `shard_manager.shutdown_all()`, iterate the snapshot and
  post "🔄 Broker restarting. You can continue the conversation when
  the broker is back." to each thread. Failures are warned, not
  fatal — a dead Discord connection shouldn't block pool cleanup.

Design notes:

- Wording is intentionally neutral. "Will resume" would imply
  state restoration, which is Phase 2 of §1d (persistence) and out
  of scope here.
- No grace period for in-flight streams. Keeping the PR minimal;
  a configurable drain window can be a follow-up if needed.
- `active_thread_ids()` takes a snapshot via `.read()` so the
  shutdown loop does not contend with the `shutdown()` write lock
  on the same map.

Tested locally against `config.toml` with 1 active session: SIGTERM
produces the expected log sequence (`shutdown signal received` →
`broadcasting shutdown notification count=1` → `pool shutdown
complete count=1`) and the notification arrives in the thread before
the process exits.
@ruan330 ruan330 requested a review from thepagent as a code owner April 10, 2026 11:25
@thepagent thepagent added the p2 Medium — planned work label Apr 15, 2026