feat: broadcast shutdown notification to active threads (RFC #78 §1d Phase 1)#182
Open
ruan330 wants to merge 1 commit intoopenabdev:mainfrom
Open
feat: broadcast shutdown notification to active threads (RFC #78 §1d Phase 1)#182ruan330 wants to merge 1 commit intoopenabdev:mainfrom
ruan330 wants to merge 1 commit intoopenabdev:mainfrom
Conversation
Implements the Phase 1 portion of RFC openabdev#78 §1d (graceful shutdown). On SIGINT or SIGTERM, the broker now iterates active sessions and posts a neutral notification to each Discord thread before clearing the pool. Users get a clear signal that the broker is restarting instead of conversations dying silently. Changes: - Add `SessionPool::active_thread_ids()` returning a snapshot of tracked thread IDs (read-only lookup, no write lock held). - Main shutdown task now listens for SIGTERM in addition to SIGINT via `tokio::signal::unix`. Previously only Ctrl+C triggered the graceful path; `systemctl stop` / `docker stop` / `kill` all bypassed it entirely, which made the broadcast valuable for production but invisible in dev. - Before `shard_manager.shutdown_all()`, iterate the snapshot and post "🔄 Broker restarting. You can continue the conversation when the broker is back." to each thread. Failures are warned, not fatal — a dead Discord connection shouldn't block pool cleanup. Design notes: - Wording is intentionally neutral. "Will resume" would imply state restoration, which is Phase 2 of §1d (persistence) and out of scope here. - No grace period for in-flight streams. Keeping the PR minimal; a configurable drain window can be a follow-up if needed. - `active_thread_ids()` takes a snapshot via `.read()` so the shutdown loop does not contend with the `shutdown()` write lock on the same map. Tested locally against `config.toml` with 1 active session: SIGTERM produces the expected log sequence (`shutdown signal received` → `broadcasting shutdown notification count=1` → `pool shutdown complete count=1`) and the notification arrives in the thread before the process exits.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements the Phase 1 portion of RFC #78 §1d (graceful shutdown). When the broker receives SIGINT or SIGTERM, it now iterates active sessions and posts a short notification to each Discord thread before clearing the pool.
Relates to: #78, #75
Problem
Today, when the broker is restarted (
systemctl restart,docker stop,Ctrl+C), conversations just die silently mid-reply. Users have no signal that anything is happening until the broker comes back, and in-flight streams leave partial messages with no explanation. The RFC already calls this out as §1d, so this PR focuses purely on the Phase 1 portion — the notification itself, without any state persistence.Changes
src/acp/pool.rs— new helperA read-only snapshot of the currently tracked thread IDs. Kept tiny on purpose so that the shutdown path only needs a read lock while it walks the map, and the subsequent
shutdown()call still has exclusive write access.src/main.rs— shutdown taskTwo changes in one place:
tokio::signal::ctrl_c()was awaited, sosystemctl stop/docker stop/killbypassed the graceful path entirely. Now usestokio::signal::unix::signal+tokio::select!to handle both.shard_manager.shutdown_all(). Iteratesactive_thread_ids(), parses each to aChannelId, and posts a fixed message. Post failures are logged as warnings but do not block pool cleanup — a dead Discord connection shouldn't leave agent processes hanging.Message text:
Design decisions
A few small calls I'd like reviewers to weigh in on:
Wording is intentionally neutral. The RFC draft uses "🔄 Broker restarting...", which I kept. I chose "you can continue the conversation when the broker is back" over "your session will resume shortly" because the latter implies state restoration, and in-memory session state (per-thread
cwd, conversation context) is still lost on restart until Phase 2 of §1d lands. Happy to switch if you'd prefer different wording.No grace period for in-flight streams. Kept the PR minimal. A configurable drain window (wait N seconds for running prompts to finish, then broadcast, then shutdown) would be a reasonable follow-up, but it adds a config knob and some coordination that I'd rather keep separate from the notification itself.
Rate limiting. For pools with many active sessions, rapid
channel.say()calls could hit Discord thread rate limits. I deliberately did not add throttling — in practicemax_sessionsdefaults to 10, and on a real incident the notification arriving fast is more valuable than being pretty. If this becomes a problem at higher pool sizes we can revisit.Why
active_thread_ids()instead of iterating inline. The pool'sconnectionsfield is private, and I wanted to avoid exposing it. A small accessor that returns an ownedVec<String>is the narrowest surface I could think of; it also keeps the read lock scoped to one line so it can't accidentally deadlock withshutdown().Alternatives considered
SessionPool::shutdown()itself. Considered this first, but thenpool.rswould need to know aboutserenity::http::Httpand message formatting, which felt like the wrong direction — the pool has no other Discord dependency today.shutdown(). Technically cleaner but adds generics/boxing noise for a path that runs once per process. The split betweenmain.rs(Discord) andpool.rs(lifecycle) felt clearer with a plain accessor.Out of scope — Phase 2 follow-up
This PR is deliberately only Phase 1 of §1d. Phase 2 (persisting session metadata to disk so cwds actually survive the restart) is a separate concern with its own design tradeoffs (file format, location, atomicity, schema evolution) and I think it deserves its own PR. Happy to open that next if this approach lands.
Test plan
Tested locally against a running broker with one active session:
The notification arrived in the Discord thread immediately before the process exited.
cargo build --releasefrom a clean checkout of this branch succeeds with no new warnings.cargo build --releaseon clean upstream/main baseline