fix: add graceful shutdown handler for collab server (SIGTERM/SIGINT)#161
fix: add graceful shutdown handler for collab server (SIGTERM/SIGINT)#161algojogacor wants to merge 1 commit into
Conversation
Handle SIGTERM/SIGINT to prevent data loss on restart: - Stop accepting new HTTP connections - Close all WebSocket connections with going-away frame (1001) - Flush all active Yjs document updates to MongoDB - Track active documents via Map for complete flush coverage Closes piyushdotcomm#140
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
👋 Thanks for opening a PR, @algojogacor!Your PR has entered the 🚦 PR Review Pipeline.
What happens next
A pipeline status comment will appear below and update automatically as your PR progresses. While you wait
This comment is posted only once. |
WalkthroughThe collab server now tracks active Yjs documents and implements graceful shutdown on process signals. When receiving ChangesGraceful Shutdown Flow
Sequence DiagramsequenceDiagram
participant System
participant CollabServer
participant MongoDB
System->>CollabServer: SIGTERM/SIGINT signal
CollabServer->>CollabServer: guard re-entrancy
CollabServer->>CollabServer: close HTTP server
CollabServer->>CollabServer: broadcast WebSocket close frames
CollabServer->>MongoDB: flushDocument (all active docs)
MongoDB-->>CollabServer: flush complete
CollabServer->>CollabServer: exit process
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
server/collab.ts (2)
35-46:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove inactive docs from
activeDocsduringwriteState.
activeDocsis only appended to, so stale doc names accumulate and are treated as active at shutdown.Suggested patch
writeState: async (docName: string, ydoc: Y.Doc) => { await _mdb.flushDocument(docName); + activeDocs.delete(docName); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/collab.ts` around lines 35 - 46, The writeState implementation currently only flushes the document and never removes it from activeDocs, causing stale entries to accumulate; update the writeState(docName: string, ydoc: Y.Doc) function to remove the doc from activeDocs after a successful flush (call _mdb.flushDocument(docName) and then activeDocs.delete(docName)), ensuring you only delete the entry once the flush resolves or on the appropriate error path so activeDocs remains consistent at shutdown.
40-42:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winAwait pending
storeUpdatewrites before flushing on shutdown.
_mdb.storeUpdate(...)is async and returns a Promise, but neither line 38 nor lines 40-42 await it. The update event handler is markedasyncbut doesn't await the call, making the Promise fire-and-forget. During shutdown (lines 129-141),flushDocumentis called immediately without waiting for any pendingstoreUpdatecalls to settle, allowing in-flight writes to be missed.The suggested patch addresses this by tracking pending promises in a
Map<string, Set<Promise<unknown>>>, collecting eachstoreUpdatePromise, and awaiting their completion before callingflushDocument.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/collab.ts` around lines 40 - 42, The update handler currently fires-and-forgets _mdb.storeUpdate(docName, update) so in-flight writes can be lost on shutdown; modify the ydoc.on('update', async (update) => { ... }) handler to collect the Promise returned by _mdb.storeUpdate into a Map keyed by docName (e.g., pendingWrites: Map<string, Set<Promise<unknown>>>), add the Promise to the set when started and remove it when settled (finally), and then update the shutdown/flush path (the flushDocument call sequence) to await Promise.all on any pendingWrites.get(docName) (or similar) before calling flushDocument(docName) so all storeUpdate operations complete first.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/collab.ts`:
- Around line 106-145: Add a forced-shutdown timeout to gracefulShutdown to
avoid hangs: when gracefulShutdown(signal) starts (after setting isShuttingDown
and before initiating server.close/wss close/flush), start a fallback setTimeout
(e.g., 10s) that forcefully calls process.exit(1) and log a warning; store the
timer id in a local variable and clear that timeout once all shutdown work
completes (after Promise.allSettled on flushPromises and successful server/ws
closes) so normal shutdown exits with process.exit(0). Reference
gracefulShutdown, server.close, wss.clients, _mdb.flushDocument, flushPromises
and isShuttingDown when adding/clearing the timeout.
---
Outside diff comments:
In `@server/collab.ts`:
- Around line 35-46: The writeState implementation currently only flushes the
document and never removes it from activeDocs, causing stale entries to
accumulate; update the writeState(docName: string, ydoc: Y.Doc) function to
remove the doc from activeDocs after a successful flush (call
_mdb.flushDocument(docName) and then activeDocs.delete(docName)), ensuring you
only delete the entry once the flush resolves or on the appropriate error path
so activeDocs remains consistent at shutdown.
- Around line 40-42: The update handler currently fires-and-forgets
_mdb.storeUpdate(docName, update) so in-flight writes can be lost on shutdown;
modify the ydoc.on('update', async (update) => { ... }) handler to collect the
Promise returned by _mdb.storeUpdate into a Map keyed by docName (e.g.,
pendingWrites: Map<string, Set<Promise<unknown>>>), add the Promise to the set
when started and remove it when settled (finally), and then update the
shutdown/flush path (the flushDocument call sequence) to await Promise.all on
any pendingWrites.get(docName) (or similar) before calling
flushDocument(docName) so all storeUpdate operations complete first.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: eb496d7b-40eb-4ebc-9347-d3cb094baffa
📒 Files selected for processing (1)
server/collab.ts
| async function gracefulShutdown(signal: string) { | ||
| if (isShuttingDown) { | ||
| console.log(`[collab] Already shutting down, ignoring ${signal}`); | ||
| return; | ||
| } | ||
| isShuttingDown = true; | ||
|
|
||
| console.log(`[collab] Received ${signal}, shutting down gracefully...`); | ||
|
|
||
| // 1. Stop accepting new connections | ||
| server.close((err) => { | ||
| if (err) console.error('[collab] Error closing HTTP server:', err); | ||
| }); | ||
|
|
||
| // 2. Close all WebSocket connections with a close frame | ||
| wss.clients.forEach((client) => { | ||
| try { | ||
| client.close(1001, 'Server shutting down'); | ||
| } catch { | ||
| // ignore individual close errors | ||
| } | ||
| }); | ||
|
|
||
| // 3. Flush all active Yjs documents to MongoDB | ||
| const docNames = Array.from(activeDocs.keys()); | ||
| console.log(`[collab] Flushing ${docNames.length} active document(s)...`); | ||
|
|
||
| const flushPromises = docNames.map(async (docName) => { | ||
| try { | ||
| await _mdb.flushDocument(docName); | ||
| console.log(`[collab] Flushed: ${docName}`); | ||
| } catch (err) { | ||
| console.error(`[collab] Failed to flush ${docName}:`, err); | ||
| } | ||
| }); | ||
| await Promise.allSettled(flushPromises); | ||
|
|
||
| console.log('[collab] Graceful shutdown complete.'); | ||
| process.exit(0); | ||
| } |
There was a problem hiding this comment.
Add a forced-shutdown timeout to prevent hanging termination.
If server.close/Mongo flush stalls, this path can hang indefinitely. The linked shutdown objective explicitly calls for a timeout fallback.
Suggested patch
+const SHUTDOWN_TIMEOUT_MS = Number(process.env.COLLAB_SHUTDOWN_TIMEOUT_MS ?? 10000);
async function gracefulShutdown(signal: string) {
if (isShuttingDown) {
console.log(`[collab] Already shutting down, ignoring ${signal}`);
return;
}
isShuttingDown = true;
+ const forceExitTimer = setTimeout(() => {
+ console.error(`[collab] Shutdown timed out after ${SHUTDOWN_TIMEOUT_MS}ms. Forcing exit.`);
+ process.exit(1);
+ }, SHUTDOWN_TIMEOUT_MS);
+ forceExitTimer.unref();
console.log(`[collab] Received ${signal}, shutting down gracefully...`);
// existing shutdown steps...
console.log('[collab] Graceful shutdown complete.');
+ clearTimeout(forceExitTimer);
process.exit(0);
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/collab.ts` around lines 106 - 145, Add a forced-shutdown timeout to
gracefulShutdown to avoid hangs: when gracefulShutdown(signal) starts (after
setting isShuttingDown and before initiating server.close/wss close/flush),
start a fallback setTimeout (e.g., 10s) that forcefully calls process.exit(1)
and log a warning; store the timer id in a local variable and clear that timeout
once all shutdown work completes (after Promise.allSettled on flushPromises and
successful server/ws closes) so normal shutdown exits with process.exit(0).
Reference gracefulShutdown, server.close, wss.clients, _mdb.flushDocument,
flushPromises and isShuttingDown when adding/clearing the timeout.
piyushdotcomm
left a comment
There was a problem hiding this comment.
check the coderabbit review before pinging maintainers
Summary
Fixes #140 — The collab server had no graceful shutdown handler, causing active Yjs edits to be lost when the process received SIGTERM (container restart) or SIGINT (Ctrl+C in dev).
Root Cause
The collab server (
server/collab.ts) did not register handlers forSIGTERMorSIGINT. When the process was stopped, WebSocket connections dropped abruptly and pending Yjs document updates were never flushed to MongoDB.Solution
Added a
gracefulShutdown()function that runs on SIGTERM/SIGINT:Map<string, Y.Doc>and calls_mdb.flushDocument()on each before exiting.isShuttingDownflag prevents duplicate shutdown from simultaneous signals.Changes
server/collab.ts(+35 −0)activeDocsMap to track open documents for flush on shutdowngracefulShutdown()async functionSIGTERMandSIGINThandlersisShuttingDownguard against double-invocationbindStatecallback to register documents inactiveDocsTesting
SIGTERM— HTTP server stops accepting connections, clients notified, documents flushedSIGINT(Ctrl+C in dev) — same graceful path as SIGTERMSummary by CodeRabbit
Bug Fixes
Chores