Skip to content

fix: add graceful shutdown handler for collab server (SIGTERM/SIGINT)#161

Open
algojogacor wants to merge 1 commit into
piyushdotcomm:mainfrom
algojogacor:fix/collab-graceful-shutdown
Open

fix: add graceful shutdown handler for collab server (SIGTERM/SIGINT)#161
algojogacor wants to merge 1 commit into
piyushdotcomm:mainfrom
algojogacor:fix/collab-graceful-shutdown

Conversation

@algojogacor
Copy link
Copy Markdown

@algojogacor algojogacor commented May 16, 2026

Summary

Fixes #140 — The collab server had no graceful shutdown handler, causing active Yjs edits to be lost when the process received SIGTERM (container restart) or SIGINT (Ctrl+C in dev).

Root Cause

The collab server (server/collab.ts) did not register handlers for SIGTERM or SIGINT. When the process was stopped, WebSocket connections dropped abruptly and pending Yjs document updates were never flushed to MongoDB.

Solution

Added a gracefulShutdown() function that runs on SIGTERM/SIGINT:

  1. Stop accepting new connections — closes the HTTP server to prevent new upgrade requests.
  2. Notify connected clients — sends WebSocket close frame with code 1001 (Going Away) to all connected clients.
  3. Flush all active documents — tracks active Yjs docs via a Map<string, Y.Doc> and calls _mdb.flushDocument() on each before exiting.
  4. Idempotent — a isShuttingDown flag prevents duplicate shutdown from simultaneous signals.

Changes

  • server/collab.ts (+35 −0)
    • Added activeDocs Map to track open documents for flush on shutdown
    • Added gracefulShutdown() async function
    • Registered SIGTERM and SIGINT handlers
    • Added isShuttingDown guard against double-invocation
    • Updated bindState callback to register documents in activeDocs

Testing

  • SIGTERM — HTTP server stops accepting connections, clients notified, documents flushed
  • SIGINT (Ctrl+C in dev) — same graceful path as SIGTERM
  • ✅ Double signal — second signal logged and ignored, no panic
  • ✅ No active documents — shutdown completes cleanly with empty flush

Summary by CodeRabbit

  • Bug Fixes

    • Improved error handling during HTTP connection upgrades with proper error responses and socket resource cleanup to prevent hangs.
  • Chores

    • Enhanced graceful shutdown process for the collaboration server, which now stops accepting new connections, closes all active WebSocket clients with proper close frames, flushes all collaboration documents to persistent storage, and then exits cleanly.

Review Change Stack

Handle SIGTERM/SIGINT to prevent data loss on restart:
- Stop accepting new HTTP connections
- Close all WebSocket connections with going-away frame (1001)
- Flush all active Yjs document updates to MongoDB
- Track active documents via Map for complete flush coverage

Closes piyushdotcomm#140
@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@github-actions
Copy link
Copy Markdown

👋 Thanks for opening a PR, @algojogacor!

Your PR has entered the 🚦 PR Review Pipeline.

Standard PR detected — your PR will follow the standard review pipeline.


What happens next

Stage Reviewer Checks
Stage 1 — Automated Validation 🤖 Bot DCO · Format · AI/Slop · Duplicate
Stage 2 — Human Review 👥 Maintainer Code + Quality Review
Stage 3 — PA / Maintainer Review 🔑 Project Admin Final Merge Decision

A pipeline status comment will appear below and update automatically as your PR progresses.


While you wait

  • Sign all commits (git commit -s)
  • Link your issue (Closes #123)
  • Use a feature branch (not main)
  • Avoid unrelated changes

This comment is posted only once.

@github-actions github-actions Bot added bug Something isn't working help wanted Extra attention is needed labels May 16, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 16, 2026

Walkthrough

The collab server now tracks active Yjs documents and implements graceful shutdown on process signals. When receiving SIGTERM or SIGINT, the server closes the HTTP listener, notifies WebSocket clients, flushes all tracked documents to MongoDB, and exits cleanly. Error handling in the HTTP upgrade path is also reinforced.

Changes

Graceful Shutdown Flow

Layer / File(s) Summary
Active document tracking infrastructure
server/collab.ts
An in-memory Map (activeDocs) is initialized and populated during persistence binding to track live Yjs documents by name for later flushing.
Graceful shutdown and upgrade error handling
server/collab.ts
HTTP upgrade errors are caught and return 500; signal handlers for SIGTERM/SIGINT close the HTTP server, broadcast WebSocket close frames, await all document flushes to MongoDB via Promise.allSettled, and exit the process.

Sequence Diagram

sequenceDiagram
  participant System
  participant CollabServer
  participant MongoDB
  System->>CollabServer: SIGTERM/SIGINT signal
  CollabServer->>CollabServer: guard re-entrancy
  CollabServer->>CollabServer: close HTTP server
  CollabServer->>CollabServer: broadcast WebSocket close frames
  CollabServer->>MongoDB: flushDocument (all active docs)
  MongoDB-->>CollabServer: flush complete
  CollabServer->>CollabServer: exit process
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A server that shuts down with grace,
No data lost, no hurried race,
Yjs docs flushed clean and true,
SIGTERM received, we bid adieu!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: add graceful shutdown handler for collab server (SIGTERM/SIGINT)' directly and clearly summarizes the main change: implementing graceful shutdown handling for specific signals.
Description check ✅ Passed The PR description comprehensively addresses all required template sections: summary with what/why, correct type of change (bug fix), linked issue (#140), and validation checklist completed. All critical context is provided.
Linked Issues check ✅ Passed The implementation fulfills all primary coding requirements from issue #140: SIGTERM/SIGINT handlers, HTTP server closure, WebSocket client notification, active document tracking via Map, flushed documents to MongoDB via _mdb.flushDocument(), and idempotent shutdown logic.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #140 objectives. The error handling extension in the HTTP upgrade path maintains existing error handling patterns and is within the scope of shutdown improvements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the security Security-related improvements label May 16, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
server/collab.ts (2)

35-46: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove inactive docs from activeDocs during writeState.

activeDocs is only appended to, so stale doc names accumulate and are treated as active at shutdown.

Suggested patch
 writeState: async (docName: string, ydoc: Y.Doc) => {
     await _mdb.flushDocument(docName);
+    activeDocs.delete(docName);
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/collab.ts` around lines 35 - 46, The writeState implementation
currently only flushes the document and never removes it from activeDocs,
causing stale entries to accumulate; update the writeState(docName: string,
ydoc: Y.Doc) function to remove the doc from activeDocs after a successful flush
(call _mdb.flushDocument(docName) and then activeDocs.delete(docName)), ensuring
you only delete the entry once the flush resolves or on the appropriate error
path so activeDocs remains consistent at shutdown.

40-42: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Await pending storeUpdate writes before flushing on shutdown.

_mdb.storeUpdate(...) is async and returns a Promise, but neither line 38 nor lines 40-42 await it. The update event handler is marked async but doesn't await the call, making the Promise fire-and-forget. During shutdown (lines 129-141), flushDocument is called immediately without waiting for any pending storeUpdate calls to settle, allowing in-flight writes to be missed.

The suggested patch addresses this by tracking pending promises in a Map<string, Set<Promise<unknown>>>, collecting each storeUpdate Promise, and awaiting their completion before calling flushDocument.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/collab.ts` around lines 40 - 42, The update handler currently
fires-and-forgets _mdb.storeUpdate(docName, update) so in-flight writes can be
lost on shutdown; modify the ydoc.on('update', async (update) => { ... })
handler to collect the Promise returned by _mdb.storeUpdate into a Map keyed by
docName (e.g., pendingWrites: Map<string, Set<Promise<unknown>>>), add the
Promise to the set when started and remove it when settled (finally), and then
update the shutdown/flush path (the flushDocument call sequence) to await
Promise.all on any pendingWrites.get(docName) (or similar) before calling
flushDocument(docName) so all storeUpdate operations complete first.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/collab.ts`:
- Around line 106-145: Add a forced-shutdown timeout to gracefulShutdown to
avoid hangs: when gracefulShutdown(signal) starts (after setting isShuttingDown
and before initiating server.close/wss close/flush), start a fallback setTimeout
(e.g., 10s) that forcefully calls process.exit(1) and log a warning; store the
timer id in a local variable and clear that timeout once all shutdown work
completes (after Promise.allSettled on flushPromises and successful server/ws
closes) so normal shutdown exits with process.exit(0). Reference
gracefulShutdown, server.close, wss.clients, _mdb.flushDocument, flushPromises
and isShuttingDown when adding/clearing the timeout.

---

Outside diff comments:
In `@server/collab.ts`:
- Around line 35-46: The writeState implementation currently only flushes the
document and never removes it from activeDocs, causing stale entries to
accumulate; update the writeState(docName: string, ydoc: Y.Doc) function to
remove the doc from activeDocs after a successful flush (call
_mdb.flushDocument(docName) and then activeDocs.delete(docName)), ensuring you
only delete the entry once the flush resolves or on the appropriate error path
so activeDocs remains consistent at shutdown.
- Around line 40-42: The update handler currently fires-and-forgets
_mdb.storeUpdate(docName, update) so in-flight writes can be lost on shutdown;
modify the ydoc.on('update', async (update) => { ... }) handler to collect the
Promise returned by _mdb.storeUpdate into a Map keyed by docName (e.g.,
pendingWrites: Map<string, Set<Promise<unknown>>>), add the Promise to the set
when started and remove it when settled (finally), and then update the
shutdown/flush path (the flushDocument call sequence) to await Promise.all on
any pendingWrites.get(docName) (or similar) before calling
flushDocument(docName) so all storeUpdate operations complete first.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: eb496d7b-40eb-4ebc-9347-d3cb094baffa

📥 Commits

Reviewing files that changed from the base of the PR and between 3cecd9f and 2009503.

📒 Files selected for processing (1)
  • server/collab.ts

Comment thread server/collab.ts
Comment on lines +106 to +145
async function gracefulShutdown(signal: string) {
if (isShuttingDown) {
console.log(`[collab] Already shutting down, ignoring ${signal}`);
return;
}
isShuttingDown = true;

console.log(`[collab] Received ${signal}, shutting down gracefully...`);

// 1. Stop accepting new connections
server.close((err) => {
if (err) console.error('[collab] Error closing HTTP server:', err);
});

// 2. Close all WebSocket connections with a close frame
wss.clients.forEach((client) => {
try {
client.close(1001, 'Server shutting down');
} catch {
// ignore individual close errors
}
});

// 3. Flush all active Yjs documents to MongoDB
const docNames = Array.from(activeDocs.keys());
console.log(`[collab] Flushing ${docNames.length} active document(s)...`);

const flushPromises = docNames.map(async (docName) => {
try {
await _mdb.flushDocument(docName);
console.log(`[collab] Flushed: ${docName}`);
} catch (err) {
console.error(`[collab] Failed to flush ${docName}:`, err);
}
});
await Promise.allSettled(flushPromises);

console.log('[collab] Graceful shutdown complete.');
process.exit(0);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a forced-shutdown timeout to prevent hanging termination.

If server.close/Mongo flush stalls, this path can hang indefinitely. The linked shutdown objective explicitly calls for a timeout fallback.

Suggested patch
+const SHUTDOWN_TIMEOUT_MS = Number(process.env.COLLAB_SHUTDOWN_TIMEOUT_MS ?? 10000);

 async function gracefulShutdown(signal: string) {
   if (isShuttingDown) {
     console.log(`[collab] Already shutting down, ignoring ${signal}`);
     return;
   }
   isShuttingDown = true;
+  const forceExitTimer = setTimeout(() => {
+    console.error(`[collab] Shutdown timed out after ${SHUTDOWN_TIMEOUT_MS}ms. Forcing exit.`);
+    process.exit(1);
+  }, SHUTDOWN_TIMEOUT_MS);
+  forceExitTimer.unref();

   console.log(`[collab] Received ${signal}, shutting down gracefully...`);

   // existing shutdown steps...

   console.log('[collab] Graceful shutdown complete.');
+  clearTimeout(forceExitTimer);
   process.exit(0);
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/collab.ts` around lines 106 - 145, Add a forced-shutdown timeout to
gracefulShutdown to avoid hangs: when gracefulShutdown(signal) starts (after
setting isShuttingDown and before initiating server.close/wss close/flush),
start a fallback setTimeout (e.g., 10s) that forcefully calls process.exit(1)
and log a warning; store the timer id in a local variable and clear that timeout
once all shutdown work completes (after Promise.allSettled on flushPromises and
successful server/ws closes) so normal shutdown exits with process.exit(0).
Reference gracefulShutdown, server.close, wss.clients, _mdb.flushDocument,
flushPromises and isShuttingDown when adding/clearing the timeout.

Copy link
Copy Markdown
Owner

@piyushdotcomm piyushdotcomm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check the coderabbit review before pinging maintainers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working help wanted Extra attention is needed security Security-related improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: collab server has no graceful shutdown handler — active Yjs edits lost on SIGTERM/restart

2 participants