feat: WebSocket reliability for iOS PWA (heartbeat + visibilitychange reconnect) #88

@rado0x54

Problem

The browser WS client (client/src/lib/stores/ws.ts) has no liveness detection and no awareness of tab/PWA lifecycle:

  • No application-level heartbeat (ping/pong). A silently broken WS (NAT timeout, mobile radio handoff, iOS suspending the PWA) is not detected until the next outbound message — which for an observer never comes.
  • onclose triggers a 2 s reconnect (ws.ts:121), but iOS Safari frequently does not fire onclose when the PWA is backgrounded — the socket just stops delivering frames.
  • No document.visibilitychange listener to force a reconnect when the user returns to the app.

Net effect on iPhone PWA: backgrounding the app for more than a minute or two often leaves the terminal view connected-but-dead. Combined with #87 (stale attach replay), even a successful manual refresh looks broken.

Proposed fix

In client/src/lib/stores/ws.ts:

  1. Heartbeat. Send a { type: "ping" } every ~25 s and expect a { type: "pong" } back within ~10 s. If no pong arrives, treat the socket as dead: close it and trigger a reconnect immediately. Server side: handle ping in src/server/ws-message-router.ts and reply with pong (cheap, no per-session state). 25 s is short enough to beat most NAT timeouts and iOS background grace periods.
  2. Visibility-aware reconnect. Add a document.visibilitychange listener: when the page becomes visible and the WS is not OPEN, kick off a reconnect immediately (don't wait for the 2 s backoff timer). Optionally, also force-close and reopen on becoming visible if the last pong is older than the heartbeat interval; this covers the iOS case where the socket appears open but is actually dead.
  3. Reconnect backoff. While we're in there, replace the fixed 2 s with light exponential backoff (e.g. 1 s → 2 s → 4 s → 8 s, cap 30 s, reset on successful open). Avoids hammering the server when it's actually down.
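A minimal sketch of how the three client-side changes could fit together, assuming the existing JSON { type: ... } framing. createReliableSocket, SocketLike, SocketHandlers and backoffDelayMs are hypothetical names, not code that exists in ws.ts today; the socket is injected through a connect callback so the logic can run outside a browser. With a 25 s ping interval and a 10 s pong timeout, a socket that stops answering is noticed at most 25 + 10 = 35 s later, which matches the ~35 s acceptance target.

```typescript
// Hypothetical sketch: createReliableSocket, SocketLike and backoffDelayMs are illustrative
// names, not existing ws.ts code. The socket is injected so the logic runs without a browser.
export interface SocketLike {
  readonly readyState: number; // 1 === WebSocket.OPEN
  send(data: string): void;
  close(): void;
}
export type SocketHandlers = { onOpen(): void; onMessage(data: string): void; onClose(): void };

const WS_OPEN = 1;

/** Light exponential backoff: 1 s, 2 s, 4 s, 8 s, ... capped at 30 s. */
export function backoffDelayMs(attempt: number, baseMs = 1_000, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

export function createReliableSocket(
  connect: (handlers: SocketHandlers) => SocketLike,
  onAppMessage: (msg: unknown) => void,
  pingIntervalMs = 25_000,
  pongTimeoutMs = 10_000,
) {
  let ws: SocketLike | null = null;
  let generation = 0; // bumped on every drop so a dead socket's late callbacks are ignored
  let attempt = 0; // consecutive failed connects; reset on a successful open
  let lastPongAt = 0;
  let stopped = false;
  let pingTimer: ReturnType<typeof setInterval> | undefined;
  let pongDeadline: ReturnType<typeof setTimeout> | undefined;
  let reconnectTimer: ReturnType<typeof setTimeout> | undefined;

  function dropSocket(): void {
    clearInterval(pingTimer);
    clearTimeout(pongDeadline);
    generation++;
    ws?.close();
    ws = null;
  }
  function scheduleReconnect(delayMs: number): void {
    if (stopped) return;
    clearTimeout(reconnectTimer);
    reconnectTimer = setTimeout(open, delayMs);
  }
  function reconnectNow(): void {
    dropSocket();
    scheduleReconnect(0); // skip the backoff: we already know this socket is dead
  }
  function ping(): void {
    if (!ws || ws.readyState !== WS_OPEN) return;
    ws.send(JSON.stringify({ type: "ping" }));
    clearTimeout(pongDeadline);
    pongDeadline = setTimeout(reconnectNow, pongTimeoutMs); // no pong in time => dead
  }
  function open(): void {
    const gen = generation;
    const live = () => gen === generation && !stopped;
    ws = connect({
      onOpen() {
        if (!live()) return;
        attempt = 0;
        lastPongAt = Date.now();
        pingTimer = setInterval(ping, pingIntervalMs);
      },
      onMessage(data) {
        if (!live()) return;
        const msg = JSON.parse(data) as { type?: unknown };
        if (msg.type === "pong") {
          lastPongAt = Date.now();
          clearTimeout(pongDeadline);
        } else {
          onAppMessage(msg); // heartbeat frames never reach application handlers
        }
      },
      onClose() {
        if (!live()) return;
        dropSocket();
        scheduleReconnect(backoffDelayMs(attempt++));
      },
    });
  }

  return {
    start(): void { stopped = false; open(); },
    stop(): void { stopped = true; clearTimeout(reconnectTimer); dropSocket(); },
    /** Call when document.visibilityState becomes "visible". */
    onVisible(now: number = Date.now()): void {
      const pongIsStale = now - lastPongAt > pingIntervalMs;
      if (!ws || ws.readyState !== WS_OPEN || pongIsStale) reconnectNow();
    },
  };
}
```

In the browser, connect would wrap new WebSocket(url) and forward its onopen, onmessage and onclose events to the handlers, and a visibilitychange listener would call onVisible() whenever document.visibilityState === "visible". The generation counter is one way to guarantee that a socket we force-closed cannot later trigger a second reconnect through its own onclose.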

Acceptance

  • Backgrounding the iPhone PWA for several minutes and returning shows a live terminal again within a couple of seconds, without a manual refresh.
  • Killing the server's TCP connection mid-session (e.g. proxy restart) is detected within ~35 s and triggers a reconnect.
  • The heartbeat is robust to the existing PendingAction WS extension (sign-request channel uses the same /ws) — no message-type collisions.
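On the server, a stateless ping branch is enough, and collisions with the PendingAction sign-request traffic can be ruled out by reserving the heartbeat types so no other channel can register them. This is a sketch under the assumption that frames are JSON with a string type field; MessageRouter and the "sign-response" type used in the check are hypothetical stand-ins, not the actual API of src/server/ws-message-router.ts.

```typescript
// Hypothetical sketch: MessageRouter is an illustrative name, not the actual API of
// src/server/ws-message-router.ts. Frames are assumed to be JSON with a string `type`.
type Send = (data: string) => void;
type Message = { type: string } & Record<string, unknown>;
type Handler = (msg: Message, send: Send) => void;

/** Heartbeat message types; no other channel (e.g. PendingAction) may register these. */
const RESERVED_TYPES: ReadonlySet<string> = new Set(["ping", "pong"]);

export class MessageRouter {
  private readonly handlers = new Map<string, Handler>();

  register(type: string, handler: Handler): void {
    if (RESERVED_TYPES.has(type)) throw new Error(`"${type}" is reserved for the heartbeat`);
    if (this.handlers.has(type)) throw new Error(`duplicate handler for "${type}"`);
    this.handlers.set(type, handler);
  }

  /** Dispatches one raw frame; returns false if no handler claims it. */
  route(raw: string, send: Send): boolean {
    const msg = JSON.parse(raw) as { type?: unknown; [key: string]: unknown };
    if (typeof msg.type !== "string") return false;
    if (msg.type === "ping") {
      // Stateless reply: liveness needs no per-session bookkeeping.
      send(JSON.stringify({ type: "pong" }));
      return true;
    }
    const handler = this.handlers.get(msg.type);
    if (!handler) return false;
    handler(msg as Message, send);
    return true;
  }
}
```

With this shape, registering "ping" or "pong" from another channel throws at startup, so a collision surfaces as a boot error rather than as heartbeats silently routed to the wrong handler.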

Out of scope

Origin

Surfaced while testing iPhone PWA end-to-end. The reliability symptom and the stale-snapshot symptom (#87) compounded each other and were initially hard to separate.
