ds4-server: watchdog thread + decode-loop SSE keepalive by Allen091080 · Pull Request #238 · antirez/ds4

Allen091080 · 2026-05-24T05:43:53Z

Summary

Builds on the SSE keepalive merged in f027269 (PR #194). That patch
covered prefill silence; this one covers the other two failure modes
on the same surface:

- Worker thread stalls in GPU/Metal kernel calls with no
  cancellation points. SIGTERM cannot drain a stuck turn: we observed
  a 12+ hour ds4-server process wedged on a single chat request,
  requiring SIGKILL.
- Decode-loop silence during reasoning-only stretches (<think>...)
  or slow tool-input phases: the prefill keepalive does not fire, the
  decode loop produces no streamable bytes for a while, and the
  client's TCP idle timeout closes the connection.

Fix

Two cooperating pieces, both in ds4_server.c:

A) Watchdog thread

A dedicated watchdog_main polls worker_last_progress every 5s
while worker_in_job is set.

- Soft stall (default 60s): set worker_abort_requested so the
  decode loop bails on its next iteration; the client gets an error
  finish_reason and the worker continues serving future jobs.
- Hard stall (default 120s): call _exit(137). The launchd
  KeepAlive supervisor restarts the process immediately; the on-disk
  KV cache survives, so the restarted process can usually resume the
  client's last prefix from cache instead of re-prefilling.
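
The two thresholds can be sketched as a single poll step. This is a minimal illustration under stated assumptions, not the body of watchdog_main: the wd_state struct, the wd_verdict enum, and wd_poll_once are hypothetical names, and the clock is passed in as plain seconds so the logic can be exercised without threads. Only the three field names and the 60s/120s defaults come from the PR description.

```c
#include <assert.h>

/* Stand-in for the three cross-thread fields named in the PR. */
typedef struct {
    long worker_last_progress;   /* seconds, refreshed by the worker */
    int  worker_in_job;          /* nonzero while a job is running */
    int  worker_abort_requested; /* set by the watchdog on a soft stall */
} wd_state;

/* Hypothetical verdict type; not taken from ds4_server.c. */
typedef enum { WD_OK, WD_SOFT_STALL, WD_HARD_STALL } wd_verdict;

/* One poll step: classify the worker at time `now` (seconds) and
 * request a decode-loop abort when the soft threshold is reached. */
wd_verdict wd_poll_once(wd_state *s, long now, int soft_s, int hard_s) {
    assert(soft_s > 0 && soft_s <= hard_s);

    /* Idle workers do not advance progress, so they are never stalled. */
    if (!__atomic_load_n(&s->worker_in_job, __ATOMIC_RELAXED))
        return WD_OK;

    long idle = now - __atomic_load_n(&s->worker_last_progress,
                                      __ATOMIC_RELAXED);
    if (idle >= hard_s)
        return WD_HARD_STALL;   /* the real thread would _exit(137) here */
    if (idle >= soft_s) {
        __atomic_store_n(&s->worker_abort_requested, 1, __ATOMIC_RELAXED);
        return WD_SOFT_STALL;
    }
    return WD_OK;
}
```

In the design described above, a thread would presumably call a step like this every 5s and act on the verdict; keeping the classification free of sleeping and exiting makes it easy to unit-test.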

B) Decode-loop additions

The decode loop now:

- Checks worker_abort_requested at the top of each iteration and
  breaks with a clear finish=error when the watchdog has requested
  an abort.
- Emits a : decode\n\n SSE comment line at most every 15s when
  j->req.stream is set (matches the prefill keepalive cadence).
- Refreshes worker_last_progress after each produced token via a
  relaxed __atomic_store_n.
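
The per-iteration bookkeeping listed above might look roughly like the following. It is a sketch, not the patch itself: decode_step, keepalive_due, sse_decode_comment, and the worker_state struct are hypothetical names, token generation is elided, and time is injected as seconds. The 15s interval, the ": decode" comment payload, and the two field names are taken from the description.

```c
#include <assert.h>
#include <string.h>

#define KEEPALIVE_INTERVAL_S 15  /* cadence quoted in the PR description */

/* Stand-in for the fields the decode loop touches. */
typedef struct {
    long worker_last_progress;
    int  worker_abort_requested;
} worker_state;

typedef enum { STEP_CONTINUE, STEP_ABORT } step_result;

/* Returns 1 (and records `now`) when at least 15s have passed since
 * the last keepalive, so comments go out at most once per interval. */
int keepalive_due(long now, long *last_sent) {
    if (now - *last_sent < KEEPALIVE_INTERVAL_S) return 0;
    *last_sent = now;
    return 1;
}

/* Copies the SSE comment into buf; returns its length, or 0 if the
 * buffer is too small. SSE lines starting with ':' are comments. */
size_t sse_decode_comment(char *buf, size_t cap) {
    static const char msg[] = ": decode\n\n";
    assert(buf);
    if (cap < sizeof(msg)) return 0;
    memcpy(buf, msg, sizeof(msg));
    return sizeof(msg) - 1;
}

/* Bookkeeping for one decode iteration. */
step_result decode_step(worker_state *w, long now, int stream,
                        long *last_keepalive, int *keepalives_sent) {
    /* 1. Bail out if the watchdog asked for an abort. */
    if (__atomic_load_n(&w->worker_abort_requested, __ATOMIC_RELAXED))
        return STEP_ABORT;
    /* 2. Streaming clients get a comment line when one is due. */
    if (stream && keepalive_due(now, last_keepalive))
        (*keepalives_sent)++;  /* a real loop would write the comment */
    /* 3. (token produced here) then refresh the progress timestamp. */
    __atomic_store_n(&w->worker_last_progress, now, __ATOMIC_RELAXED);
    return STEP_CONTINUE;
}
```

Stepping this once per second over a 35s decode yields keepalives at the 15s and 30s marks, which is consistent with the "every ~15s" observation reported under Verification.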

server_progress_cb also refreshes worker_last_progress on every
prefill_chunk callback. Without this, a prefill that runs longer than
the soft threshold would be mistaken for a stall. Chunks fire every
~7s in practice and reset the timestamp well within the 60s soft
limit, so prefill is never falsely aborted.

State additions

pthread_t worker_tid;
pthread_t watchdog_tid;
bool watchdog_running;
int watchdog_stuck_soft_s;  // default 60
int watchdog_stuck_hard_s;  // default 120

// accessed across threads via relaxed __atomic_* loads/stores
volatile long worker_last_progress;
volatile int worker_in_job;
volatile int worker_abort_requested;

The watchdog only acts while worker_in_job is set: an idle worker
does not advance worker_last_progress, and that must not be misread
as a stall.
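
One way the worker could bracket a job so that the gating works is sketched below. The helper names (job_begin, job_end, watchdog_should_check) and the server_flags struct are hypothetical. The PR states that all accesses are relaxed; this sketch instead uses a release store and an acquire load on worker_in_job, which is an assumption on my part, chosen so the watchdog cannot observe the flag without also observing the freshly reset timestamp.

```c
#include <assert.h>

/* Stand-in for the three cross-thread fields from the state additions. */
typedef struct {
    long worker_last_progress;
    int  worker_in_job;
    int  worker_abort_requested;
} server_flags;

/* Worker side: called when a job is dequeued. */
void job_begin(server_flags *s, long now) {
    assert(s);
    /* Clear any abort left over from a previous job and reset the
     * timestamp before the job becomes visible to the watchdog. */
    __atomic_store_n(&s->worker_abort_requested, 0, __ATOMIC_RELAXED);
    __atomic_store_n(&s->worker_last_progress, now, __ATOMIC_RELAXED);
    /* Release: the stores above are visible to anyone who sees 1 here
     * through an acquire load (assumption; the PR uses relaxed). */
    __atomic_store_n(&s->worker_in_job, 1, __ATOMIC_RELEASE);
}

/* Worker side: called when the job finishes or is aborted. */
void job_end(server_flags *s) {
    assert(s);
    __atomic_store_n(&s->worker_in_job, 0, __ATOMIC_RELAXED);
}

/* Watchdog side: only inspect the timestamp while a job is active. */
int watchdog_should_check(server_flags *s) {
    assert(s);
    return __atomic_load_n(&s->worker_in_job, __ATOMIC_ACQUIRE) != 0;
}
```

With purely relaxed ordering, a weakly ordered CPU is allowed to make the worker_in_job store visible before the timestamp reset, so a stale timestamp could in principle be read as a long stall. Whether that matters in practice for a 5s poll against 60s and 120s thresholds is a judgment call.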

Verification

Machine: MacBook Pro M5 Max, 128 GiB RAM
Backend: Metal
Model: DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2.gguf (q2-imatrix)
Server: ./ds4-server --host 0.0.0.0 --port 8000 --ctx 500000 --kv-disk-dir … --kv-disk-space-mb 204800

- make: clean build, no new warnings.
- ./ds4_test --server reports server: OK / ds4 tests: ok.
- Real running ds4-server protected: in an earlier multi-hour run we
  saw a hard wedge that needed SIGKILL. With the patch applied, a
  follow-up run hit the soft trigger ("WATCHDOG soft stall 62s >= 60s — requesting decode-loop abort")
  and the client got a clean error within 70s instead of hanging
  indefinitely. launchd KeepAlive picked the process back up.
- 35s decode burst with reasoning: client receives : decode comment
  lines every ~15s, no client disconnects.

Test plan

- CI runs ./ds4_test --server.
- Manual: run a heavy chat request long enough to provoke a real
  or simulated stall (e.g. add a temporary sleep in ds4_session_eval)
  and confirm the soft trigger fires within ~65s and _exit(137)
  fires within ~125s (each threshold plus up to one 5s poll
  interval).
- Confirm kill -15 still drains cleanly via the existing
  g_stop_requested shutdown path; the watchdog is purely additive.

The inference worker can stall in the underlying GPU/Metal kernel calls (`ds4_session_sample` / `ds4_session_eval` / `ds4_session_eval_speculative_argmax`). Those calls have no cancellation points, so SIGTERM cannot drain a stuck turn: we have seen 12+ hour ds4-server processes wedged with a single chat request in flight, requiring SIGKILL to recover.

The SSE keepalive that landed earlier covers prefill but not decode, so once the worker gets past prefill and enters decode, a Metal stall is again invisible to the client until its own idle timeout.

Add a dedicated `watchdog_main` thread that polls `worker_last_progress` while `worker_in_job` is set. Two thresholds:

- soft stall (default 60s): set `worker_abort_requested` so the decode loop bails on its next iteration; the in-flight HTTP client gets an `error` finish_reason and the worker continues to pick up future jobs.
- hard stall (default 120s): call `_exit(137)`. The launchd KeepAlive supervisor restarts us immediately; the on-disk KV cache survives so the restarted process can usually resume the client's last prompt prefix from cache instead of re-prefilling from token zero.

The decode loop now:

- checks `worker_abort_requested` at the top of each iteration and breaks with a clear `finish=error` reason when the watchdog has requested an abort;
- emits a `: decode\n\n` SSE comment line every 15 seconds when `j->req.stream` is set, mirroring the prefill keepalive from the earlier patch so reasoning-only stretches (`<think>...`) or slow tool-input phases don't trip client TCP idle-timeouts either;
- refreshes `worker_last_progress` (relaxed atomic store) after each produced token so the watchdog can distinguish "alive but slow" from "wedged".

`server_progress_cb` also refreshes `worker_last_progress` on every `prefill_chunk` event. Without this, very long prefills (≥ soft threshold for big prompts) would be mistaken for Metal stalls. Chunks fire every ~7s in practice and reset the timestamp before the 60s soft limit, so prefill is never falsely aborted.

Field layout:

- `worker_tid` / `watchdog_tid` / `watchdog_running` / thresholds
- 3 cross-thread atomics: `worker_last_progress`, `worker_in_job`, `worker_abort_requested` (relaxed `__atomic_*` loads/stores)

The watchdog only acts while `worker_in_job` is set: idle workers don't advance progress and we must not interpret that as a stall. Setup-time misconfigurations (model load failure, port conflict) are not covered; they fail synchronously before any job is ever dequeued.

Verified on macOS Metal, q2-imatrix GGUF, ctx=200000:

- `make` clean build, no new warnings
- `./ds4_test --server` passes
- Real running ds4-server protected: in an earlier 12+ hour run we saw a hard wedge that needed SIGKILL; with this patch a follow-up run hit the soft trigger ("WATCHDOG soft stall 62s >= 60s — requesting decode-loop abort") and the client got a clean error within 70s instead of hanging indefinitely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

antirez · 2026-05-24T17:37:01Z

Fixed more organically in recent commits. See the original issue for more info. Thanks.

Allen091080 · 2026-05-25T05:12:22Z

Thanks for the quick reply! Agreed — f91c12b's prefill_display event is the right shape for prefill stalls and lines up better with the project's minimalism.

Closing this PR. I'll send a much smaller follow-up that keeps only the decode-loop SSE keepalive piece (no watchdog thread, no _exit), since long-thinking and long tool-input phases during decode produce no streamable bytes for a while either and the current prefill keepalive doesn't cover those.

Allen091080 closed this May 25, 2026

Allen091080 wants to merge 1 commit into antirez:main from Allen091080:watchdog-decode-keepalive
