ds4_server: check if client disconnected to abort inference early#242

Open
confiden wants to merge 1 commit into antirez:main from confiden:client-disconnected

@confiden

Title

Detect client disconnection during non-streaming inference and abort generation early to allow the next queued worker to begin

Description

Problem

The ds4-server uses a single-worker-thread architecture. When a client disconnects mid-request (drops its HTTP connection), the worker keeps generating tokens until the response is complete and only then notices the failure. For non-streaming requests, there is no interaction with the client socket during the decode loop, so a dropped connection is not detected until the final send(). This wastes GPU time and, more importantly, blocks the single worker thread from processing any subsequent queued requests.

Solution

Add a lightweight, per-token health check on the client socket inside the decode loop for non-streaming requests. Before sampling each token, poll() the fd with a zero timeout. If the peer has closed the connection, abort generation immediately with finish = "error".

Cross-platform notes

  • On Linux, POLLHUP (and POLLRDHUP with _GNU_SOURCE) is raised for a graceful TCP FIN, so the check is straightforward.
  • On macOS, POLLHUP is not raised for a TCP FIN; the socket becomes readable instead, and recv() returns 0. The implementation handles both paths: it checks POLLHUP | POLLERR | POLLNVAL first, and if only POLLIN is set, it peeks with recv(fd, ..., MSG_PEEK | MSG_DONTWAIT) to detect EOF.
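
The two detection paths described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name client_disconnected() is hypothetical, and treating a poll() failure or a would-block recv() as "still connected" is an assumption made here to keep the check conservative.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (name not taken from the PR).
 * Returns 1 if the peer appears to have closed the connection, 0 otherwise.
 * Never blocks: poll() uses a zero timeout and recv() uses MSG_DONTWAIT. */
int client_disconnected(int fd) {
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int n = poll(&pfd, 1, 0);
    if (n <= 0) {
        /* 0: nothing to report; <0: poll() itself failed (e.g. EINTR).
         * Assumption: treat both as "still connected" rather than abort. */
        return 0;
    }

    /* Path 1: the kernel reports a hangup or error condition directly. */
    if (pfd.revents & (POLLHUP | POLLERR | POLLNVAL)) return 1;

    /* Path 2: the socket is merely readable. Peek one byte without
     * consuming it: a return of 0 means the peer sent EOF. */
    if (pfd.revents & POLLIN) {
        char c;
        ssize_t r = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
        if (r == 0) return 1;
        if (r < 0 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return 1; /* hard error such as ECONNRESET */
        /* r > 0: unread bytes are pending, so the peer is still there. */
    }
    return 0;
}
```

Because the peek uses MSG_PEEK, any bytes the client has already sent stay in the socket buffer for later reads.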

Changes

  • ds4_server.c: ~28 lines added in the decode loop of generate_job(), guarded by if (!j->req.stream) (streaming requests already detect disconnection via write failures).
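
To illustrate where such a guard sits, here is a simplified, self-contained sketch of a decode loop. It is not the real generate_job(): struct job_sketch, decode_loop_sketch(), fake_check(), and the "length" finish value are hypothetical stand-ins invented for this example; only the stream guard and finish = "error" come from the description above.

```c
#include <stdbool.h>

/* Hypothetical callback type: returns true if the client on fd is gone. */
typedef bool (*disconnect_check_fn)(int fd);

/* Hypothetical stand-in for the per-job state. */
struct job_sketch {
    int fd;          /* client socket */
    bool stream;     /* true for streaming requests */
    int max_tokens;  /* generation budget */
};

/* Generates up to max_tokens "tokens" and returns how many were produced.
 * Sets *finish to "error" on early abort, or to "length" (an assumed
 * placeholder value) when the budget is exhausted. */
int decode_loop_sketch(const struct job_sketch *j, disconnect_check_fn check,
                       const char **finish) {
    int generated = 0;
    *finish = "length";
    while (generated < j->max_tokens) {
        /* Only non-streaming jobs need the probe: streaming jobs write to
         * the socket per token and see the failure there. */
        if (!j->stream && check(j->fd)) {
            *finish = "error";
            break;
        }
        generated++; /* placeholder for sampling and evaluating one token */
    }
    return generated;
}

/* Fake check for demonstration: reports a disconnect from the 4th call on. */
static int fake_calls = 0;
bool fake_check(int fd) {
    (void)fd;
    return ++fake_calls > 3;
}
```

With fake_check, a non-streaming job stops after three tokens, while a streaming job never invokes the check and runs to its token budget.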
  • No new includes, no new build flags, no new dependencies.

Testing

  • Compiles cleanly on both macOS (clang) and Linux (gcc).
  • Manual test performed: send a non-streaming request and kill the client mid-generation; the server logs "ds4-server: client disconnected during generation, aborting job" and immediately picks up the next queued job.
