
Add ClickHouse query timeout and graceful client disconnect handling #137

@scotwells


Summary

The activity-apiserver has no server-side query timeouts and does not handle client disconnects gracefully. As a result, expensive queries can hold concurrency slots indefinitely, and routine client cancellations are logged as errors.

What needs to happen

1. ClickHouse query timeout

Add max_execution_time to ClickHouse query options so individual queries that scan too much data are killed server-side rather than holding concurrency slots indefinitely.

2. Handle client disconnects gracefully

When a client disconnects mid-query (context.Canceled), the apiserver currently logs it as an Unhandled Error and returns a 504. This should be distinguished from real storage errors:

  • Log as a debug/info event, not an error
  • Track with a separate metric (e.g., activity_clickhouse_query_errors_total{error_type="client_canceled"})
  • Don't return 504 for client-initiated cancellations

The relevant code path is in internal/storage/clickhouse.go at the rows.Err() check.
