Summary
The Copilot CLI host (github/github-app) had to work around a deserialization failure in our SDK's typed ping() API on the session-resume / warm-CLI-pool / liveness-probe paths. See github/github-app#5461 for the workaround.
Their fix introduces a ping_cli_compat(&client) helper that drops to the raw client.call("ping", json!({})) so the result body is never deserialized — they only care whether the JSON-RPC round trip succeeded.
We should give them (and everyone doing liveness checks) a first-class primitive instead of forcing them to bypass our typed API.
Root cause
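Across all five SDKs, ping() deserializes the response into a typed body:
- Client::ping(&self) -> Result<PingResponse, Error> (rust/src/lib.rs:1676)
- client.ping() returning {message, timestamp, protocolVersion?} (nodejs/src/client.ts:1032)
- client.Ping(ctx, msg) (*PingResponse, error) (go/client.go:1317)
- client.PingAsync(...) : Task<PingResponse> (dotnet/src/Client.cs:874)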
The authoritative PingResult schema (@github/copilot/schemas/api.schema.json) requires message, timestamp (date-time string), and protocolVersion (integer > 0). The Rust hand-written PingResponse softens this with #[serde(default)] and Option<u32>, but that only helps for missing fields — wrong types (e.g. null timestamp, integer-shape drift) still fail. Same brittleness exists across the other SDKs.
ping is used by hosts as a liveness check (warm-pool reuse, resumed-session aliveness, retrier "is the cached client still alive" probes). These paths straddle CLI version boundaries — a resumed older CLI process can answer with a slightly different ping body shape, and the entire liveness check fails even though the RPC itself succeeded.
This is exactly the wrong failure mode for a health check: the consumer asked "is the CLI reachable?" and we answered "no" because of a body-shape mismatch.
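To make the failure mode concrete, here is a minimal Python sketch. It is not the SDK's actual code: PingResult, parse_ping_result, rpc_succeeded, and the drifted payload are all hypothetical names and data. It only illustrates how a strict, schema-faithful parse can reject a reply whose JSON-RPC envelope is a perfectly valid success.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PingResult:
    message: str
    timestamp: datetime
    protocol_version: int


def parse_ping_result(body: dict) -> PingResult:
    """Strict, schema-faithful parse: any shape mismatch raises ValueError."""
    message = body.get("message")
    timestamp = body.get("timestamp")
    version = body.get("protocolVersion")
    if not isinstance(message, str):
        raise ValueError("message must be a string")
    if not isinstance(timestamp, str):
        raise ValueError("timestamp must be a date-time string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(version, bool) or not isinstance(version, int) or version <= 0:
        raise ValueError("protocolVersion must be an integer > 0")
    # fromisoformat also raises ValueError on a malformed date-time string.
    return PingResult(message, datetime.fromisoformat(timestamp), version)


def rpc_succeeded(envelope: dict) -> bool:
    """Body-agnostic check: a JSON-RPC 2.0 success has "result" and no "error"."""
    return "result" in envelope and "error" not in envelope


# Hypothetical reply from an older CLI: the RPC succeeded, but timestamp is null.
drifted = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"message": "pong", "timestamp": None, "protocolVersion": 3},
}
```

With this payload, rpc_succeeded(drifted) is true while parse_ping_result(drifted["result"]) raises, which is the mismatch a liveness caller currently sees as "CLI unreachable".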
Proposed fix
Add a dedicated liveness-probe API to every SDK Client:
- Sends the ping JSON-RPC call.
- Returns success based solely on JSON-RPC success — never deserializes the result body.
- Composable with caller-supplied timeouts — no baked-in timeout, so each host (startup probe, warm-pool, resume probe, background keepalive) sets its own budget.
ping() and generated rpc.ping() stay strict and schema-faithful. Callers who actually want the typed data keep getting it; schema drift continues to surface there as a real error.
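One possible shape for the probe, sketched in Python under stated assumptions: the Client class, the RawCall transport signature, and RpcError below are hypothetical stand-ins rather than the SDK's real types, and the typed ping() is reduced to a single field check. The point is only that is_alive awaits the raw call and never looks at the result.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical transport: sends one JSON-RPC request and returns the raw
# "result" value, raising on a JSON-RPC error or a broken connection.
RawCall = Callable[[str, dict], Awaitable[Any]]


class RpcError(Exception):
    """Stand-in for the SDK's JSON-RPC error type."""


class Client:
    def __init__(self, raw_call: RawCall) -> None:
        self._raw_call = raw_call

    async def ping(self) -> dict:
        """Typed path (heavily simplified): stays strict about the body."""
        body = await self._raw_call("ping", {})
        if not isinstance(body, dict) or not isinstance(body.get("timestamp"), str):
            raise ValueError("ping result does not match the PingResult schema")
        return body

    async def is_alive(self) -> bool:
        """Liveness probe: True if the ping round trip succeeded.

        The result body is never inspected, and no timeout is applied here.
        """
        try:
            await self._raw_call("ping", {})
        except Exception:
            # On Python 3.8+ asyncio.CancelledError derives from BaseException,
            # so cancellation from a caller-side timeout is not swallowed.
            return False
        return True


async def drifted_cli(method: str, params: dict) -> Any:
    # Simulates an older CLI whose ping body no longer matches the schema.
    return {"message": "pong", "timestamp": None}


if __name__ == "__main__":
    print(asyncio.run(Client(drifted_cli).is_alive()))
```

Against drifted_cli, is_alive() reports the process as alive while ping() still raises, so schema drift stays visible to callers that ask for the typed data.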
Per-language names:
| SDK | API |
| --- | --- |
| Rust | Client::is_alive(&self) -> bool |
| Node | client.isAlive(): Promise<boolean> |
| Python | client.is_alive() -> bool |
| Go | client.IsAlive(ctx context.Context) bool |
| .NET | client.IsAliveAsync(ct) : Task<bool> |
After this lands, the github/github-app workaround helper goes away and call sites become e.g. existing.is_alive().await.
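Because the probe carries no timeout of its own, each host wraps it in whatever budget fits the call site. The Python sketch below shows one way to do that with asyncio.wait_for. The probe helper and the healthy and hung stand-ins are hypothetical, and treating a timeout as "not alive" is an assumption about what a host would want, not something the proposal specifies.

```python
import asyncio
from typing import Awaitable, Callable


async def probe(is_alive: Callable[[], Awaitable[bool]], budget_s: float) -> bool:
    """Run a liveness probe under the caller's own time budget.

    `is_alive` is any zero-argument coroutine function returning bool,
    for example a bound client.is_alive. An overrun counts as "not alive".
    """
    try:
        return await asyncio.wait_for(is_alive(), timeout=budget_s)
    except asyncio.TimeoutError:
        return False


# Hypothetical stand-ins for a healthy and a hung CLI process.
async def healthy() -> bool:
    return True


async def hung() -> bool:
    await asyncio.sleep(3600)
    return True
```

A warm-pool reuse check might call probe(client.is_alive, 0.2) while a startup probe allows several seconds. The numbers are illustrative; the point is that the budget lives with the caller.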
Rejected alternatives
- Loosen ping() itself. Considered and rejected. ping() is a typed schema-backed API; if it silently swallows malformed bodies, the contract becomes ambiguous (did the caller want liveness, or the ping data?). It also masks real CLI/schema drift in the API most likely to catch it.
- Loosen the generated rpc.ping(PingRequest). Same reasoning, more so: generated APIs must remain schema-faithful. is_alive is the explicit escape hatch.
- "Fix it only in the CLI." The CLI should still honor the schema, and we should investigate any drift. But liveness checks fundamentally straddle version boundaries, so the SDK needs a body-agnostic primitive regardless.
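Acceptance criteria
- is_alive / IsAlive / isAlive method on the Rust, Node, Python, Go, and .NET Client types.
- Uses the ping JSON-RPC method and returns success purely on RPC success, ignoring the response body.
- ping() / rpc.ping() typed APIs unchanged.
- Clear distinction between "use is_alive for liveness/warm-pool/resume checks" and "use ping() when you want the typed response."
- For a ping body that fails strict deserialization but is otherwise a valid JSON-RPC success, is_alive returns true and ping() returns an error.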
Design review
Reviewed and agreed with GPT-5.5; consensus on:
- Separate is_alive API (not loosening ping).
- Keep generated rpc.ping strict.
- No baked-in timeout; caller composes timeouts.
- Name is_alive clearly communicates intent ("RPC round trip succeeded") and beats ping_raw / ping_check.