Add DurableFuture#or_timeout + Restate::TimeoutError #12

Open
junyuanz1 wants to merge 3 commits into restatedev:main from junyuanz1:add-or-timeout

@junyuanz1 junyuanz1 commented May 21, 2026

Summary

Brings the Ruby SDK to feature parity with the TypeScript and Java SDKs for the race-a-future-against-a-deadline use case. Today Ruby users have to hand-roll Restate.sleep + Restate.wait_any + completed? branching at every call site — verbose and easy to get wrong.

Peer SDK references

| SDK | Method | File:line |
| --- | --- | --- |
| TypeScript | RestatePromise.orTimeout(duration) | packages/libs/restate-sdk/src/promises.ts#L155-L172 |
| TypeScript | class TimeoutError extends TerminalError | packages/libs/restate-sdk/src/types/errors.ts#L121-L126 |
| TypeScript | TIMEOUT_ERROR_CODE = 408 / CANCEL_ERROR_CODE = 409 | packages/libs/restate-sdk/src/types/errors.ts#L17-L18 |
| Java | DurableFuture.withTimeout(Duration) / await(Duration) | sdk-api/.../DurableFuture.java#L67-L84 |
| Java | class TimeoutException extends TerminalException | sdk-common/.../TimeoutException.java#L11-L14 |

Docs context: Concurrent tasks (TS) · Durable timers (TS) · Competitive Racing pattern

Changes

  1. Restate::TimeoutError — subclass of Restate::TerminalError. Default message "Timeout occurred", HTTP status 408. Inheriting from TerminalError lets the existing rescue Restate::TerminalError idiom catch timeouts uniformly, matching TS's TimeoutError extends TerminalError and Java's TimeoutException extends TerminalException.

  2. DurableFuture#or_timeout(duration) — races against Restate.sleep. Returns the future's value on win; raises TimeoutError if the sleep wins. Shape matches TS's orTimeout.

  3. DurableCallFuture#or_timeout(duration) — overrides the base to also #cancel the remote invocation when the timeout wins, so the callee doesn't keep running after the caller gave up. This goes beyond what TS and Java do today — neither TS's BaseRestatePromise.orTimeout nor Java's DurableFuture.withTimeout cancels the work future on timeout; both leave the remote call running.

    This felt like a correctness win — leaving the callee running after the caller has given up wastes resources and produces spurious results that nobody is waiting for. But happy to drop the override if maintainers prefer strict TS/Java parity (1-line revert + spec adjustment).

  4. spec/or_timeout_spec.rb — 8 RSpec examples covering happy/timeout paths on both future types plus error-class invariants. Stubs Restate.sleep/Restate.wait_any so the spec runs without a live VM, matching spec/server_context_outbound_middleware_spec.rb.

  5. docs/USER_GUIDE.md — new "Timeouts" subsection right after "Sleep". Documents the API + the orphan-sleep caveat + the cancellable-deadline workaround pattern.
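
For orientation, here is a minimal sketch of the race described in item 2. It is not the code in this PR: `RaceSketch` and everything inside it are hypothetical in-memory stand-ins so the snippet runs without the SDK or a VM, and the real `Restate.sleep` / `Restate.wait_any` signatures may differ. The stand-in timer is assumed to have already fired, so an incomplete future always loses.

```ruby
# Hypothetical stand-ins, not the SDK: the real classes live under Restate.
module RaceSketch
  # In the PR this subclasses Restate::TerminalError; simplified here.
  class TimeoutError < StandardError
    def initialize(message = "Timeout occurred")
      super
    end
  end

  # A future that is either already completed with a value or still pending.
  class DurableFuture
    def initialize(value: nil, completed: false)
      @value = value
      @completed = completed
    end

    def completed?
      @completed
    end

    def await
      @value
    end

    # Race this future against a timer; raise if the timer wins.
    def or_timeout(duration)
      timer = RaceSketch.sleep(duration)
      RaceSketch.wait_any(self, timer)
      return await if completed?

      raise TimeoutError
    end
  end

  # Stand-in for Restate.sleep: a timer future that has already fired.
  def self.sleep(_duration)
    DurableFuture.new(completed: true)
  end

  # Stand-in for Restate.wait_any: returns the first completed future.
  def self.wait_any(*futures)
    futures.find(&:completed?)
  end
end

# The work future wins: its value is returned.
result = RaceSketch::DurableFuture.new(value: 42, completed: true).or_timeout(5)

# The timer wins: TimeoutError is raised.
timed_out = begin
  RaceSketch::DurableFuture.new.or_timeout(5)
  false
rescue RaceSketch::TimeoutError
  true
end
```

The point of the method is that the `completed?` branching lives in one place instead of at every call site.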

Design decisions

Inherits from TerminalError (not a fresh exception hierarchy)

Mirrors the TS hierarchy (TimeoutError extends TerminalError extends RestateError) and the Java one (TimeoutException extends TerminalException). Keeps the existing rescue Restate::TerminalError discipline working for users who don't care which terminal flavor they hit.
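
A small sketch of what that buys callers, using hypothetical stand-in classes under `ErrorSketch` rather than the real `Restate::TerminalError` / `Restate::TimeoutError`. The `status_code` reader, the 500 default on the parent, and the `classify` helper are assumptions made for illustration only.

```ruby
# Hypothetical stand-ins mirroring the hierarchy described above.
module ErrorSketch
  class TerminalError < StandardError
    attr_reader :status_code

    # The 500 default is an assumption for this sketch.
    def initialize(message = "Terminal error", status_code = 500)
      super(message)
      @status_code = status_code
    end
  end

  class TimeoutError < TerminalError
    def initialize(message = "Timeout occurred")
      super(message, 408)
    end
  end
end

# Callers that care about timeouts rescue the subclass first; everyone else
# keeps a single rescue for the parent class.
def classify
  yield
  :ok
rescue ErrorSketch::TimeoutError
  :timed_out
rescue ErrorSketch::TerminalError
  :failed
end
```

Code that only has `rescue TerminalError` keeps working unchanged, since a timeout is just another terminal failure to it.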

Method name or_timeout (matches TS) vs Java's withTimeout

There's actual divergence between peer SDKs here: TypeScript names the method orTimeout, while Java names it withTimeout.

Picked or_timeout (snake-case of TS's orTimeout) because:

  • It reads naturally in Ruby (future.or_timeout(5) → "or, timeout after 5s")
  • TS's method name is the more widely-used reference in the docs (e.g. Durable Timers)
  • with_timeout would suggest a chainable builder, which doesn't match the imperative flow we want here

Happy to switch to with_timeout if maintainers prefer aligning with Java; happy to add both as aliases.

HTTP status code 408 (not 409)

408 (Request Timeout) is the correct HTTP semantic for a timeout and matches what TS ships:

// packages/libs/restate-sdk/src/types/errors.ts#L17-L18
export const TIMEOUT_ERROR_CODE = 408;
export const CANCEL_ERROR_CODE  = 409;

Java's TimeoutException uses 409, but 409 is what TS reserves for CancelledError — Java appears to be the outlier and the choice there looks like a copy-paste from CancelledException. Picking 408 keeps Ruby aligned with both standard HTTP semantics and TS.

(Happy to flip to 409 if maintainers prefer matching Java — just flagging the cross-SDK divergence.)

Known limitation (documented, not fixed in this PR)

When the work future wins the race, the sleep handle remains in the journal because restate-sdk-shared-core 0.7.0 exposes no sys_cancel_handle primitive — only sys_cancel_invocation for a separate invocation. The wake-up is a no-op against a completed handler but keeps the invocation row alive in Restate's state until the timer fires.

Both peer SDKs have the same footprint:

  • TypeScript's orTimeout uses raw ctx.sleep inside the combinator with no cancellation.
  • Java's withTimeout uses ctx.timer(timeout, null), same VM primitive, same no-cancel.

This PR matches that behavior 1:1 and documents the workaround (cancellable-deadline pattern via a separate scheduled invocation + SendHandle#cancel) in the user guide.

Two reasonable follow-ups, out of scope here:

  1. Add a #with_cancellable_deadline helper to the Ruby SDK that bundles a tiny DeadlineTrigger service.
  2. Raise the gap against restate-sdk-shared-core for a sys_cancel_handle primitive — which would let every SDK fix the leak at the source.

Happy to do either as a separate PR / issue if there's interest.

Test plan

  • bundle exec rspec: 82 examples, 0 failures
  • Spec runs without a live VM (CI-friendly)
  • bundle exec rake compile clean on arm64-darwin23
  • Maintainer review of the design choices above
  • (Optional) test-services integration coverage if maintainers want it

junyuanz1 added 2 commits May 21, 2026 10:30
Brings the Ruby SDK to feature parity with the TypeScript and Java
SDKs for the "race a future against a deadline" use case. Today the
Ruby SDK has no direct equivalent of:

* TypeScript: +RestatePromise.orTimeout(duration)+ →
  https://github.com/restatedev/sdk-typescript/blob/main/packages/libs/restate-sdk/src/promises.ts
* Java:       +Awaitable.orTimeout(Duration)+ →
  https://github.com/restatedev/sdk-java/blob/main/sdk-common/src/main/java/dev/restate/sdk/common/TimeoutException.java

Ruby users currently have to hand-roll +Restate.sleep+ +
+Restate.wait_any+ + +completed?+ branching at every call site,
which is verbose and easy to get wrong (especially around when to
.cancel the call invocation on timeout).

Changes:

* +Restate::TimeoutError+ — subclass of +Restate::TerminalError+,
  default message "Timeout occurred", HTTP status 408. Inheriting
  from TerminalError lets the existing
  +rescue Restate::TerminalError+ idiom in user handlers catch
  timeouts uniformly with other terminal failures, matching the TS
  type hierarchy (+TimeoutError extends TerminalError+ in
  +types/errors.ts+).

* +DurableFuture#or_timeout(duration)+ — race against
  +Restate.sleep+. Returns the future's value on win; raises
  +TimeoutError+ if the sleep wins.

* +DurableCallFuture#or_timeout(duration)+ — refines the base to
  call +#cancel+ on the remote invocation when the timeout wins,
  so the callee doesn't keep running after the caller gave up.
  Same refinement TS makes — see the +InvocationPromise+
  specialization in TS that calls +ctx.cancel(invocationId)+ on
  timeout.

* RSpec coverage at +spec/or_timeout_spec.rb+ — 8 examples covering
  happy/timeout paths on both future types plus error-class
  invariants. Stubs +Restate.sleep+/+Restate.wait_any+ so the spec
  runs without a live VM, matching the existing
  +server_context_outbound_middleware_spec.rb+ style.

* +docs/USER_GUIDE.md+ "Timeouts" subsection with the usage
  pattern and a documented caveat about the orphan-sleep footprint
  (see "Design notes" below).

== Why HTTP status 408 (not 409)

408 (Request Timeout) is the correct HTTP semantic for a timeout
and matches the TypeScript SDK
(+packages/libs/restate-sdk/src/types/errors.ts+):

    export const TIMEOUT_ERROR_CODE = 408;
    export const CANCEL_ERROR_CODE  = 409;

The Java SDK's +TimeoutException+ uses 409, but 409 is what TS
reserves for +CancelledError+ — Java appears to be the outlier and
the choice there looks like a copy-paste from CancelledException.
Picking 408 here keeps the Ruby SDK aligned with both standard
HTTP semantics and the larger TS ecosystem.

== Design note: the orphan-sleep footprint

Both this implementation and the existing TS +orTimeout+ have the
same property: when the work future wins the race, the sleep
handle remains in the journal because +restate-sdk-shared-core+
0.7.0 exposes no +sys_cancel_handle+ primitive. The wake-up is a
no-op against a completed handler but keeps the invocation row
alive in Restate's state until the timer fires — meaningful on
long deadlines.

The TS implementation has this footprint too (see
+packages/libs/restate-sdk/src/promises.ts+'s +orTimeout+, which
uses raw +ctx.sleep+ inside the combinator). This PR matches that
behavior 1:1 and documents the caveat + the workaround
(cancellable-deadline pattern via a separate scheduled invocation
+ +SendHandle#cancel+) in the user guide. A follow-up could
either:

* Add a +#with_cancellable_deadline+ helper that routes the timer
  through a small bundled +DeadlineTrigger+ service, or
* Raise the gap against +restate-sdk-shared-core+ for a real
  +sys_cancel_handle+ primitive — which would let every SDK fix
  the leak at the source.

Out of scope for this PR.

Test results: +bundle exec rspec+, 82 examples, 0 failures.

Fills in the four remaining surfaces that callers of the new API
touch:

* +sig/restate.rbs+ — adds +TimeoutError < TerminalError+ and
  +DurableFuture#or_timeout+ / +DurableCallFuture#or_timeout+
  signatures alongside the existing ones. +bundle exec steep check+
  passes.

* +docs/USER_GUIDE.md+ — adds a +TimeoutError+ subsection inside
  +## Error Handling+ so the +rescue Restate::TimeoutError+ pattern
  is discoverable from the canonical error docs, not just from the
  +Timeouts+ subsection. Also adds +or_timeout+ to the
  +service_communication.rb+ row in the examples-mapping table so
  the table stays accurate.

* +docs/INTERNALS.md+ — extends the Durable Futures section so the
  +or_timeout+ method shows up on both +DurableFuture+ and
  +DurableCallFuture+ alongside the existing +cancel+ docs. Notes
  the orphan-handle footprint at the same source-of-truth as the
  rest of the future internals.

* +examples/service_communication.rb+ — adds a +with_deadline+
  handler that demonstrates +Worker.call.process(task).or_timeout(5)+
  with a +rescue Restate::TimeoutError+ block, so the example
  matches the entry now listed in the user-guide table.

No code or behavior changes — pure docs/sig fill-in.

Test results: +bundle exec rspec+ — 82 examples, 0 failures.
+bundle exec steep check+ — no type errors.
Contributor

@igalshilman igalshilman left a comment

Thanks for your contribution @junyuanz1

couple of quick notes:

  • Re auto-cancellation: I think this is very much use-case specific, and some would find it surprising. For example, if you are racing in a loop between multiple calls, you might find it surprising that the losing calls were canceled.
  • Regarding or_timeout in TypeScript and others: there it returns a DurableFuture which can be combined later.
    Such functionality does not exist yet, therefore Restate.any / Restate.race is a top-level blocking operation.
  • Perhaps it would be simpler to do Restate.with_timeout(durable future, timeout)?

We'd need to make these futures combinable but it is a slightly more involved task.

Comment thread lib/restate/durable_future.rb
Comment thread lib/restate/durable_future.rb Outdated
Match TS RestatePromise.orTimeout / Java DurableFuture.withTimeout:
the timer firing raises Restate::TimeoutError without cancelling the
underlying call. Callers who want the remote invocation stopped
rescue the error and invoke #cancel themselves.
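
A minimal sketch of that caller-side opt-in, assuming a call future that exposes or_timeout and cancel as described in this PR. `CancelSketch` and `SlowCallFuture` are hypothetical stand-ins (the timer always wins here) so the snippet runs without the SDK.

```ruby
# Hypothetical stand-ins, not the SDK classes.
module CancelSketch
  class TimeoutError < StandardError; end

  # Stand-in for a call future whose timer always wins the race.
  class SlowCallFuture
    attr_reader :cancelled

    def initialize
      @cancelled = false
    end

    # Mirrors the behavior after this commit: raise, but do not cancel.
    def or_timeout(_duration)
      raise TimeoutError, "Timeout occurred"
    end

    def cancel
      @cancelled = true
    end
  end
end

future = CancelSketch::SlowCallFuture.new
outcome = begin
  future.or_timeout(5)
rescue CancelSketch::TimeoutError
  future.cancel # explicit opt-in: or_timeout itself leaves the callee running
  :gave_up
end
```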

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@junyuanz1 junyuanz1 requested a review from igalshilman May 22, 2026 00:53
Contributor

@igalshilman igalshilman left a comment

Awesome, thank you.
will merge and release.
