Commit 9315fb4

Clarify Python retry and timeout scopes
Issue: zorporation/durable-workflow#490 Loop-ID: build-01
1 parent 032c2fb commit 9315fb4

6 files changed

Lines changed: 65 additions & 5 deletions

README.md

Lines changed: 11 additions & 0 deletions
@@ -54,6 +54,17 @@ For a fuller deployable example, see
 [`examples/order_processing`](examples/order_processing), which runs a
 multi-activity order workflow against a local server with Docker Compose.
 
+## Retry policy scopes
+
+Retry and timeout settings are scoped to the layer where you configure them:
+
+- `TransportRetryPolicy` on `Client(...)` retries SDK HTTP requests only. It handles transient connection failures, request timeouts, 5xx responses, and 429 rate limits. It does not retry workflow code, activity code, child workflows, or failed workflow runs.
+- `ActivityRetryPolicy` on `ctx.schedule_activity(...)` is recorded into durable history with that activity command. It controls server-side attempts for that one activity execution.
+- `ChildWorkflowRetryPolicy` on `ctx.start_child_workflow(...)` is recorded with that child-start command. It controls server-side attempts for that child workflow execution.
+- `non_retryable_error_types` belongs to durable activity/child retry policies. `non_retryable=True` on an activity failure bypasses the activity retry budget and surfaces the failure to the workflow.
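
The transport-level rule in the first bullet can be pictured as a status-code check. The sketch below is illustrative only: `is_transport_retryable` is a hypothetical helper, not part of the `durable_workflow` API, and it covers only HTTP status codes. Connection failures and request timeouts, which the policy also handles, surface as exceptions rather than statuses and are left out here.

```python
from http import HTTPStatus


def is_transport_retryable(status_code: int) -> bool:
    """Hypothetical helper mirroring the status classes named in the README.

    Retryable: 429 (rate limited) and any 5xx response.
    Not retryable: every other status, including the remaining 4xx codes.
    """
    if status_code == HTTPStatus.TOO_MANY_REQUESTS:  # 429
        return True
    return 500 <= status_code <= 599


if __name__ == "__main__":
    for code in (200, 400, 404, 429, 500, 503):
        print(code, is_transport_retryable(code))
```

Nothing in this check knows about activities or child workflows, which is the point of the scoping: a failed activity never reaches this layer as something to retry.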
+
+Timeout names are also layer-specific. `start_to_close_timeout` limits one activity attempt, `schedule_to_start_timeout` limits queue wait before an activity starts, `schedule_to_close_timeout` limits the whole activity execution including retries, and `heartbeat_timeout` limits the gap between activity heartbeats. For child workflows, `execution_timeout_seconds` covers the overall child workflow execution and `run_timeout_seconds` covers one run.
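
As a rough arithmetic illustration of how the per-attempt and whole-execution activity timeouts interact, the sketch below computes an upper bound on time spent inside activity attempts. It is not the server's algorithm: `attempt_time_bound` is a hypothetical function, and it deliberately ignores queue wait and any backoff between attempts.

```python
from typing import Optional


def attempt_time_bound(
    max_attempts: int,
    start_to_close: float,
    schedule_to_close: Optional[float] = None,
) -> float:
    """Upper bound, in seconds, on time spent running activity attempts.

    Each attempt is capped by start_to_close; the whole execution,
    retries included, is capped by schedule_to_close when it is set.
    Assumption: queue wait and retry backoff are ignored.
    """
    per_attempt_total = max_attempts * start_to_close
    if schedule_to_close is None:
        return per_attempt_total
    return min(per_attempt_total, schedule_to_close)


if __name__ == "__main__":
    # Three 10-second attempts fit inside a 60-second overall budget.
    print(attempt_time_bound(3, 10.0, 60.0))
    # A 25-second overall budget cuts the execution short of three full attempts.
    print(attempt_time_bound(3, 10.0, 25.0))
```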
+
 ## Activity retries and timeouts
 
 Configure per-call activity retries and deadlines from workflow code:

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ pip install 'durable-workflow[prometheus]'
 - **[Workflow](reference/workflow.md)** — workflow-side primitives: `ActivityRetryPolicy`, `ChildWorkflowRetryPolicy`, `ContinueAsNew`, `StartChildWorkflow`, signal/query/update decorators, and query-state replay helpers.
 - **[Activity](reference/activity.md)** — activity decorator and execution context.
 - **[Errors](reference/errors.md)** — typed exceptions raised by the client and worker.
-- **[Retry policy](reference/retry_policy.md)** — HTTP transport retry configuration for the client.
+- **[Retry policy](reference/retry_policy.md)** — HTTP transport retry configuration for the client. Durable activity and child workflow retry policies are workflow primitives, not transport settings.
 - **[Metrics](reference/metrics.md)** — pluggable recorders, including a Prometheus adapter.
 - **[Serializer](reference/serializer.md)** — payload encoding and decoding helpers.
 - **[Sync helpers](reference/sync.md)** — blocking wrappers around the async client for scripts and tests.

docs/reference/retry_policy.md

Lines changed: 14 additions & 0 deletions
@@ -1,3 +1,17 @@
 # Retry policy
 
+`TransportRetryPolicy` is the client HTTP retry policy. It retries transient
+network/request failures while talking to the Durable Workflow server; it does
+not retry workflow runs, workflow tasks, activities, child workflows, or user
+code.
+
+Durable retry budgets live on workflow commands instead:
+
+- `ActivityRetryPolicy` with `ctx.schedule_activity(...)` controls attempts for
+  one activity execution.
+- `ChildWorkflowRetryPolicy` with `ctx.start_child_workflow(...)` controls
+  attempts for one child workflow execution.
+- Activity timeout fields and child workflow timeout fields are server-side
+  durable budgets, not SDK HTTP request timeouts.
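
The `TransportRetryPolicy` docstring in this commit notes that transport retries use exponential backoff with jitter. The following is a minimal sketch of one common variant ("full jitter"), offered only to show the idea. The commit does not say which jitter variant or defaults the SDK uses, so `backoff_delay`, `base`, and `cap` are all hypothetical names and values.

```python
import random
from typing import Callable


def backoff_delay(
    attempt: int,
    base: float = 0.1,
    cap: float = 5.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Return a sleep time in seconds for a zero-based retry attempt.

    The ceiling doubles with each attempt (base * 2**attempt) up to cap,
    and the actual delay is drawn uniformly from [0, ceiling) so that many
    clients retrying at once do not all wake at the same instant.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


if __name__ == "__main__":
    for attempt in range(5):
        print(f"attempt {attempt}: sleep {backoff_delay(attempt):.3f}s")
```

Passing `rng` explicitly keeps the function deterministic in tests while defaulting to real randomness in use.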
+
 ::: durable_workflow.retry_policy

docs/reference/workflow.md

Lines changed: 5 additions & 0 deletions
@@ -1,3 +1,8 @@
 # Workflow
 
+Workflow retry and timeout settings are durable command budgets. Use
+`ActivityRetryPolicy` on `ctx.schedule_activity(...)` for activity attempts and
+`ChildWorkflowRetryPolicy` on `ctx.start_child_workflow(...)` for child workflow
+attempts. Use `TransportRetryPolicy` only for client HTTP retries.
+
 ::: durable_workflow.workflow

src/durable_workflow/retry_policy.py

Lines changed: 7 additions & 0 deletions
@@ -32,6 +32,13 @@ class TransportRetryPolicy:
     timeouts, 5xx server errors, 429 rate limit). Does not retry client
     errors (4xx except 429).
 
+    This policy runs inside :class:`~durable_workflow.Client` around HTTP
+    requests. It does not retry workflow runs, workflow tasks, activity
+    executions, child workflows, or any user code. Configure durable activity
+    retries with :class:`durable_workflow.workflow.ActivityRetryPolicy` and
+    child workflow retries with
+    :class:`durable_workflow.workflow.ChildWorkflowRetryPolicy`.
+
     Uses exponential backoff with jitter to avoid thundering herd.
     """
 

src/durable_workflow/workflow.py

Lines changed: 27 additions & 4 deletions
@@ -169,7 +169,12 @@ class ActivityRetryPolicy:
 
     The policy is snapped onto the durable activity execution when the
     workflow task completes, so later code deploys do not change the retry
-    budget for an already-scheduled activity.
+    budget for an already-scheduled activity. It is a server-side durable
+    retry policy, not the SDK HTTP transport retry policy.
+
+    ``non_retryable_error_types`` names failure types that should bypass this
+    retry budget. An activity worker can also report ``non_retryable=True`` on
+    a failure to stop retrying that activity execution.
     """
 
     max_attempts: int = 3
@@ -249,15 +254,28 @@ def _payload_warning_context(
 
 @dataclass
 class ChildWorkflowRetryPolicy(ActivityRetryPolicy):
-    """Retry policy applied to one started child workflow call."""
+    """Retry policy applied to one started child workflow call.
+
+    This is recorded with the child workflow command and controls durable
+    server-side child attempts. It is separate from SDK HTTP transport retry
+    and from activity retry.
+    """
 
 
 ChildWorkflowRetryPolicyInput = ChildWorkflowRetryPolicy | ActivityRetryPolicy | Mapping[str, Any]
 
 
 @dataclass
 class ScheduleActivity:
-    """Command requesting an activity task."""
+    """Command requesting an activity task.
+
+    Timeout fields are activity budgets, not HTTP request timeouts:
+    ``start_to_close_timeout`` limits one activity attempt,
+    ``schedule_to_start_timeout`` limits queue wait before an attempt starts,
+    ``schedule_to_close_timeout`` limits the whole activity execution including
+    retries, and ``heartbeat_timeout`` limits the gap between activity
+    heartbeats.
+    """
 
     activity_type: str
     arguments: list[Any]
@@ -512,7 +530,12 @@ def to_server_command(
 
 @dataclass
 class StartChildWorkflow:
-    """Command requesting a child workflow run."""
+    """Command requesting a child workflow run.
+
+    ``execution_timeout_seconds`` limits the overall child workflow execution.
+    ``run_timeout_seconds`` limits one child run. These budgets are durable
+    server-side workflow budgets and are separate from client HTTP timeouts.
+    """
 
     workflow_type: str
     arguments: list[Any] = field(default_factory=list)
