feat: add exponential backoff retry for transient provider errors by ramparte · Pull Request #9 · microsoft/amplifier-module-loop-streaming

ramparte · 2026-02-12T21:33:56Z

Problem

Provider rate limit errors (HTTP 429) and transient failures cause sessions to crash immediately, even though these errors are inherently temporary and resolve on retry.

Solution

Added a _call_provider_with_retry() method to the streaming orchestrator that wraps all 3 provider call sites with configurable exponential backoff:

Retries only retryable errors: Checks LLMError.retryable flag (True for RateLimitError, ProviderUnavailableError, LLMTimeoutError)
Exponential backoff: base_delay * 2^attempt, capped at max_delay
Honors retry_after: Uses server-provided delay when available (e.g., from 429 responses)
Observable: Emits provider:retry events on each retry attempt
Configurable: Three config knobs with sensible defaults

Configuration

Config Key	Default	Description
`retry_max_attempts`	3	Maximum retry attempts (0 to disable)
`retry_base_delay_seconds`	1.0	Base delay for exponential backoff
`retry_max_delay_seconds`	30.0	Maximum delay cap

Call sites modified

Non-streaming fallback (provider.complete()) - main execution loop
Max-iterations fallback (provider.complete()) - graceful degradation path
Streaming (provider.stream()) - primary streaming path

Test coverage

18 new tests covering all retry behaviors. All 53 tests pass (18 new + 35 existing, zero regressions).

🤖 Generated with Amplifier

…imiting test markers (microsoft#8) * fix: read HookResult modify action from tool:post events (2 sites) Both _execute_tool_only and _execute_tool_with_result now detect when a hook modifies tool output via action='modify' on tool:post, and use the modified data instead of the original get_serialized_output(). This enables truncation and transformation hooks to work correctly. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com> * fix: add missing @pytest.mark.asyncio to TestRateLimitDelay class The 4 async test methods in TestRateLimitDelay were missing the asyncio marker. With asyncio_mode = "strict" in pyproject.toml, explicit markers are required. Added class-level @pytest.mark.asyncio decorator. --------- Co-authored-by: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

Wrap all 3 provider call sites in the streaming orchestrator with configurable exponential backoff retry logic. Retries only on LLMError with retryable=True (RateLimitError, ProviderUnavailableError, LLMTimeoutError). Honors retry_after from provider responses and emits provider:retry events for observability. Config: retry_max_attempts (default 3), retry_base_delay_seconds (1.0), retry_max_delay_seconds (30.0). 18 new tests covering all retry behaviors, zero regressions. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

bkrabach and others added 2 commits February 10, 2026 12:14

bkrabach force-pushed the main branch from 9e4ce6b to 6b2747e Compare April 3, 2026 04:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add exponential backoff retry for transient provider errors#9

feat: add exponential backoff retry for transient provider errors#9
ramparte wants to merge 2 commits intomicrosoft:mainfrom
ramparte:feat/retry-on-rate-limit

ramparte commented Feb 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ramparte commented Feb 12, 2026

Problem

Solution

Configuration

Call sites modified

Test coverage

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants