feat: add exponential backoff retry for transient provider errors #9

Open
ramparte wants to merge 2 commits into microsoft:main from ramparte:feat/retry-on-rate-limit
@ramparte
Problem

Provider rate-limit errors (HTTP 429) and other transient failures crash the session immediately, even though these errors are temporary and typically clear up when the call is retried.

Solution

Added a _call_provider_with_retry() method to the streaming orchestrator that wraps all 3 provider call sites with configurable exponential backoff:

  • Retries only retryable errors: Checks LLMError.retryable flag (True for RateLimitError, ProviderUnavailableError, LLMTimeoutError)
  • Exponential backoff: base_delay * 2^attempt, capped at max_delay
  • Honors retry_after: Uses server-provided delay when available (e.g., from 429 responses)
  • Observable: Emits provider:retry events on each retry attempt
  • Configurable: Three config knobs with sensible defaults
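The delay rule in the bullets above can be sketched as a small pure function. This is a minimal illustration, not the code in this PR: the name `compute_retry_delay` is hypothetical, `attempt` is assumed to be zero-based, and the sketch assumes a server-provided `retry_after` is used as-is rather than capped at `max_delay`.

```python
from typing import Optional


def compute_retry_delay(
    attempt: int,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_after: Optional[float] = None,
) -> float:
    """Return the number of seconds to wait before the next retry.

    Hypothetical helper: `attempt` is assumed to start at 0 for the
    first retry.
    """
    # Assumption: a positive server-provided delay takes precedence
    # over the computed backoff and is not capped.
    if retry_after is not None and retry_after > 0:
        return float(retry_after)
    # Exponential backoff: base_delay * 2^attempt, capped at max_delay.
    return min(base_delay * (2 ** attempt), max_delay)
```

Under these assumptions and the default values, the first three retries wait 1, 2, and 4 seconds, and the cap of 30 seconds is reached from the sixth retry onward.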

Configuration

| Config Key | Default | Description |
|---|---|---|
| retry_max_attempts | 3 | Maximum retry attempts (0 to disable) |
| retry_base_delay_seconds | 1.0 | Base delay for exponential backoff |
| retry_max_delay_seconds | 30.0 | Maximum delay cap |
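One way the three keys might be read from an orchestrator config mapping, falling back to the defaults listed above. The `RetryConfig` class and its `from_mapping` method are hypothetical names for illustration; only the config keys and default values come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class RetryConfig:
    """Hypothetical container for the three retry settings."""

    max_attempts: int = 3
    base_delay_seconds: float = 1.0
    max_delay_seconds: float = 30.0

    @classmethod
    def from_mapping(cls, config: Mapping[str, Any]) -> "RetryConfig":
        # Missing keys fall back to the documented defaults.
        return cls(
            max_attempts=int(config.get("retry_max_attempts", 3)),
            base_delay_seconds=float(config.get("retry_base_delay_seconds", 1.0)),
            max_delay_seconds=float(config.get("retry_max_delay_seconds", 30.0)),
        )


# Setting retry_max_attempts to 0 disables retries; other keys keep defaults.
no_retry = RetryConfig.from_mapping({"retry_max_attempts": 0})
```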

Call sites modified

  1. Non-streaming fallback (provider.complete()) - main execution loop
  2. Max-iterations fallback (provider.complete()) - graceful degradation path
  3. Streaming (provider.stream()) - primary streaming path
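A rough sketch of how a single helper could wrap awaitable call sites such as `provider.complete()`. It is not the PR's `_call_provider_with_retry()`: the `LLMError` class below is a stand-in exposing only `retryable` and `retry_after`, and `call_with_retry`, `make_call`, `emit`, and the event payload fields are hypothetical. The sketch also assumes `retry_max_attempts` counts retries after the initial call. The streaming call site is not covered here, since retrying a partially consumed stream raises questions this sketch does not address.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class LLMError(Exception):
    """Stand-in for the real error type (assumption, not the library class)."""

    def __init__(self, message: str, retryable: bool = False,
                 retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        self.retryable = retryable
        self.retry_after = retry_after


async def call_with_retry(
    make_call: Callable[[], Awaitable[Any]],
    emit: Callable[[str, dict], Awaitable[None]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Any:
    """Await make_call(), retrying retryable LLMError up to max_attempts times."""
    attempt = 0
    while True:
        try:
            # make_call is a factory so every try gets a fresh awaitable;
            # a coroutine object cannot be awaited twice.
            return await make_call()
        except LLMError as err:
            # Non-retryable errors and exhausted retries propagate unchanged.
            if not err.retryable or attempt >= max_attempts:
                raise
            if err.retry_after is not None:
                delay = err.retry_after  # server-provided delay wins
            else:
                delay = min(base_delay * 2 ** attempt, max_delay)
            await emit("provider:retry", {"attempt": attempt + 1, "delay": delay})
            await sleep(delay)
            attempt += 1
```

A call site would then pass a zero-argument factory, for example `await call_with_retry(lambda: provider.complete(request), emit)`, where `provider`, `request`, and `emit` stand for whatever objects are in scope at that site.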

Test coverage

18 new tests covering all retry behaviors. All 53 tests pass (18 new + 35 existing, zero regressions).

🤖 Generated with Amplifier

bkrabach and others added 2 commits February 10, 2026 12:14
…imiting test markers (microsoft#8)

* fix: read HookResult modify action from tool:post events (2 sites)

Both _execute_tool_only and _execute_tool_with_result now detect when a
hook modifies tool output via action='modify' on tool:post, and use the
modified data instead of the original get_serialized_output(). This
enables truncation and transformation hooks to work correctly.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

* fix: add missing @pytest.mark.asyncio to TestRateLimitDelay class

The 4 async test methods in TestRateLimitDelay were missing the asyncio
marker. With asyncio_mode = "strict" in pyproject.toml, explicit markers
are required. Added class-level @pytest.mark.asyncio decorator.

---------

Co-authored-by: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Wrap all 3 provider call sites in the streaming orchestrator with
configurable exponential backoff retry logic. Retries only on
LLMError with retryable=True (RateLimitError, ProviderUnavailableError,
LLMTimeoutError). Honors retry_after from provider responses and emits
provider:retry events for observability.

Config: retry_max_attempts (default 3), retry_base_delay_seconds (1.0),
retry_max_delay_seconds (30.0).

18 new tests covering all retry behaviors, zero regressions.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

2 participants