InvalidOperationException: "The local RPC address has not been configured" #1306

@andrey-malkov

Bug: BindingHelper throws InvalidOperationException instead of a retryable exception when the local RPC address is temporarily unavailable

Context

Over the past week, our Azure Functions Isolated Worker apps have been experiencing frequent Grpc.Core.RpcException failures (StatusCode="Internal", HTTP_1_1_REQUIRED) on DurableTaskClient calls such as GetInstanceAsync, ScheduleNewOrchestrationInstanceAsync, and RaiseEventAsync. These are transient gRPC sidecar issues, so we implemented an application-level retry strategy using Polly to wrap all DurableTaskClient calls.
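
A minimal sketch of such a wrapper, assuming Polly v7 syntax. The class and method names, retry count, and backoff values are illustrative assumptions, not the exact production code:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Grpc.Core;
using Microsoft.DurableTask.Client;
using Polly;
using Polly.Retry;

// Hypothetical helper: wraps DurableTaskClient calls in a retry policy.
public static class DurableClientRetry
{
    // Retry only the transient sidecar failure (StatusCode.Internal),
    // with exponential backoff of 2s, 4s, 8s. Values are illustrative.
    private static readonly AsyncRetryPolicy RetryPolicy = Policy
        .Handle<RpcException>(ex => ex.StatusCode == StatusCode.Internal)
        .WaitAndRetryAsync(
            retryCount: 3,
            sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

    public static Task<OrchestrationMetadata?> GetInstanceWithRetryAsync(
        DurableTaskClient client,
        string instanceId,
        CancellationToken cancellationToken = default)
    {
        return RetryPolicy.ExecuteAsync(
            token => client.GetInstanceAsync(instanceId, token),
            cancellationToken);
    }
}
```

A wrapper like this only helps once the function body is running, which is why it does not cover the second failure mode described next.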

While investigating, we discovered a second failure mode — System.InvalidOperationException: "The local RPC address has not been configured!" — which appears to be caused by the same underlying network/sidecar instability. However, this exception cannot be retried at the application level because it occurs in the host binding pipeline before our function code executes.

Summary

When the Durable Task gRPC sidecar's local RPC address is temporarily unavailable, BindingHelper.DurableOrchestrationClientToString throws InvalidOperationException, a non-retryable exception, for what is a transient infrastructure condition. We expect the library to handle this condition gracefully, for example by waiting for the sidecar or by surfacing a retryable error, rather than failing immediately with an unrecoverable exception.

The issue is transient in nature but can persist for extended periods (minutes to hours), during which every function invocation using the [DurableClient] binding fails immediately. For queue-triggered functions, this causes messages to rapidly exhaust their dequeue count and move to poison queues, so the work is effectively lost until it is manually reprocessed.

Environment

Component                                                 Version
Azure Functions Host                                      4.1047.100.26071
Microsoft.Azure.WebJobs.Extensions.DurableTask            3.0.0.0
Microsoft.Azure.Functions.Worker                          2.51.0
Microsoft.Azure.Functions.Worker.Sdk                      2.0.0
Microsoft.Azure.Functions.Worker.Extensions.DurableTask   1.14.1
.NET                                                      8.0
OS                                                        Windows (Azure App Service)
Plan                                                      Premium v3

Reproduction

This issue occurs in production under load. We have not identified a reliable minimal reproduction, but the pattern is consistent:

  1. Function app is running normally processing queue messages
  2. At some point, the Durable Task sidecar's local RPC address becomes temporarily unavailable
  3. All subsequent function invocations that use [DurableClient] binding fail immediately with InvalidOperationException
  4. The condition is transient but can last from minutes to several hours
  5. During this period, no function using [DurableClient] can execute
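
For illustration, the affected functions have roughly the following shape. This is a sketch, not the actual production function: the queue name and orchestrator name are hypothetical, and only the function name matches the one in the exception below.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Functions.Worker;
using Microsoft.DurableTask.Client;

public static class InputQueueTrigger
{
    [Function("InputQueueTrigger")]
    public static async Task Run(
        // "input-queue" is a hypothetical queue name.
        [QueueTrigger("input-queue")] string message,
        // The host resolves this binding before the worker is invoked.
        // When the local RPC address is missing, binding fails and this
        // method body never runs.
        [DurableClient] DurableTaskClient client)
    {
        // "ProcessMessage" is a hypothetical orchestrator name.
        await client.ScheduleNewOrchestrationInstanceAsync("ProcessMessage", message);
    }
}
```

Because the failure happens while the [DurableClient] parameter is being bound, a try/catch or retry policy inside Run has no chance to observe it.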

Exception Details

Inner exception (root cause):

System.InvalidOperationException: The local RPC address has not been configured!
   at Microsoft.Azure.WebJobs.Extensions.DurableTask.BindingHelper.DurableOrchestrationClientToString(DurableClientAttribute, DurableOrchestrationClientToString) in BindingHelper.cs:line 37

Outer exception:

Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.InputQueueTrigger

Full call chain:

WorkerFunctionInvoker.InvokeCore
  → WorkerFunctionInvoker.BindInputsAsync
    → ExtensionBinding.BindAsync
      → FunctionBinding.BindStringAsync
        → Binder.BindAsync
          → BindToInputBindingProvider.BuildAsync
            → PatternMatcher.New
              → BindingHelper.DurableOrchestrationClientToString  ← FAILS HERE

The failure occurs during the host-side input binding phase, before the isolated worker process receives the invocation. The host attempts to serialize the [DurableClient] binding info (including the local gRPC sidecar RPC address) to pass to the worker, but the address has not been configured.

The Problem

The core issue is that BindingHelper.DurableOrchestrationClientToString treats a transient sidecar availability problem as a fatal error by throwing InvalidOperationException:

  1. Wrong exception type: InvalidOperationException is not recognized as transient by the Azure Functions host. The host treats it as a non-retryable application error, which is incorrect for an infrastructure availability issue.

  2. No wait/retry: The method fails immediately (~5-7ms) without attempting to wait for the sidecar to become ready or retrying the RPC address lookup.

  3. Affects all functions: During the issue window, every function using [DurableClient] binding fails — timer triggers, queue triggers, and activity functions are all impacted.

  4. Not interceptable by application code: The failure occurs in the host binding pipeline before user code executes, so no application-level retry (e.g., Polly) can work around it.

  5. Poison queue data loss: For queue-triggered functions, the rapid failures cause messages to exhaust their dequeue count (default: 5) and move to poison queues. This results in lost work items that require manual reprocessing.

Expected Behavior

  1. BindingHelper.DurableOrchestrationClientToString should wait for the sidecar to become ready (with a reasonable timeout) rather than failing immediately when the RPC address is not yet available.

  2. If waiting is not feasible, the method should throw a retryable exception type (e.g., a custom transient exception or RpcException with StatusCode.Unavailable) so the host and built-in retry mechanisms can handle it appropriately.

  3. The current behavior of throwing InvalidOperationException is incorrect because it signals a programming error rather than a transient infrastructure condition, preventing any retry-based recovery.
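
The first two expectations can be sketched together as a bounded wait followed by a retryable failure. This is only an illustration of the requested behavior under stated assumptions, not the extension's actual code: SidecarUnavailableException, RpcAddressWaiter, and the tryGetAddress delegate are hypothetical names, and how the host really obtains the address is not shown here.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical exception type signalling a transient, retryable condition.
public sealed class SidecarUnavailableException : Exception
{
    public SidecarUnavailableException(string message) : base(message) { }
}

public static class RpcAddressWaiter
{
    // Polls tryGetAddress until it returns a non-empty value or the timeout
    // elapses. tryGetAddress stands in for whatever lookup the host performs.
    public static async Task<string> WaitForAddressAsync(
        Func<string?> tryGetAddress,
        TimeSpan timeout,
        TimeSpan pollInterval,
        CancellationToken cancellationToken = default)
    {
        var elapsed = Stopwatch.StartNew();
        while (true)
        {
            string? address = tryGetAddress();
            if (!string.IsNullOrEmpty(address))
            {
                return address;
            }

            if (elapsed.Elapsed >= timeout)
            {
                // Fail with a type that callers can classify as transient,
                // instead of InvalidOperationException.
                throw new SidecarUnavailableException(
                    $"The local RPC address was not configured within {timeout}.");
            }

            await Task.Delay(pollInterval, cancellationToken);
        }
    }
}
```

With a short timeout (a few seconds, for example), brief sidecar restarts would be absorbed by the wait, while longer outages would still fail, but with an exception the host or trigger could treat as retryable.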

Related

This issue may be related to gRPC sidecar instability we've also reported — Grpc.Core.RpcException with StatusCode="Internal" and HTTP_1_1_REQUIRED / socket exhaustion errors occurring on DurableTaskClient calls. Both issues point to instability in the Durable Task gRPC sidecar lifecycle management.
