
Add adaptive concurrency filter configuration to DestinationRule#3661

Open
prashanthjos wants to merge 5 commits into istio:master from prashanthjos:adaptive-concurrency

Conversation

@prashanthjos
Contributor

Adds support for configuring Envoy's adaptive concurrency filter through Istio's DestinationRule TrafficPolicy.

The adaptive concurrency filter dynamically adjusts the allowed number of outstanding requests to upstream hosts based on sampled latency measurements, providing automatic concurrency limiting without manual tuning.

Example usage:

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: my-service-adaptive
spec:
  host: my-service.default.svc.cluster.local
  trafficPolicy:
    adaptiveConcurrency:
      enabled: true
      gradientControllerConfig:
        sampleAggregatePercentile: 50
        concurrencyLimitParams:
          maxConcurrencyLimit: 1000
          concurrencyUpdateInterval: 100ms
        minRttCalcParams:
          interval: 60s
          requestCount: 50
          jitter: 10
          minConcurrency: 3
          buffer: 25
      concurrencyLimitExceededStatus: 503

@prashanthjos prashanthjos requested a review from a team as a code owner March 6, 2026 02:22
@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 6, 2026
Adds AdaptiveConcurrency message to the DestinationRule proto, enabling configuration of Envoy's adaptive concurrency filter via TrafficPolicy. The message structure mirrors Envoy's v3 AdaptiveConcurrency proto with GradientControllerConfig, ConcurrencyLimitCalculationParams, and MinimumRTTCalculationParams. Field names and numbers are aligned with the upstream Envoy proto.
@prashanthjos prashanthjos force-pushed the adaptive-concurrency branch from cc2f2a8 to aac1d4b Compare March 6, 2026 02:27
@prashanthjos
Contributor Author

This is the original issue, and this is the proposal document.

// Adaptive concurrency settings for dynamically adjusting the allowed number of
// outstanding requests based on sampled latencies. This enables the Envoy
// adaptive concurrency filter for the destination.
AdaptiveConcurrency adaptive_concurrency = 9;
Member

  • Does this completely override the static max_connections circuit breaker threshold? Should we describe that here so that it's clearer what to expect if both are configured?

  • Is it also worth noting that new connections that exceed the adaptive concurrency threshold won't be reflected in circuit breaker statistics, the way they would when connections exceed the max_connections threshold? (At least that is how I understand it).

Member

Normally when the circuit breaker threshold is exceeded in Envoy, you also have the x-envoy-overloaded header returned as part of the response. Some users might rely on that header, or maybe the client-side Envoy proxy relies on that in some way.

Should we note that the x-envoy-overloaded header won't be present when the adaptive concurrency threshold is exceeded?

Contributor Author

@prashanthjos prashanthjos Mar 6, 2026

  • Does this completely override the static max_connections circuit breaker threshold? Should we describe that here so that it's clearer what to expect if both are configured?
  • Is it also worth noting that new connections that exceed the adaptive concurrency threshold won't be reflected in circuit breaker statistics, the way they would when connections exceed the max_connections threshold? (At least that is how I understand it).

@ericdbishop these are good points. I dug a little deeper into the Envoy code; here is their adaptive concurrency filter. When the filter blocks a request, it does so here, and the request never reaches the router at all:

  if (controller_->forwardingDecision() == Controller::RequestForwardingAction::Block) {
    decoder_callbacks_->sendLocalReply(config_->concurrencyLimitExceededStatus(),
                                       "reached concurrency limit", nullptr, absl::nullopt,
                                       "reached_concurrency_limit");
    return Http::FilterHeadersStatus::StopIteration;
  }

Whereas circuit breakers are checked much later, inside the connection pool when the router tries to establish an upstream connection:

  const bool can_create_connection = host_->canCreateConnection(priority_);
  if (!can_create_connection) {
    host_->cluster().trafficStats()->upstream_cx_overflow_.inc();
  }
  if (!host_->cluster().resourceManager(priority_).pendingRequests().canCreate()) {
    ENVOY_LOG(debug, "max pending streams overflow");
    onPoolFailure(nullptr, absl::string_view(), ConnectionPool::PoolFailureReason::Overflow,
                  context);
    host_->cluster().trafficStats()->upstream_rq_pending_overflow_.inc();
    return nullptr;
  }

So no, adaptive concurrency does not override circuit breakers. If both are configured, a request must pass both checks independently:

  • First, the adaptive concurrency filter checks if num_rq_outstanding < concurrency_limit. If not, the request is rejected immediately with a 503.
  • If it passes, the request continues through the filter chain to the router, where circuit breaker thresholds (max_connections, max_pending_requests, max_requests) are checked independently at the connection pool level.

In practice, whichever limit is more restrictive will be the binding constraint. If the adaptive concurrency limit is lower than max_connections, most requests will be shed by the filter before circuit breakers ever see them.
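The layering above can be sketched as a toy admission check. This is purely illustrative Go: the struct, function, and field names are stand-ins for Envoy's internal state, not real Envoy or Istio APIs.

```go
package main

import "fmt"

// upstreamState is a hypothetical snapshot of one upstream; the field names
// are illustrative and do not correspond to real Envoy identifiers.
type upstreamState struct {
	outstanding    int // requests currently in flight through the filter
	adaptiveLimit  int // limit computed by the gradient controller
	activeConns    int // connections currently open to the upstream
	maxConnections int // static circuit breaker threshold
}

// admit mirrors the layering described above: the adaptive concurrency
// filter runs first and sheds with a 503; only forwarded requests reach the
// connection pool, where the circuit breaker is checked independently.
func admit(s upstreamState) (ok bool, reason string) {
	// Stage 1: adaptive concurrency filter (increments only rq_blocked).
	if s.outstanding >= s.adaptiveLimit {
		return false, "reached_concurrency_limit"
	}
	// Stage 2: circuit breaker in the pool (increments upstream_cx_overflow).
	if s.activeConns >= s.maxConnections {
		return false, "circuit_breaker_overflow"
	}
	return true, ""
}

func main() {
	// Adaptive limit lower than max_connections: the filter sheds first and
	// the circuit breaker never sees the request.
	_, reason := admit(upstreamState{outstanding: 10, adaptiveLimit: 10, activeConns: 5, maxConnections: 100})
	fmt.Println(reason) // reached_concurrency_limit

	ok, _ := admit(upstreamState{outstanding: 3, adaptiveLimit: 10, activeConns: 5, maxConnections: 100})
	fmt.Println(ok) // true
}
```

The two checks never consult each other, which is why whichever limit is lower ends up being the binding constraint.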

When the adaptive concurrency filter blocks a request, it increments only its own stat:

  • adaptive_concurrency.gradient_controller.rq_blocked

Since the request never reaches the router or connection pool, none of the circuit breaker stats are affected:

  • No upstream_cx_overflow_
  • No upstream_rq_pending_overflow_
  • No upstream_rq_retry_overflow_

Someone monitoring only circuit breaker stats would not see that requests are being shed. This makes three key facts explicit:

  • They don't override each other, they're layered
  • They limit different things at different points
  • They have separate stats, so operators need to monitor both

This is Envoy behavior and there is little we can do about it from Istio. In any case, I have updated the comment.

Contributor Author

Normally when the circuit breaker threshold is exceeded in Envoy, you also have the x-envoy-overloaded header returned as part of the response. Some users might rely on that header, or maybe the client-side Envoy proxy relies on that in some way.

Should we note that the x-envoy-overloaded header won't be present when the adaptive concurrency threshold is exceeded?

This is true, the header is not added. I have updated the comment!

Member

The notes you added to the comment look good to me, so at least users are aware of the differences. Thanks!

@prashanthjos
Contributor Author

@ramaraochavali @keithmattix can you please help with the review?

message MinimumRTTCalculationParams {
// The time interval between recalculating the minimum request round-trip time.
// Has to be positive. If set to zero, dynamic sampling of the minRTT is disabled.
google.protobuf.Duration interval = 1;
Contributor

@keithmattix keithmattix Mar 11, 2026

What is the default?

Contributor Author

interval has no default; users must either set interval (to enable dynamic minRTT sampling) or set fixed_value (to use a static minRTT). If neither is set, config validation throws an exception.

// The fixed value for the minRTT. This value is used when minRTT is not sampled
// dynamically. If dynamic sampling of the minRTT is disabled (i.e. `interval` is
// set to zero), this field must be set.
google.protobuf.Duration fixed_value = 6;
Contributor

So at least one of fixed_value and or interval must be set? Should it be a oneof?

Contributor Author

@prashanthjos prashanthjos Mar 13, 2026

I think it's a good suggestion, but these fields are not a oneof in the upstream Envoy v3 proto either. I've kept the structure aligned with Envoy's proto to maintain a clear mapping for the control plane translation layer and updated the field comments instead: if interval is set, fixed_value is ignored.

Happy to switch to a oneof if we prefer a stricter Istio-native API over Envoy alignment. Let me know your preference.

// calculation window. This is represented as a percentage of the interval duration.
// Defaults to 15%. This is recommended to prevent all hosts in a cluster from
// being in a minRTT calculation window at the same time.
google.protobuf.DoubleValue jitter = 3;
Contributor

Does this really need to be a double if, as in the example, folks are setting whole number percentages? Same goes for buffer

Contributor Author

Hmm, fair point. I changed jitter, buffer, and sample_aggregate_percentile from DoubleValue to UInt32Value. These are whole-number percentages (0–100) in practice, and UInt32Value still preserves nullability, so unset values correctly fall back to Envoy's defaults (15%, 25%, and p50 respectively).

Note: Envoy's proto uses type.v3.Percent which wraps a double, but since Istio doesn't import Envoy types and fractional percentages aren't practically useful here, UInt32Value is a better fit and consistent with other fields in this message (request_count, min_concurrency). Let me know what you think.
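The translation this implies for the control plane could look roughly like the following Go sketch. `toEnvoyPercent` is a hypothetical helper, not existing Istio code; it only illustrates the UInt32Value-to-double mapping and the unset-means-default behavior described above.

```go
package main

import "fmt"

// toEnvoyPercent maps Istio's UInt32Value whole-number percentage onto the
// double carried by Envoy's type.v3.Percent. A nil input means "unset": no
// Percent is emitted, so Envoy's own default (e.g. 15% for jitter) applies.
func toEnvoyPercent(v *uint32) (float64, bool) {
	if v == nil {
		return 0, false // leave unset; Envoy falls back to its default
	}
	return float64(*v), true
}

func main() {
	jitter := uint32(15)
	if p, set := toEnvoyPercent(&jitter); set {
		fmt.Println(p) // 15
	}
	if _, set := toEnvoyPercent(nil); !set {
		fmt.Println("unset: Envoy default applies")
	}
}
```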

MinimumRTTCalculationParams min_rtt_calc_params = 3 [(google.api.field_behavior) = REQUIRED];
}

oneof concurrency_controller_config {
Contributor

Why is this a oneof?

Contributor Author

@prashanthjos prashanthjos Mar 13, 2026

It's a oneof in the Envoy v3 proto to allow for future concurrency controller types beyond the gradient controller, but from what I can tell, Envoy has only ever had the gradient controller (since the filter was introduced), so in practice there's only one option. We could still drop the oneof, but if we want to stay aligned with the Envoy proto we can let it stay.

Let me know your thoughts; I am open to changing it to a plain field.

}

// If set to false, the adaptive concurrency filter will operate as a pass-through
// filter. If unspecified, the filter will be enabled.
Contributor

This is going to change every single Envoy deployment to use adaptive concurrency right? That is very likely to throttle people's production traffic

Contributor

@sschepens sschepens Mar 12, 2026

I have another question: does this field even make sense? If I wanted to disable it, wouldn't I simply not specify adaptiveConcurrency at all?

Contributor

+1. I have asked the same question in the design doc also

Contributor Author

+1, agreed. Removed the enabled field. I was closely following Envoy's spec. Digging a little deeper into Envoy, I understood that in Envoy's filter chain model, enabled (as a RuntimeFeatureFlag) allows toggling the filter at runtime without removing the config. Since Istio doesn't expose Envoy's runtime feature flags, the field adds no value here.

Apologies @ramaraochavali I missed your comment in the design doc.

// from sampled request latencies.
message ConcurrencyLimitCalculationParams {
// The allowed upper-bound on the calculated concurrency limit. Defaults to 1000.
google.protobuf.UInt32Value max_concurrency_limit = 2;
Contributor

How would service owners arrive at what is the correct value for this?

Contributor Author

@prashanthjos prashanthjos Mar 13, 2026

I updated the comments to the best of my knowledge :). Unfortunately, this isn't well documented in Envoy either. I would welcome corrections from any experts here and would be happy to change the comments.

google.protobuf.DoubleValue jitter = 3;

// The concurrency limit set while measuring the minRTT. Defaults to 3.
google.protobuf.UInt32Value min_concurrency = 4;
Contributor

What does this indicate and how would service owner know what to set?

Member

This is the lower bound for concurrency (i.e. ensure the adaptive concurrency filter doesn't throttle the concurrency below this threshold). I think most of these descriptions are the same as the Envoy API reference.

The main docs page explains the calculation in more detail, but it is a lot for the user to interpret. Even the defaults for adaptive concurrency used at Lyft were pretty different from the Envoy defaults, and some settings might have to be tuned at the per-service level.

With all of the tuning available, this is overall an advanced feature, maybe this PR should be accompanied by an istio.io docs user guide?

Contributor Author

Yes, we can do that. While merging the implementation PR, I will try my best to do justice to the documentation :).

Contributor

@ramaraochavali ramaraochavali left a comment

We need to give guidance on how a service owner would set some of these values and what the implications of specific values are, with proper examples. This is very hard to configure and get right unless we provide proper documentation and guidance.

@prashanthjos
Contributor Author

Sure @ramaraochavali, we can update the documentation by looking at the code and the available Envoy documentation. I will do that as part of the implementation PR.

