Is your feature request related to a problem? Please describe.
We are using Azure Fleet to manage updates across multiple AKS clusters (currently more than 17 clusters).
For cost optimization, some clusters are occasionally placed in a stopped state, but they still remain part of the Fleet configuration.
When a multi-cluster update run executes, it fails as soon as it encounters a cluster in a stopped state. The run then halts, and none of the remaining running clusters are updated.
This makes the update process fragile and requires manual intervention before running updates.
Describe the solution you'd like
During a multi-cluster update run, if Fleet detects that a member cluster is in a stopped state, it should skip that cluster and continue updating the remaining running clusters (the sketch after the list below illustrates the intended semantics).
This would ensure:
Updates proceed for running clusters
Stopped clusters do not block the entire update
The process becomes more resilient and automated
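To make the request concrete, the intended skip semantics look roughly like the following. This is an illustration only; the types and function names are hypothetical, not real Fleet APIs:

```python
# Illustration of the requested behavior; none of these names are real Fleet APIs.
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    power_state: str  # "Running" or "Stopped"

def apply_update(member: Member) -> str:
    return "Succeeded"  # stand-in for the real per-cluster upgrade

def run_update(members: list[Member]) -> dict[str, str]:
    results = {}
    for m in members:
        if m.power_state == "Stopped":
            # Requested behavior: record a skip instead of failing the run
            results[m.name] = "Skipped"
            continue
        results[m.name] = apply_update(m)
    return results

print(run_update([Member("aks-01", "Running"), Member("aks-02", "Stopped")]))
# -> {'aks-01': 'Succeeded', 'aks-02': 'Skipped'}
```

Ideally the run status would also report which clusters were skipped, so operators can re-run the update for them once those clusters are started again.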
Describe alternatives you've considered
Currently, the workaround is to:
Manually remove stopped clusters from the update strategy, or
Ensure all clusters are running before starting the update.
However, neither option is practical in environments with many clusters and frequent update runs; at best the second one can be scripted, as in the sketch below.
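For reference, here is roughly what scripting that pre-flight check looks like. This is a minimal sketch, assuming the Azure CLI with the fleet extension is installed and logged in; every resource name below is a placeholder:

```python
# Pre-flight sketch: start any stopped member clusters, then trigger the
# Fleet update run. All names are placeholders for illustration.
import json
import subprocess

CLUSTERS = [
    ("rg-fleet", "aks-cluster-01"),
    ("rg-fleet", "aks-cluster-02"),
    # ... remaining member clusters
]

def az(*args: str) -> str:
    """Run an Azure CLI command and return its stdout."""
    return subprocess.run(
        ["az", *args], capture_output=True, text=True, check=True
    ).stdout

for rg, name in CLUSTERS:
    power = json.loads(az("aks", "show", "-g", rg, "-n", name,
                          "--query", "powerState", "-o", "json"))
    if power and power.get("code") == "Stopped":
        print(f"Starting stopped cluster {name}...")
        az("aks", "start", "-g", rg, "-n", name)  # waits for completion

# With all members running, start the update run
# (requires the 'fleet' CLI extension; names are placeholders).
az("fleet", "updaterun", "start",
   "-g", "rg-fleet", "--fleet-name", "my-fleet", "-n", "my-update-run")
```

Even automated, this serializes cluster start-up time into every update run, which is exactly the friction this feature request aims to remove.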
Additional context
Example scenario:
Fleet contains 17 clusters
2 clusters are stopped
15 clusters are running
Current behavior:
The update run fails because of the stopped clusters
Expected behavior:
Fleet should skip the stopped clusters and continue updating the remaining 15 running clusters