fix: improve NodePool weight fallback and add pre-launch circuit breaker for quota/capacity errors #1433
Open
GuetaHen wants to merge 2 commits into Azure:main
added 2 commits
February 17, 2026 12:29
…ota errors

This change addresses the issue where Karpenter fails to fall back to lower-weight NodePools when higher-weight pools are quota-exhausted. Three specific gaps are fixed:

1. SKU family quota (non-zero limit): Previously used the default 3-minute TTL, causing offerings to recycle back into the available pool before the scheduler could exhaust all SKUs in the high-weight NodePool and fall through to lower-weight pools. Now uses 15-minute TTL (SKUFamilyQuotaNonZeroTTL).
2. Regional quota exhausted: Previously returned InsufficientCapacityError without updating the ICE cache, so subsequent scheduling loops would keep selecting instance types from the exhausted capacity type. Now marks offerings as unavailable in the cache with a 30-minute TTL before returning the error.
3. Added new TTL constants: SKUFamilyQuotaNonZeroTTL (15m) and RegionalQuotaExhaustedTTL (30m) to differentiate between transient and persistent quota exhaustion.

Tests added:
- Regional quota exceeded for on-demand marks all zones unavailable
- Regional quota exceeded for spot marks all spot unavailable
- SKU family quota non-zero limit uses longer TTL to prevent recycling

Signed-off-by: Hicham Engoueta <hengoueta@microsoft.com>
… calls

Adds a PreLaunchFilter that re-checks instance types against the live ICE (Insufficient Capacity Error) cache right before making Azure API calls. This acts as a circuit breaker during large-scale scheduling (e.g., 5000 cores): when the scheduler creates many NodeClaims simultaneously, the first VM creation failure updates the ICE cache, and subsequent NodeClaims skip the failed SKU immediately instead of making redundant API calls that are guaranteed to fail.

Without this: ~78 wasted API calls, all fail with AllocationFailed/quota errors.
With this: first ~5-10 calls may fail (race window), then remaining skip instantly.

Integration:
- New function: offerings.PreLaunchFilter() with fail-open design
- Called in both DefaultVMProvider.BeginCreate() and DefaultAKSMachineProvider.BeginCreate()
- Uses the existing ICE cache (unavailableOfferings) from the error handler
- No new Azure API calls; purely in-memory cache re-check

Signed-off-by: Hicham Engoueta <hengoueta@microsoft.com>
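The following is a rough sketch of what a fail-open pre-launch filter could look like. The signature, the `InstanceType` and `AvailabilityCheck` types, and the SKU names are hypothetical and not taken from the PR's code. "Fail-open" is interpreted here, as an assumption, to mean the original candidate list is returned whenever no check is configured or the filter would leave nothing to launch.

```go
package main

import "fmt"

// InstanceType is a hypothetical, pared-down instance type.
type InstanceType struct {
	Name string
}

// AvailabilityCheck reports whether a SKU is currently believed launchable.
// In this sketch it stands in for a lookup against the live ICE cache.
type AvailabilityCheck func(sku string) bool

// PreLaunchFilter drops candidates the check reports as unavailable.
// Assumed fail-open behaviour: with no check, or if every candidate would be
// dropped, the original list is returned unchanged.
func PreLaunchFilter(candidates []InstanceType, available AvailabilityCheck) []InstanceType {
	if available == nil {
		return candidates // fail open: nothing to check against
	}
	kept := make([]InstanceType, 0, len(candidates))
	for _, it := range candidates {
		if available(it.Name) {
			kept = append(kept, it)
		}
	}
	if len(kept) == 0 {
		return candidates // fail open: never block the launch outright
	}
	return kept
}

func main() {
	// Pretend an earlier NodeClaim already failed on this SKU.
	unavailable := map[string]bool{"Standard_D4s_v5": true}
	check := func(sku string) bool { return !unavailable[sku] }

	candidates := []InstanceType{{Name: "Standard_D4s_v5"}, {Name: "Standard_D4as_v5"}}
	for _, it := range PreLaunchFilter(candidates, check) {
		fmt.Println(it.Name) // prints only Standard_D4as_v5
	}
}
```

Because the filter only reads an in-memory map, it adds no API calls, which matches the commit's description of a purely in-memory re-check.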
Author
@microsoft-github-policy-service agree company="Microsoft"
Author
This PR fixes #1323
Fixes #1323
Summary
This PR addresses a critical issue where Karpenter fails to fall back to lower-weight NodePools when higher-weight pools encounter quota exhaustion or capacity outages during large-scale scheduling.
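As a simplified illustration of the intended fallback, the sketch below picks the highest-weight NodePool that still has at least one SKU not marked unavailable. The `NodePool` struct, `pickNodePool`, and the SKU names are invented for this example, and Karpenter's real scheduler is considerably more involved.

```go
package main

import (
	"fmt"
	"sort"
)

// NodePool is a hypothetical, minimal view of a weighted NodePool.
type NodePool struct {
	Name   string
	Weight int
	SKUs   []string
}

// pickNodePool returns the highest-weight pool that has at least one SKU
// not present in the unavailable set.
func pickNodePool(pools []NodePool, unavailable map[string]bool) (NodePool, bool) {
	// Copy before sorting so the caller's slice order is untouched.
	sorted := append([]NodePool(nil), pools...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Weight > sorted[j].Weight })
	for _, p := range sorted {
		for _, sku := range p.SKUs {
			if !unavailable[sku] {
				return p, true
			}
		}
	}
	return NodePool{}, false
}

func main() {
	pools := []NodePool{
		{Name: "fallback", Weight: 10, SKUs: []string{"sku-c"}},
		{Name: "preferred", Weight: 100, SKUs: []string{"sku-a", "sku-b"}},
	}
	// Every SKU in the preferred pool is marked unavailable.
	unavailable := map[string]bool{"sku-a": true, "sku-b": true}
	if p, ok := pickNodePool(pools, unavailable); ok {
		fmt.Println(p.Name) // fallback
	}
}
```

Fallback in this model only happens once every SKU in the higher-weight pool is marked unavailable at the same time, which is why entries that expire too early can prevent it.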
Problem
When a high-weight NodePool's SKU family hits Azure quota limits or capacity outages:
Changes
Commit 1: Fix ICE cache gaps for quota errors (commonerrorhandlers.go)
- InsufficientCapacityError
- SKUFamilyQuotaNonZeroTTL and RegionalQuotaExhaustedTTL constants
Commit 2: Pre-launch circuit breaker (offerings.go, vminstance.go, aksmachineinstance.go)
- PreLaunchFilter() re-checks instance types against the live ICE cache at launch time
- NewLiveCacheAvailabilityCheck() helper avoids code duplication between VM and AKS Machine providers
Impact
Tests
All existing tests pass. No new Azure API dependencies.