feat(zedkube): configurable VMI descheduler for failback #5885
Open
andrewd-zededa wants to merge 1 commit into lf-edge:master
Codecov Report❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5885      +/-   ##
==========================================
- Coverage   19.52%   17.11%    -2.42%
==========================================
  Files          19      474      +455
  Lines        3021    85697    +82676
==========================================
+ Hits          590    14663    +14073
- Misses       2310    69516    +67206
- Partials      121     1518     +1397
zedi-pramodh reviewed May 1, 2026
Contributor, Author:
Rebased on latest master; testing locally for now.
Replaces the shell-based descheduler trigger with a Go implementation
that fires the Kubernetes descheduler Job once per boot when the new
"kubernetes.vmi.deschedule.events" config key contains "boot".
kubeapi/descheduler.go (new):
- IsDeschedulerReady: returns (false, nil) until the local node is
Ready and schedulable, all Longhorn daemonsets are ready, and
(when present) the kubevirt CR reports Available.
- TriggerDescheduler: Create-first idempotent job management — skips
if an active Job already exists (handles multi-node boot race),
otherwise deletes any stale completed/failed Job and recreates.
Calls ensureDeschedulerSetup to Get-or-Create-or-Update the
descheduler ServiceAccount, ClusterRole, ClusterRoleBinding, and
policy ConfigMap before each run.
- EnsureVMsDeschedulerAnnotated: idempotently stamps
descheduler.alpha.kubernetes.io/evict=true on every VMIRS template
and live VMI in eve-kube-app namespace. No-op in base-k3s mode.
- Stub added to nokube.go for non-kube builds.
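
The Create-first job handling described for TriggerDescheduler can be illustrated with a minimal sketch. This is not the code from kubeapi/descheduler.go: the `jobStore` type, `triggerJob` function, and the job name are hypothetical, and an in-memory map stands in for the Kubernetes Jobs API so the example runs on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// jobState is a simplified stand-in for a Kubernetes Job's status.
type jobState int

const (
	jobActive   jobState = iota // still running
	jobFinished                 // completed or failed
)

// errAlreadyExists mimics the conflict returned when a Job name is taken.
var errAlreadyExists = errors.New("job already exists")

// jobStore is a hypothetical in-memory replacement for the Jobs API.
type jobStore struct {
	jobs map[string]jobState
}

func (s *jobStore) create(name string) error {
	if _, ok := s.jobs[name]; ok {
		return errAlreadyExists
	}
	s.jobs[name] = jobActive
	return nil
}

func (s *jobStore) remove(name string) { delete(s.jobs, name) }

// triggerJob tries Create first. On a name conflict it skips when the
// existing Job is still active (another node won the race); otherwise it
// deletes the stale finished Job and recreates it.
func triggerJob(s *jobStore, name string) (created bool, err error) {
	err = s.create(name)
	if err == nil {
		return true, nil
	}
	if !errors.Is(err, errAlreadyExists) {
		return false, err
	}
	if s.jobs[name] == jobActive {
		return false, nil // active Job exists: nothing to do
	}
	s.remove(name)
	if err := s.create(name); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	s := &jobStore{jobs: map[string]jobState{}}
	fmt.Println(triggerJob(s, "descheduler-job")) // no Job yet
	fmt.Println(triggerJob(s, "descheduler-job")) // active Job present
	s.jobs["descheduler-job"] = jobFinished
	fmt.Println(triggerJob(s, "descheduler-job")) // stale Job present
}
```

Attempting the create before any lookup means two nodes booting at the same time cannot both believe they created the Job: one create succeeds and the other sees the conflict.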
zedkube/descheduler.go (new):
- deschedulerOnBootWatcher goroutine: reads VmiDescheduleEvents.OnBoot
at startup and exits immediately if disabled; otherwise polls
IsDeschedulerReady every 15s then calls TriggerDescheduler once.
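
A rough sketch of the watcher's control flow follows, assuming the readiness check and the trigger are passed in as plain functions. The name `runOnBootWatcher` and its signature are hypothetical, and continuing to poll after a readiness error is an assumption, since the description only states that IsDeschedulerReady returns (false, nil) until the prerequisites hold.

```go
package main

import (
	"fmt"
	"time"
)

// runOnBootWatcher returns immediately when onBoot is false. Otherwise it
// calls ready once per interval until it reports true, then calls trigger
// exactly once. The bool result reports whether trigger was called.
func runOnBootWatcher(onBoot bool, interval time.Duration,
	ready func() (bool, error), trigger func() error) (bool, error) {
	if !onBoot {
		return false, nil
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		ok, err := ready()
		if err != nil {
			// Assumption: treat errors as transient and keep polling.
			continue
		}
		if ok {
			return true, trigger()
		}
	}
	return false, nil // unreachable: the ticker channel is never closed
}

func main() {
	polls := 0
	ready := func() (bool, error) {
		polls++
		return polls >= 3, nil // pretend the cluster is ready on poll 3
	}
	trigger := func() error {
		fmt.Println("descheduler triggered after", polls, "polls")
		return nil
	}
	// The PR polls every 15s; a short interval keeps the demo fast.
	fired, err := runOnBootWatcher(true, 10*time.Millisecond, ready, trigger)
	fmt.Println(fired, err)
}
```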
zedkube/zedkube.go:
- deschedulerOnBootStarted bool guards against re-launching the watcher
when EdgeNodeInfo events arrive more than once during init.
- handleVmiDescheduleEventsOverride: converts KubernetesVmiDescheduleEvents
string to VmiDescheduleConfig{OnBoot} and re-publishes KubeConfig on
change; compares against the published bool so the first call always
reconciles the zero-value initial publish.
- Initial pubKubeConfig publish includes K3sVersion from defaults;
VmiDescheduleEvents is reconciled by the first handleGlobalConfigImpl
call before the watcher goroutine can launch.
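
The string-to-struct conversion and change detection might look like the sketch below. `VmiDescheduleConfig` and `VmiDescheduleEventBoot` are named in this PR, but the helper names are hypothetical and the comma-separated list format is an assumption: the description only says the key "contains" the value "boot".

```go
package main

import (
	"fmt"
	"strings"
)

// VmiDescheduleEventBoot is the event name given in the PR description.
const VmiDescheduleEventBoot = "boot"

// VmiDescheduleConfig mirrors the struct named in the PR description.
type VmiDescheduleConfig struct {
	OnBoot bool
}

// parseVmiDescheduleEvents converts the config string to a struct.
// Assumption: events are comma-separated and may carry whitespace.
func parseVmiDescheduleEvents(value string) VmiDescheduleConfig {
	for _, ev := range strings.Split(value, ",") {
		if strings.TrimSpace(ev) == VmiDescheduleEventBoot {
			return VmiDescheduleConfig{OnBoot: true}
		}
	}
	return VmiDescheduleConfig{}
}

// reconcile compares the desired config with the one already published
// and reports whether a re-publish is needed.
func reconcile(published VmiDescheduleConfig, value string) (VmiDescheduleConfig, bool) {
	want := parseVmiDescheduleEvents(value)
	return want, want != published
}

func main() {
	published := VmiDescheduleConfig{} // zero-value initial publish
	next, changed := reconcile(published, "boot")
	fmt.Println(next.OnBoot, changed)
	_, changed = reconcile(next, "boot")
	fmt.Println(changed)
}
```

Comparing against the published value rather than a cached copy of the last config string means an unchanged key never causes a redundant publish.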
hypervisor/kubevirt.go:
- CreateReplicaVMIConfig stamps DeschedulerEvictAnnotation on the VMIRS
pod template so new app VMs are evictable without a separate pass.
domainmgr/domainmgr.go:
- Calls EnsureVMsDeschedulerAnnotated at kubevirt-mode startup to
retroactively annotate VMIRSes and VMIs that pre-date this change.
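
Idempotent annotation stamping, as used for both new VMIRS templates and the retroactive pass, can be sketched on a plain annotations map. The helper `ensureEvictAnnotation` is hypothetical; the real code operates on KubeVirt objects through the Kubernetes API, which is omitted here.

```go
package main

import "fmt"

// DeschedulerEvictAnnotation is the annotation key from the PR description.
const DeschedulerEvictAnnotation = "descheduler.alpha.kubernetes.io/evict"

// ensureEvictAnnotation sets the evict annotation to "true" if it is not
// already set that way. It returns the (possibly newly allocated) map and
// whether anything changed, so callers can skip the API update when the
// object is already annotated.
func ensureEvictAnnotation(annotations map[string]string) (map[string]string, bool) {
	if annotations[DeschedulerEvictAnnotation] == "true" {
		return annotations, false // already stamped: no-op
	}
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[DeschedulerEvictAnnotation] = "true"
	return annotations, true
}

func main() {
	ann, changed := ensureEvictAnnotation(nil)
	fmt.Println(ann[DeschedulerEvictAnnotation], changed)
	_, changed = ensureEvictAnnotation(ann)
	fmt.Println(changed)
}
```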
types/global.go + types/zedkubetypes.go:
- KubernetesVmiDescheduleEvents config key ("kubernetes.vmi.deschedule.events",
default "") and VmiDescheduleEventBoot = "boot" constant.
- VmiDescheduleConfig{OnBoot bool} struct; KubeConfig.VmiDescheduleEvents field.
- Documented in docs/CONFIG-PROPERTIES.md.
docs/failover.md:
- Updates the "Descheduler trigger" section under "Failback handling" to
describe the Go-based deschedulerOnBootWatcher goroutine, the opt-in
config key, IsDeschedulerReady prerequisites, and TriggerDescheduler's
Create-first idempotent job management replacing the removed shell
function Update_RunDeschedulerOnBoot.
pkg/kube/:
- Removes descheduler-job.yaml and Update_RunDeschedulerOnBoot from
cluster-update.sh and its two call sites in cluster-init.sh.
- Removes descheduler-job.yaml COPY from Dockerfile.
- shellcheck source annotations and integer comparison quoting cleanup
in cluster-init.sh; indentation fix in descheduler-utils.sh.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andrew Durbin <andrewd@zededa.com>
Description
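Replaces the shell-based descheduler trigger with a Go implementation
that fires the Kubernetes descheduler Job once per boot when the new
"kubernetes.vmi.deschedule.events" config key contains "boot".
kubeapi/descheduler.go (new):
- IsDeschedulerReady: returns (false, nil) until the local node is
  Ready and schedulable, all Longhorn daemonsets are ready, and
  (when present) the kubevirt CR reports Available.
- TriggerDescheduler: Create-first idempotent job management: skips
  if an active Job already exists (handles multi-node boot race),
  otherwise deletes any stale completed/failed Job and recreates.
  Calls ensureDeschedulerSetup to Get-or-Create-or-Update the
  descheduler ServiceAccount, ClusterRole, ClusterRoleBinding, and
  policy ConfigMap before each run.
- EnsureVMsDeschedulerAnnotated: idempotently stamps
  descheduler.alpha.kubernetes.io/evict=true on every VMIRS template
  and live VMI in eve-kube-app namespace. No-op in base-k3s mode.
- Stub added to nokube.go for non-kube builds.
zedkube/descheduler.go (new):
- deschedulerOnBootWatcher goroutine: reads VmiDescheduleEvents.OnBoot
  at startup and exits immediately if disabled; otherwise polls
  IsDeschedulerReady every 15s then calls TriggerDescheduler once.
zedkube/zedkube.go:
- deschedulerOnBootStarted bool guards against re-launching the watcher
  when EdgeNodeInfo events arrive more than once during init.
- handleVmiDescheduleEventsOverride: converts KubernetesVmiDescheduleEvents
  string to VmiDescheduleConfig{OnBoot} and re-publishes KubeConfig on
  change; compares against the published bool so the first call always
  reconciles the zero-value initial publish.
- Initial pubKubeConfig publish includes K3sVersion from defaults;
  VmiDescheduleEvents is reconciled by the first handleGlobalConfigImpl
  call before the watcher goroutine can launch.
hypervisor/kubevirt.go:
- CreateReplicaVMIConfig stamps DeschedulerEvictAnnotation on the VMIRS
  pod template so new app VMs are evictable without a separate pass.
domainmgr/domainmgr.go:
- Calls EnsureVMsDeschedulerAnnotated at kubevirt-mode startup to
  retroactively annotate VMIRSes and VMIs that pre-date this change.
types/global.go + types/zedkubetypes.go:
- KubernetesVmiDescheduleEvents config key ("kubernetes.vmi.deschedule.events",
  default "") and VmiDescheduleEventBoot = "boot" constant.
- VmiDescheduleConfig{OnBoot bool} struct; KubeConfig.VmiDescheduleEvents field.
- Documented in docs/CONFIG-PROPERTIES.md.
docs/failover.md:
- Updates the "Descheduler trigger" section under "Failback handling" to
  describe the Go-based deschedulerOnBootWatcher goroutine, the opt-in
  config key, IsDeschedulerReady prerequisites, and TriggerDescheduler's
  Create-first idempotent job management replacing the removed shell
  function Update_RunDeschedulerOnBoot.
pkg/kube/:
- Removes descheduler-job.yaml and Update_RunDeschedulerOnBoot from
  cluster-update.sh and its two call sites in cluster-init.sh.
- Removes descheduler-job.yaml COPY from Dockerfile.
- shellcheck source annotations and integer comparison quoting cleanup
  in cluster-init.sh; indentation fix in descheduler-utils.sh.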
PR dependencies
None
How to test and validate this PR
Changelog notes
Configuration property to enable per-edge-node app failback triggers.
PR Backports