feat(zedkube): configurable VMI descheduler for failback #5885

Open
andrewd-zededa wants to merge 1 commit into lf-edge:master from andrewd-zededa:eve-k-deschedule-vmis

@andrewd-zededa andrewd-zededa commented Apr 30, 2026

Description

Replaces the shell-based descheduler trigger with a Go implementation
that fires the Kubernetes descheduler Job once per boot when the new
"kubernetes.vmi.deschedule.events" config key contains "boot".

kubeapi/descheduler.go (new):

  • IsDeschedulerReady: returns (false, nil) until the local node is
    Ready and schedulable, all Longhorn daemonsets are ready, and
    (when present) the kubevirt CR reports Available.
  • TriggerDescheduler: Create-first idempotent Job management. Skips
    if an active Job already exists (handles the multi-node boot race);
    otherwise deletes any stale completed/failed Job and recreates it.
    Calls ensureDeschedulerSetup to Get-or-Create-or-Update the
    descheduler ServiceAccount, ClusterRole, ClusterRoleBinding, and
    policy ConfigMap before each run.
  • EnsureVMsDeschedulerAnnotated: idempotently stamps
    descheduler.alpha.kubernetes.io/evict=true on every VMIRS template
    and live VMI in eve-kube-app namespace. No-op in base-k3s mode.
  • Stub added to nokube.go for non-kube builds.
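
The Create-first flow described for TriggerDescheduler can be sketched roughly as below. This is a minimal illustration, not the code in kubeapi/descheduler.go: jobStore, memStore, jobState, errAlreadyExists and triggerOnce are hypothetical stand-ins for the Kubernetes Jobs client and its AlreadyExists error, and the in-memory store deletes synchronously, which a real cluster does not.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists is a hypothetical stand-in for the Kubernetes
// "AlreadyExists" API error.
var errAlreadyExists = errors.New("job already exists")

// jobState is a simplified, hypothetical view of a Job's status.
type jobState int

const (
	jobActive   jobState = iota // still running
	jobFinished                 // completed or failed
)

// jobStore is a hypothetical stand-in for a Jobs client.
type jobStore interface {
	Create(name string) error
	State(name string) (jobState, error)
	Delete(name string) error
}

// triggerOnce attempts Create first and inspects the existing Job only
// when Create reports AlreadyExists. It returns true when this call
// created a Job and false when an active Job made it skip.
func triggerOnce(s jobStore, name string) (bool, error) {
	err := s.Create(name)
	if err == nil {
		return true, nil
	}
	if !errors.Is(err, errAlreadyExists) {
		return false, err
	}
	st, err := s.State(name)
	if err != nil {
		return false, err
	}
	if st == jobActive {
		// Another node already started the Job; nothing to do.
		return false, nil
	}
	// Stale completed/failed Job: delete it, then recreate.
	if err := s.Delete(name); err != nil {
		return false, err
	}
	if err := s.Create(name); err != nil {
		return false, err
	}
	return true, nil
}

// memStore is an in-memory jobStore used only for this demonstration.
type memStore struct{ jobs map[string]jobState }

func (m *memStore) Create(name string) error {
	if _, ok := m.jobs[name]; ok {
		return errAlreadyExists
	}
	m.jobs[name] = jobActive
	return nil
}

func (m *memStore) State(name string) (jobState, error) {
	st, ok := m.jobs[name]
	if !ok {
		return jobActive, errors.New("job not found")
	}
	return st, nil
}

func (m *memStore) Delete(name string) error {
	delete(m.jobs, name)
	return nil
}

func main() {
	s := &memStore{jobs: map[string]jobState{}}
	created, _ := triggerOnce(s, "descheduler")
	fmt.Println(created) // true: no Job existed
	created, _ = triggerOnce(s, "descheduler")
	fmt.Println(created) // false: active Job, skipped
	s.jobs["descheduler"] = jobFinished
	created, _ = triggerOnce(s, "descheduler")
	fmt.Println(created) // true: stale Job replaced
}
```

Trying Create before any Get means two nodes booting together cannot both pass an "is there a Job?" check; whichever Create lands second sees AlreadyExists and falls through to the status check.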

zedkube/descheduler.go (new):

  • deschedulerOnBootWatcher goroutine: reads VmiDescheduleEvents.OnBoot
    at startup and exits immediately if disabled; otherwise polls
    IsDeschedulerReady every 15s then calls TriggerDescheduler once.

zedkube/zedkube.go:

  • deschedulerOnBootStarted bool guards against re-launching the watcher
    when EdgeNodeInfo events arrive more than once during init.
  • handleVmiDescheduleEventsOverride: converts KubernetesVmiDescheduleEvents
    string to VmiDescheduleConfig{OnBoot} and re-publishes KubeConfig on
    change; compares against the published bool so the first call always
    reconciles the zero-value initial publish.
  • Initial pubKubeConfig publish includes K3sVersion from defaults;
    VmiDescheduleEvents is reconciled by the first handleGlobalConfigImpl
    call before the watcher goroutine can launch.
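
The string-to-struct conversion and the "re-publish only on change" comparison might look roughly like this. It is a sketch under assumptions: the lowercase names are hypothetical mirrors of the PR's types, and the comma-separated, case-sensitive format is a guess, since the description only says the key "contains" "boot".

```go
package main

import (
	"fmt"
	"strings"
)

// vmiDescheduleEventBoot mirrors the PR's VmiDescheduleEventBoot constant.
const vmiDescheduleEventBoot = "boot"

// vmiDescheduleConfig mirrors the PR's VmiDescheduleConfig{OnBoot bool}.
type vmiDescheduleConfig struct {
	OnBoot bool
}

// parseDescheduleEvents converts the raw config value into a typed config.
// Assumption: events are comma-separated and matched exactly.
func parseDescheduleEvents(raw string) vmiDescheduleConfig {
	var cfg vmiDescheduleConfig
	for _, ev := range strings.Split(raw, ",") {
		if strings.TrimSpace(ev) == vmiDescheduleEventBoot {
			cfg.OnBoot = true
		}
	}
	return cfg
}

// reconcile compares the parsed value against what was last published
// and reports whether a re-publish is needed.
func reconcile(published vmiDescheduleConfig, raw string) (vmiDescheduleConfig, bool) {
	next := parseDescheduleEvents(raw)
	return next, next != published
}

func main() {
	// The initial publish carries the zero value (OnBoot == false).
	published := vmiDescheduleConfig{}
	next, changed := reconcile(published, "boot")
	fmt.Println(next.OnBoot, changed) // true true
	_, changed = reconcile(next, "boot")
	fmt.Println(changed) // false: nothing to re-publish
}
```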

hypervisor/kubevirt.go:

  • CreateReplicaVMIConfig stamps DeschedulerEvictAnnotation on the VMIRS
    pod template so new app VMs are evictable without a separate pass.

domainmgr/domainmgr.go:

  • Calls EnsureVMsDeschedulerAnnotated at kubevirt-mode startup to
    retroactively annotate VMIRSes and VMIs that pre-date this change.
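
Idempotent stamping of the evict annotation reduces to a check-then-set on the object's annotation map. The sketch below works on a plain map rather than real VMIRS or VMI objects; ensureEvictAnnotation is a hypothetical helper, and the constant is assumed to carry the key quoted in the kubeapi section of this description.

```go
package main

import "fmt"

// deschedulerEvictAnnotation is assumed to match the key the PR stamps.
const deschedulerEvictAnnotation = "descheduler.alpha.kubernetes.io/evict"

// ensureEvictAnnotation sets the annotation to "true" if it is not
// already set that way. It returns the (possibly newly allocated) map
// and whether anything changed, so callers can skip the API update
// when the object is already annotated.
func ensureEvictAnnotation(ann map[string]string) (map[string]string, bool) {
	if ann[deschedulerEvictAnnotation] == "true" {
		return ann, false
	}
	if ann == nil {
		ann = make(map[string]string)
	}
	ann[deschedulerEvictAnnotation] = "true"
	return ann, true
}

func main() {
	ann, changed := ensureEvictAnnotation(nil)
	fmt.Println(changed, ann[deschedulerEvictAnnotation]) // true true
	_, changed = ensureEvictAnnotation(ann)
	fmt.Println(changed) // false: second pass is a no-op
}
```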

types/global.go + types/zedkubetypes.go:

  • KubernetesVmiDescheduleEvents config key ("kubernetes.vmi.deschedule.events",
    default "") and VmiDescheduleEventBoot = "boot" constant.
  • VmiDescheduleConfig{OnBoot bool} struct; KubeConfig.VmiDescheduleEvents field.
  • Documented in docs/CONFIG-PROPERTIES.md.

docs/failover.md:

  • Updates the "Descheduler trigger" section under "Failback handling" to
    describe the Go-based deschedulerOnBootWatcher goroutine, the opt-in
    config key, IsDeschedulerReady prerequisites, and TriggerDescheduler's
    Create-first idempotent job management replacing the removed shell
    function Update_RunDeschedulerOnBoot.

pkg/kube/:

  • Removes descheduler-job.yaml and Update_RunDeschedulerOnBoot from
    cluster-update.sh and its two call sites in cluster-init.sh.
  • Removes descheduler-job.yaml COPY from Dockerfile.
  • shellcheck source annotations and integer comparison quoting cleanup
    in cluster-init.sh; indentation fix in descheduler-utils.sh.

PR dependencies

None

How to test and validate this PR

  • deploy three HV=k eve nodes
  • configure EdgeNodeClusterConfig to create a three node cluster
  • deploy one app instance to the cluster without strict node affinity configured
  • set the config property 'kubernetes.vmi.deschedule.events:boot' to enable failback
  • initiate a power failure to the node hosting the app instance
  • wait for the app instance to come ready/running on another node
  • restore power to the node
  • wait for the node to boot; after all pods are ready, the descheduler should evict the app and allow it to reschedule onto the original node

Changelog notes

Configuration property to enable per-edge-node app failback triggers.

PR Backports

  • 16.0-stable: If requested
  • 14.5-stable: No, as the feature is not available there.
  • 13.4-stable: No, as the feature is not available there.

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

And last but not least:

  • I've checked the boxes above, or I've provided a good reason why I didn't
    check them.

Please, check the boxes above after submitting the PR in interactive mode.

@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch 2 times, most recently from efc21bf to 40ee3b1 Compare April 30, 2026 18:37
@andrewd-zededa andrewd-zededa changed the title feat(zedkube): on-boot VMI descheduler with typed deschedule-event co… feat(zedkube): lay foundation for event-driven VMI descheduler failback Apr 30, 2026
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch 2 times, most recently from 1339d30 to d5b2810 Compare April 30, 2026 21:53

codecov Bot commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 20.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.11%. Comparing base (2281599) to head (a275303).
⚠️ Report is 641 commits behind head on master.

Files with missing lines                 Patch %   Lines
pkg/pillar/cmd/domainmgr/domainmgr.go    0.00%     2 Missing ⚠️
pkg/pillar/kubeapi/nokube.go             0.00%     2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5885      +/-   ##
==========================================
- Coverage   19.52%   17.11%   -2.42%     
==========================================
  Files          19      474     +455     
  Lines        3021    85697   +82676     
==========================================
+ Hits          590    14663   +14073     
- Misses       2310    69516   +67206     
- Partials      121     1518    +1397     

Comment thread docs/CONFIG-PROPERTIES.md
Comment thread pkg/pillar/hypervisor/kubevirt.go Outdated
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch from d5b2810 to 0be7e40 Compare May 1, 2026 22:59
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch 2 times, most recently from 0f3fe0b to e154710 Compare May 4, 2026 20:29
@andrewd-zededa

rebased on latest master, testing locally for now

@andrewd-zededa andrewd-zededa changed the title feat(zedkube): lay foundation for event-driven VMI descheduler failback feat(zedkube): configurable event-driven VMI descheduler failback May 4, 2026
@andrewd-zededa andrewd-zededa changed the title feat(zedkube): configurable event-driven VMI descheduler failback feat(zedkube): configurable VMI descheduler failback May 4, 2026
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch from e154710 to c537c7b Compare May 4, 2026 23:11
@andrewd-zededa andrewd-zededa changed the title feat(zedkube): configurable VMI descheduler failback feat(zedkube): configurable VMI descheduler for failback May 4, 2026
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch from c537c7b to a275303 Compare May 4, 2026 23:36
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andrew Durbin <andrewd@zededa.com>
@andrewd-zededa andrewd-zededa force-pushed the eve-k-deschedule-vmis branch from a275303 to 196c662 Compare May 5, 2026 18:02
@andrewd-zededa andrewd-zededa marked this pull request as ready for review May 5, 2026 18:04
3 participants