bug: bootstrap-only taint enforcement silently skipped for nodes that join during the initial rule cache warm-up window #253

@Karman580

Summary

RuleReadinessController maintains an in-memory ruleCache that is populated
lazily as RuleReconciler processes each NodeReadinessRule. NodeReconciler
derives all applicable rules exclusively from this cache via
getApplicableRulesForNode. Because both controllers start draining their work
queues concurrently after informer sync, there is a race window on startup and
after leader re-election where node events are processed against an empty or
partially-warm cache.

For continuous enforcement this is operationally benign: the controller
re-evaluates the node on the next condition, taint, or label change and
converges. For bootstrap-only rules it is a permanent correctness gap: if a
node is evaluated before its applicable rule reaches the cache, the taint is
never applied, the bootstrap completion annotation is never written, and the
intended admission gate is silently bypassed with no error surfaced.

Current Behavior

Startup sequence:

  1. mgr.Start() starts informers and waits for cache synchronization; all
    NodeReadinessRule objects are now readable from the local informer cache.
  2. RuleReconciler and NodeReconciler begin draining their respective work
    queues concurrently.
  3. RuleReconciler reconciles existing rules one by one
    (MaxConcurrentReconciles: 1), calling updateRuleCache for each.
  4. NodeReconciler immediately processes queued Node events (including events
    for nodes that joined during or just before startup) against whatever rules
    happen to be in the cache at that moment.

With a non-trivial rule count the ruleCache reaches full coverage only after
all rules have been individually reconciled. Any node event processed before a
given rule is cached is invisible to that rule.
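
The ordering problem can be illustrated with a small model. This is only a sketch under stated assumptions: Rule, ruleCache, applicableRules, and visibleAfter are hypothetical stand-ins rather than the controller's real types, and the rule names are made up. It shows which rules a node event would see depending on how many rules had been reconciled when the event was processed.

```go
package main

import (
	"fmt"
	"sort"
)

// Rule is a hypothetical stand-in for a NodeReadinessRule; only the name matters here.
type Rule struct{ Name string }

// ruleCache models the in-memory cache that is filled one rule per reconcile.
type ruleCache map[string]Rule

// applicableRules models getApplicableRulesForNode: it can only return
// rules that are already present in the cache.
func applicableRules(c ruleCache) []string {
	names := make([]string, 0, len(c))
	for name := range c {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// visibleAfter returns the rules a node event would see if it were processed
// after the rule reconciler had handled only the first n rules.
func visibleAfter(all []Rule, n int) []string {
	cache := ruleCache{}
	for _, r := range all[:n] {
		cache[r.Name] = r // models updateRuleCache
	}
	return applicableRules(cache)
}

func main() {
	all := []Rule{{"cni-ready"}, {"security-agent"}, {"network-readiness"}}
	for n := 0; n <= len(all); n++ {
		fmt.Printf("rules reconciled=%d visible=%v\n", n, visibleAfter(all, n))
	}
}
```

In this model a node event handled at n=0 sees no rules at all, which corresponds to the cold-cache case described above.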

Why Bootstrap-Only Mode Is Specifically Vulnerable

The bootstrap completion path in evaluateRuleForNode is:

conditions not satisfied → taint added
conditions satisfied → taint removed → markBootstrapCompleted → annotation written
subsequent reconciles → isBootstrapCompleted returns true → skip

markBootstrapCompleted is reached only inside the
shouldRemoveTaint && currentlyHasTaint branch. If a node is first evaluated
while its applicable rule is absent from the cache:

  • getApplicableRulesForNode returns an empty slice, so no taint is applied.
  • The node's conditions are satisfied (CNI ready, security agent healthy, etc.).
  • When the rule eventually enters the cache and the node is re-evaluated,
    evaluateRuleForNode sees shouldRemoveTaint=true, currentlyHasTaint=false
    and falls through to the default branch: no action, no call to
    markBootstrapCompleted.
  • The bootstrap annotation is never written. The rule re-evaluates the node on
    every subsequent event indefinitely, but the intended gate was never enforced.
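
The fall-through described in the bullets above can be modeled in a few lines. This is a simplified sketch, not the real evaluateRuleForNode: nodeState and evaluate are hypothetical names, and the exact branch order of the real function is an assumption. It only aims to show that a node which is already ready and was never tainted never reaches the completion step.

```go
package main

import "fmt"

// nodeState is a hypothetical, simplified view of one node under one
// bootstrap-only rule.
type nodeState struct {
	conditionsSatisfied bool
	hasTaint            bool
	bootstrapCompleted  bool // models the bootstrap completion annotation
}

// evaluate models the three outcomes of the bootstrap path and returns the
// action it took.
func evaluate(n *nodeState) string {
	if n.bootstrapCompleted {
		return "skip" // models isBootstrapCompleted returning true
	}
	shouldRemoveTaint := n.conditionsSatisfied
	switch {
	case !shouldRemoveTaint && !n.hasTaint:
		n.hasTaint = true
		return "add taint"
	case shouldRemoveTaint && n.hasTaint:
		n.hasTaint = false
		n.bootstrapCompleted = true // models markBootstrapCompleted
		return "remove taint, mark completed"
	default:
		return "no action"
	}
}

func main() {
	// Normal path: the rule is visible before the node becomes ready.
	n := &nodeState{}
	fmt.Println(evaluate(n))
	n.conditionsSatisfied = true
	fmt.Println(evaluate(n))
	fmt.Println(evaluate(n))

	// Race path: the rule first becomes visible after the node is ready,
	// so the node was never tainted.
	late := &nodeState{conditionsSatisfied: true}
	fmt.Println(evaluate(late), late.bootstrapCompleted)
}
```

On the race path every call lands in the default branch, so the completion flag stays false no matter how many times the node is re-evaluated.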

Steps to Reproduce / Concrete Scenario

  1. Cluster undergoes a rolling restart of the controller (upgrade, crash, or
    leader re-election).
  2. Within the startup window a replacement node joins. Its CNI plugin reports
    NetworkReady=True within ~8 seconds (typical for a pre-warmed agent).
  3. NodeReconciler picks up the node CREATE event; the network-readiness
    bootstrap-only rule has not yet been reconciled by RuleReconciler (queue
    still has earlier items ahead of it) and is absent from ruleCache.
  4. getApplicableRulesForNode returns empty, so no taint is applied.
  5. The rule is cached ~30 seconds later. Re-evaluation: NetworkReady=True,
    no taint present → default branch → no action, no annotation.
  6. The node accepts workload scheduling without the readiness gate ever having
    been enforced.

This reproduces on any controller restart in a cluster actively scaling out and
on every leader re-election when the replacement leader starts with a cold cache.

Expected Behavior

Nodes should never be evaluated by NodeReconciler against an empty or
incomplete cache when NodeReadinessRules already exist in the cluster.
Bootstrap-only rules must apply their taint reliably regardless of the ordering
in which the two controllers drain their queues on startup.

Root Cause

The ruleCache is populated entirely as a side effect of
RuleReconciler.Reconcile() via updateRuleCache. There is no mechanism to
seed it from the already-synced informer cache before NodeReconciler begins
processing events. mgr.Start() guarantees informers are synced before
controllers run, so all NodeReadinessRules are available locally the moment
both controllers start; the gap is that ruleCache is not initialized from
this data.

Possible Implementation Direction

A WarmCache method on RuleReadinessController that lists existing rules from
the already-synced informer cache (no live API call) and calls updateRuleCache
for each would close the gap:

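// WarmCache seeds ruleCache from the informer-backed client so that node
// events are not evaluated against a cold cache.
func (r *RuleReadinessController) WarmCache(ctx context.Context) error {
    ruleList := &readinessv1alpha1.NodeReadinessRuleList{}
    // List is served from the already-synced informer cache, not the live API.
    if err := r.List(ctx, ruleList); err != nil {
        return fmt.Errorf("failed to warm rule cache: %w", err)
    }
    for i := range ruleList.Items {
        // Index into the slice so each call gets a pointer to a distinct element.
        r.updateRuleCache(ctx, &ruleList.Items[i])
    }
    return nil
}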

This can be invoked via a manager.Runnable registered with mgr.Add() that
executes after cache sync and completes before the manager signals readiness.
Because the list is served from the local informer cache, cost is bounded by
rule count rather than node count and adds negligible startup latency. The
warm-up should be re-run on each leadership term so that leader re-election does
not reintroduce the window.
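
One way to make the "warm before nodes are processed" ordering explicit, rather than relying on the order in which runnables happen to start, is a gate that the node path waits on. The following is a stdlib-only model of that idea, not controller-runtime code: controller, newController, warmCache, and reconcileNode are hypothetical names, and the rule list stands in for the informer-backed List call.

```go
package main

import (
	"fmt"
	"sync"
)

// controller is a hypothetical stdlib-only model; it is not the
// controller-runtime API.
type controller struct {
	mu        sync.RWMutex
	ruleCache map[string]struct{}
	warmed    chan struct{} // closed once the warm-up has finished
}

func newController() *controller {
	return &controller{
		ruleCache: map[string]struct{}{},
		warmed:    make(chan struct{}),
	}
}

// warmCache seeds the cache from a list of rule names (standing in for the
// informer-backed List) and then opens the gate. It must be called once per
// controller value; closing the channel twice would panic.
func (c *controller) warmCache(rules []string) {
	c.mu.Lock()
	for _, r := range rules {
		c.ruleCache[r] = struct{}{}
	}
	c.mu.Unlock()
	close(c.warmed)
}

// reconcileNode blocks until the warm-up is done, then reports how many
// rules it can see.
func (c *controller) reconcileNode() int {
	<-c.warmed
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.ruleCache)
}

func main() {
	c := newController()
	results := make(chan int)
	go func() { results <- c.reconcileNode() }() // node event arrives first
	c.warmCache([]string{"cni-ready", "network-readiness"})
	fmt.Println("rules visible to node reconcile:", <-results)
}
```

In this model a fresh controller value (and therefore a fresh gate) would be created for each leadership term. Context cancellation while waiting on the gate is left out to keep the sketch short; a real implementation would likely need it.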

An alternative is a lazy fallback inside NodeReconciler.Reconcile: on the
first pass detect that len(ruleCache) == 0 while rules exist in the informer
cache and perform an inline warm-up before evaluating the node.

Acceptance Criteria

  • A test (integration or e2e) covers nodes joining during the rule cache
    warm-up window and verifies that bootstrap-only taints are correctly applied.
  • No node event is processed against an empty cache when NodeReadinessRules
    exist in the cluster.
  • Cache warm-up cost is bounded by rule count, not node count (served from the
    local informer cache, not the live API).
  • Leader re-election does not reintroduce the race window: the replacement
    leader warms its cache before declaring ready.
  • No regression in continuous enforcement mode behavior.

Additional Context

The ruleCache also drives the DeletionTimestamp guard in
processNodeAgainstAllRules that prevents taint operations on rules being
deleted. A fully-warm cache on startup is equally important for that invariant.
The recent hardening of bootstrap annotation cleanup on rule deletion (ea74209)
addresses the deletion side of the bootstrap lifecycle; this issue addresses the
complementary startup-time gap in the same semantic guarantee.

/kind bug
