Node label changes can leave stale rule-managed taints after nodeSelector no longer matches #235

@sahitya-chandra

What happened

When a NodeReadinessRule has already managed a node, and that node's labels later change so that it no longer matches the rule's nodeSelector, the node reconciler does not remove the taint previously managed by that rule.

The node update predicate does enqueue reconciliation on label changes, but the reconciliation path only loads rules that currently match the node. After the label change, the rule is no longer considered, so the stale taint is never cleaned up.
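
To illustrate the gap, here is a minimal, self-contained model of a lookup that only returns rules matching the node's current labels. It is a sketch, not the controller's code: the `Rule` type and the `matches` and `applicableRules` helpers are hypothetical stand-ins, and only matchLabels-style equality is modeled.

```go
package main

import "fmt"

// Rule is a simplified, hypothetical stand-in for a NodeReadinessRule.
type Rule struct {
	Name        string
	MatchLabels map[string]string
}

// matches reports whether every selector entry is present on the node
// with the same value (matchLabels semantics only).
func matches(selector, nodeLabels map[string]string) bool {
	for k, v := range selector {
		if got, ok := nodeLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// applicableRules models a lookup that considers only rules matching the
// node's current labels. Rules that matched in the past are invisible here.
func applicableRules(rules []Rule, nodeLabels map[string]string) []Rule {
	var out []Rule
	for _, r := range rules {
		if matches(r.MatchLabels, nodeLabels) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rules := []Rule{{Name: "test-rule", MatchLabels: map[string]string{"env": "test"}}}

	before := applicableRules(rules, map[string]string{"env": "test"})
	after := applicableRules(rules, map[string]string{"env": "other"})

	fmt.Println("ruleCount before label change:", len(before))
	fmt.Println("ruleCount after label change:", len(after))
}
```

In this model, once the label changes, the rule drops out of the result entirely, so any per-rule cleanup that iterates over the returned rules has nothing to iterate over.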

Why this matters

A node can remain incorrectly tainted after it leaves the rule's target set. For NoSchedule taints, this can prevent workloads from scheduling on the node. For NoExecute taints, this can be more disruptive, because pods without a matching toleration may be evicted or kept off the node even though the rule no longer applies to it.

Reproduction

  1. Create a node with label env=test
  2. Create a continuous NodeReadinessRule with nodeSelector.matchLabels.env=test and a taint such as readiness.k8s.io/test-taint:NoSchedule
  3. Reconcile the node while the rule applies, so the rule records a NodeEvaluation for that node and manages/adopts the taint
  4. Change the node label to env=other
  5. Reconcile the node again

Actual behavior

The taint remains on the node.

The relevant log line from a focused regression test is:

Processing node against rules {"node": "node-controller-test-node", "ruleCount": 0}

Because the rule no longer matches the node, getApplicableRulesForNode returns no rules, and the cleanup path never runs.

Expected behavior

If a rule previously managed a node, and the node stops matching that rule due to a label change, the controller should remove the rule-managed taint and clean up the node-specific rule status for that node.

The cleanup should be scoped to nodes that have evidence of prior rule management, such as an existing NodeEvaluation or appliedNodes status entry, so the controller does not remove unrelated pre-existing taints from nodes that never matched the rule.
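
The scoping described above could look roughly like the following sketch. All types and the `cleanupStale` helper are hypothetical; in particular, `ManagedNodes` is an assumed stand-in for whatever the rule status records (NodeEvaluation or appliedNodes), not the controller's actual data model.

```go
package main

import "fmt"

// Taint, Rule, and Node are simplified, hypothetical types for illustration.
type Taint struct {
	Key    string
	Effect string
}

type Rule struct {
	Name        string
	MatchLabels map[string]string
	Taint       Taint
	// ManagedNodes is an assumed stand-in for NodeEvaluation/appliedNodes status.
	ManagedNodes map[string]bool
}

type Node struct {
	Name   string
	Labels map[string]string
	Taints []Taint
}

// matches implements matchLabels-style equality only.
func matches(selector, nodeLabels map[string]string) bool {
	for k, v := range selector {
		if got, ok := nodeLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// cleanupStale removes the rule's taint from the node only when the rule no
// longer matches AND there is evidence the rule previously managed the node.
// It reports whether any cleanup was performed.
func cleanupStale(rule *Rule, node *Node) bool {
	if matches(rule.MatchLabels, node.Labels) || !rule.ManagedNodes[node.Name] {
		return false
	}
	kept := node.Taints[:0]
	for _, t := range node.Taints {
		if t != rule.Taint {
			kept = append(kept, t)
		}
	}
	node.Taints = kept
	delete(rule.ManagedNodes, node.Name) // drop the node-specific status entry
	return true
}

func main() {
	taint := Taint{Key: "readiness.k8s.io/test-taint", Effect: "NoSchedule"}
	rule := &Rule{
		Name:         "test-rule",
		MatchLabels:  map[string]string{"env": "test"},
		Taint:        taint,
		ManagedNodes: map[string]bool{"managed-node": true},
	}
	// Previously managed node whose label changed to env=other.
	managed := &Node{Name: "managed-node", Labels: map[string]string{"env": "other"}, Taints: []Taint{taint}}
	// Node that never matched but happens to carry the same taint.
	unrelated := &Node{Name: "unrelated-node", Labels: map[string]string{"env": "other"}, Taints: []Taint{taint}}

	fmt.Println(cleanupStale(rule, managed), len(managed.Taints))
	fmt.Println(cleanupStale(rule, unrelated), len(unrelated.Taints))
}
```

The prior-management check is what keeps the second node's pre-existing taint intact in this sketch. A real fix would also need to decide where such a check runs in the node reconcile path, which this example does not address.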

Notes

Rule deletion and rule nodeSelector changes already have cleanup paths. The missing case is a node label change that makes an already-managed node stop matching the selector.
