What happened
When a NodeReadinessRule has already managed a node, and that node later changes labels so it no longer matches the rule's nodeSelector, the node reconciler does not remove the taint previously managed by that rule.
The node update predicate does enqueue a reconciliation on label changes, but the reconciliation path only loads rules that currently match the node. After the label change, the rule is no longer considered, so the stale taint is never cleaned up.
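To illustrate the filtering step, here is a minimal Go sketch under simplified assumptions. `Rule`, `matches` and `applicableRules` are hypothetical stand-ins (only `matchLabels` is modeled), not the controller's real types or its actual rule lookup. It shows that once the label changes, the rule simply drops out of the set the reconciler iterates over.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified stand-in for NodeReadinessRule:
// only the matchLabels part of the nodeSelector is modeled.
type Rule struct {
	Name        string
	MatchLabels map[string]string
}

// matches reports whether every matchLabels entry is present on the node
// with the same value.
func matches(r Rule, nodeLabels map[string]string) bool {
	for k, v := range r.MatchLabels {
		if got, ok := nodeLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// applicableRules models the lookup described above: only rules whose
// selector matches the node's *current* labels are returned.
func applicableRules(rules []Rule, nodeLabels map[string]string) []Rule {
	var out []Rule
	for _, r := range rules {
		if matches(r, nodeLabels) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rules := []Rule{{Name: "test-rule", MatchLabels: map[string]string{"env": "test"}}}

	before := applicableRules(rules, map[string]string{"env": "test"})
	after := applicableRules(rules, map[string]string{"env": "other"})

	// After the relabel the rule is filtered out, so nothing downstream
	// ever revisits the taint it previously managed.
	fmt.Println("ruleCount before:", len(before)) // ruleCount before: 1
	fmt.Println("ruleCount after:", len(after))   // ruleCount after: 0
}
```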
Why this matters
A node can remain incorrectly tainted after it leaves the rule's target set. For NoSchedule taints, this can prevent new workloads from being scheduled on the node. NoExecute taints can be more disruptive: pods without a matching toleration may be evicted from the node or kept off it, even though the rule no longer applies to that node.
Reproduction
- Create a node with the label env=test.
- Create a continuous NodeReadinessRule with nodeSelector.matchLabels.env=test and a taint such as readiness.k8s.io/test-taint:NoSchedule.
- Reconcile the node while the rule applies, so that the rule records a NodeEvaluation for that node and manages (or adopts) the taint.
- Change the node label to env=other.
- Reconcile the node again.
Actual behavior
The taint remains on the node.
The relevant log line from a focused regression test is:
Processing node against rules {"node": "node-controller-test-node", "ruleCount": 0}
Because the rule no longer matches the node, getApplicableRulesForNode returns no rules, and the cleanup path never runs.
Expected behavior
If a rule previously managed a node, and the node stops matching that rule because of a label change, the controller should remove the rule-managed taint and clean up the node-specific rule status for that node.
The cleanup should be scoped to nodes that show evidence of prior rule management, such as an existing NodeEvaluation or appliedNodes status entry, so that the controller does not remove unrelated pre-existing taints from nodes that never matched the rule.
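One possible shape for the scoped cleanup is sketched below in Go. It is a minimal model under stated assumptions, not the actual fix: `Rule.Evaluated` is a hypothetical stand-in for the NodeEvaluation / appliedNodes status, readiness-condition evaluation is elided (a matching rule is assumed to keep its taint), and `reconcileNode` is a hypothetical name. The key change is that the reconciler considers every rule, and treats a non-matching rule as cleanup work only when that rule's status shows it previously managed this node.

```go
package main

import "fmt"

// Hypothetical, simplified types; the real NodeReadinessRule status is richer.
type Taint struct {
	Key    string
	Effect string
}

type Rule struct {
	Name        string
	MatchLabels map[string]string
	Taint       Taint
	Evaluated   map[string]bool // stand-in for NodeEvaluation / appliedNodes entries
}

type Node struct {
	Name   string
	Labels map[string]string
	Taints []Taint
}

func matches(r *Rule, labels map[string]string) bool {
	for k, v := range r.MatchLabels {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func hasTaint(n *Node, t Taint) bool {
	for _, x := range n.Taints {
		if x == t {
			return true
		}
	}
	return false
}

func removeTaint(n *Node, t Taint) {
	kept := n.Taints[:0]
	for _, x := range n.Taints {
		if x != t {
			kept = append(kept, x)
		}
	}
	n.Taints = kept
}

// reconcileNode looks at every rule, not only the ones matching right now.
func reconcileNode(n *Node, rules []*Rule) {
	for _, r := range rules {
		switch {
		case matches(r, n.Labels):
			// Still targeted. Readiness evaluation is elided: assume the
			// conditions are unmet, so the taint stays and management is recorded.
			if !hasTaint(n, r.Taint) {
				n.Taints = append(n.Taints, r.Taint)
			}
			r.Evaluated[n.Name] = true
		case r.Evaluated[n.Name]:
			// Previously managed but no longer targeted: remove the
			// rule-managed taint and the node-specific status entry.
			removeTaint(n, r.Taint)
			delete(r.Evaluated, n.Name)
		default:
			// Never managed by this rule: leave the node untouched.
		}
	}
}

func main() {
	rule := &Rule{
		Name:        "test-rule",
		MatchLabels: map[string]string{"env": "test"},
		Taint:       Taint{Key: "readiness.k8s.io/test-taint", Effect: "NoSchedule"},
		Evaluated:   map[string]bool{},
	}
	managed := &Node{Name: "node-a", Labels: map[string]string{"env": "test"}}
	reconcileNode(managed, []*Rule{rule}) // taint applied, evaluation recorded
	managed.Labels["env"] = "other"
	reconcileNode(managed, []*Rule{rule})                     // stale taint removed
	fmt.Println(len(managed.Taints), rule.Evaluated["node-a"]) // 0 false

	other := &Node{Name: "node-b", Labels: map[string]string{"env": "prod"}, Taints: []Taint{rule.Taint}}
	reconcileNode(other, []*Rule{rule})
	fmt.Println(len(other.Taints)) // 1
}
```

The default branch is what keeps the cleanup scoped: node-b carries a taint with the same key, but because the rule never recorded an evaluation for it, the taint is left in place.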
Notes
Rule deletion and rule nodeSelector changes already have cleanup paths. The missing case is a node label change that makes an already-managed node stop matching the selector.