What happened
processNodeAgainstAllRules in internal/controller/node_controller.go always runs a status patch after evaluateRuleForNode, even when the evaluation returned an error. The patch builds the new NodeEvaluation entry by scanning the in-memory rule's Status.NodeEvaluations for one with the current node's name:
https://github.com/kubernetes-sigs/node-readiness-controller/blob/main/internal/controller/node_controller.go#L159-L206
currEval := readinessv1alpha1.NodeEvaluation{}
for _, eval := range rule.Status.NodeEvaluations {
if eval.NodeName == node.Name {
currEval = eval
break
}
}
found := false
for i := range latestRule.Status.NodeEvaluations {
if latestRule.Status.NodeEvaluations[i].NodeName == node.Name {
latestRule.Status.NodeEvaluations[i] = currEval
found = true
break
}
}
if !found {
latestRule.Status.NodeEvaluations = append(
latestRule.Status.NodeEvaluations,
currEval,
)
}
If evaluateRuleForNode exits early on an error (taint patch failure such as a non-conflict API error or exhausted conflict retries), it never reaches updateNodeEvaluationStatus, so the in-memory rule.Status.NodeEvaluations has no entry for this node when the cache copy didn't already contain one. currEval then stays the zero value NodeEvaluation{NodeName: \"\"}, and the loop above either:
- appends that empty entry to
latestRule.Status.NodeEvaluations, or
- overwrites a previously valid entry with it.
NodeEvaluation.NodeName is annotated with +kubebuilder:validation:MinLength=1 and a hostname pattern in api/v1alpha1/nodereadinessrule_types.go, so the API server rejects the entire Status().Patch with HTTP 422. RetryOnConflict doesn't retry on validation errors, so the patch fails permanently for this iteration.
The same patch carries the FailedNodes update built up on lines 192-203 - i.e. the very record an operator needs to debug the taint failure. So the failure path leaves both the NodeEvaluation and the FailedNodes entry unwritten, and the only signal is a log line on the controller pod.
What you expected to happen
When evaluateRuleForNode fails:
- the NodeEvaluation status for the node should not be overwritten with an empty entry, and
- the FailedNodes update should still be persisted so operators can see the failure via
kubectl get nodereadinessrule -o yaml.
How to reproduce
- Create a
NodeReadinessRule that targets a node and apply the rule so the controller starts evaluating.
- Cause
addTaintBySpec / removeTaintBySpec to return a non-conflict error persistently for one node (e.g. RBAC denied for nodes patch, or simulate via webhook). The taint operation will exhaust retries and evaluateRuleForNode will return an error.
- Observe controller logs:
- "Failed to evaluate rule for node" is logged.
- Then "Failed to update rule status after node evaluation" is logged with a 422 from the API server about
nodeName not matching the required pattern / minimum length.
kubectl get nodereadinessrule <name> -o yaml shows no FailedNodes entry for the node, even though the controller hit a failure.
Suggested fix
When currEval.NodeName == \"\" (i.e. the in-memory rule had no evaluation for this node), skip the NodeEvaluation upsert and only update FailedNodes. Keeping the existing valid NodeEvaluation untouched is the correct behavior on a failed evaluation, and it lets the FailedNodes update through.
/kind bug
What happened
processNodeAgainstAllRulesininternal/controller/node_controller.goalways runs a status patch afterevaluateRuleForNode, even when the evaluation returned an error. The patch builds the new NodeEvaluation entry by scanning the in-memory rule'sStatus.NodeEvaluationsfor one with the current node's name:https://github.com/kubernetes-sigs/node-readiness-controller/blob/main/internal/controller/node_controller.go#L159-L206
If
evaluateRuleForNodeexits early on an error (taint patch failure such as a non-conflict API error or exhausted conflict retries), it never reachesupdateNodeEvaluationStatus, so the in-memoryrule.Status.NodeEvaluationshas no entry for this node when the cache copy didn't already contain one.currEvalthen stays the zero valueNodeEvaluation{NodeName: \"\"}, and the loop above either:latestRule.Status.NodeEvaluations, orNodeEvaluation.NodeNameis annotated with+kubebuilder:validation:MinLength=1and a hostname pattern inapi/v1alpha1/nodereadinessrule_types.go, so the API server rejects the entireStatus().Patchwith HTTP 422.RetryOnConflictdoesn't retry on validation errors, so the patch fails permanently for this iteration.The same patch carries the
FailedNodesupdate built up on lines 192-203 - i.e. the very record an operator needs to debug the taint failure. So the failure path leaves both the NodeEvaluation and the FailedNodes entry unwritten, and the only signal is a log line on the controller pod.What you expected to happen
When
evaluateRuleForNodefails:kubectl get nodereadinessrule -o yaml.How to reproduce
NodeReadinessRulethat targets a node and apply the rule so the controller starts evaluating.addTaintBySpec/removeTaintBySpecto return a non-conflict error persistently for one node (e.g. RBAC denied fornodespatch, or simulate via webhook). The taint operation will exhaust retries andevaluateRuleForNodewill return an error.nodeNamenot matching the required pattern / minimum length.kubectl get nodereadinessrule <name> -o yamlshows no FailedNodes entry for the node, even though the controller hit a failure.Suggested fix
When
currEval.NodeName == \"\"(i.e. the in-memory rule had no evaluation for this node), skip the NodeEvaluation upsert and only updateFailedNodes. Keeping the existing valid NodeEvaluation untouched is the correct behavior on a failed evaluation, and it lets the FailedNodes update through./kind bug