Skip to content

bug: processAllNodesForRule swallows node evaluation errors, preventing reconcile retries #234

@dorodb-web22

Description

@dorodb-web22

Describe the bug

RuleReadinessController.processAllNodesForRule catches errors from evaluateRuleForNode but always returns nil, even when evaluations fail:

for _, node := range nodeList.Items {
    if r.ruleAppliesTo(ctx, rule, &node) {
        if err := r.evaluateRuleForNode(ctx, rule, &node); err != nil {
            log.Error(err, "Failed to evaluate node for rule", ...)
            r.recordNodeFailure(rule, node.Name, "EvaluationError", err.Error())
            metrics.Failures.WithLabelValues(rule.Name, "EvaluationError").Inc()
            // error is dropped here
        }
    }
}
return nil // always nil

Because processAllNodesForRule always returns nil, the calling RuleReconciler.Reconcile at line 152 never sees the failure, so controller-runtime treats every reconciliation as successful and does not trigger a requeue/backoff.

This is the same class of bug fixed in #222 for the node reconciler path. This function is the rule reconciler path equivalent.

Expected behavior

Transient failures during node evaluation (e.g. API conflicts, patch errors) should cause processAllNodesForRule to return an aggregated error so that controller-runtime can requeue with backoff.

Suggested fix

Accumulate errors and return errors.Join(errs...), same pattern as processNodeAgainstAllRules after #222. Existing behavior of continuing across all nodes is preserved.

/kind bug

Metadata

Metadata

Assignees

Labels

kind/bugCategorizes issue or PR as related to a bug.

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions