Describe the bug
In internal/controller/node_controller.go, when evaluateRuleForNode
fails inside processNodeAgainstAllRules, recordNodeFailure is called,
but the metrics.Failures counter is NOT incremented.
Line 155 (node_controller.go):
r.recordNodeFailure(rule, node.Name, "EvaluationError", err.Error())
// metrics.Failures.WithLabelValues(rule.Name, "EvaluationError").Inc() -- MISSING
Compare this with processAllNodesForRule in nodereadinessrule_controller.go
(lines 275-276), which correctly does both:
r.recordNodeFailure(rule, node.Name, "EvaluationError", err.Error())
metrics.Failures.WithLabelValues(rule.Name, "EvaluationError").Inc()
Impact
Failures triggered via the NodeReconciler path (node condition changes)
are silently missing from the node_readiness_failures_total Prometheus
metric, so operators monitoring cluster health see only the failures
raised by the rule-controller path and get an undercount.
Expected behavior
metrics.Failures.WithLabelValues(rule.Name, "EvaluationError").Inc()
should be called alongside recordNodeFailure in node_controller.go.
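A minimal sketch of the proposed change, assuming the error branch in processNodeAgainstAllRules has roughly the shape quoted above. Rule, Node, the counterVec stand-in, and the stubbed evaluateRuleForNode (which always fails) are hypothetical simplifications; the package-level Failures variable stands in for metrics.Failures because a single-file example cannot import the project's metrics package.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical simplified stand-ins for the real rule and node types.
type Rule struct{ Name string }
type Node struct{ Name string }

// counterVec is a stand-in for a Prometheus CounterVec keyed by (rule, reason).
type counterVec struct{ counts map[[2]string]float64 }

type counter struct {
	vec *counterVec
	key [2]string
}

func (v *counterVec) WithLabelValues(rule, reason string) counter {
	return counter{vec: v, key: [2]string{rule, reason}}
}

func (c counter) Inc() { c.vec.counts[c.key]++ }

// Failures plays the role of metrics.Failures.
var Failures = &counterVec{counts: map[[2]string]float64{}}

type NodeReconciler struct{ failuresRecorded int }

func (r *NodeReconciler) recordNodeFailure(rule *Rule, nodeName, reason, message string) {
	r.failuresRecorded++
}

// evaluateRuleForNode is stubbed to always fail so the error branch runs.
func (r *NodeReconciler) evaluateRuleForNode(rule *Rule, node *Node) error {
	return errors.New("evaluation failed")
}

// processNodeAgainstAllRules with the fix: the counter is incremented next to
// recordNodeFailure, matching processAllNodesForRule.
func (r *NodeReconciler) processNodeAgainstAllRules(node *Node, rules []*Rule) {
	for _, rule := range rules {
		if err := r.evaluateRuleForNode(rule, node); err != nil {
			r.recordNodeFailure(rule, node.Name, "EvaluationError", err.Error())
			// Previously missing: keep node_readiness_failures_total in sync.
			Failures.WithLabelValues(rule.Name, "EvaluationError").Inc()
		}
	}
}

func main() {
	r := &NodeReconciler{}
	r.processNodeAgainstAllRules(&Node{Name: "worker-1"}, []*Rule{{Name: "example-rule"}})
	fmt.Println(r.failuresRecorded, Failures.counts[[2]string{"example-rule", "EvaluationError"}])
}
```

A regression test along these lines (drive the node path with a rule whose evaluation fails, then compare recorded failures against the counter) would catch the two paths drifting apart again; in the real code base, the counter value can be read with testutil.ToFloat64 from client_golang.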
File
internal/controller/node_controller.go, line 155
Are you able to fix this issue?
Yes (I will propose a PR)