
bug: EvaluationError failures from NodeReconciler not tracked in metrics #241

@Shreya2005-2005

Description

Describe the bug

In internal/controller/node_controller.go, when evaluateRuleForNode
fails inside processNodeAgainstAllRules, recordNodeFailure is called,
but the metrics.Failures counter is NOT incremented.

Line 155 (node_controller.go):

r.recordNodeFailure(rule, node.Name, "EvaluationError", err.Error())
// metrics.Failures.WithLabelValues(rule.Name, "EvaluationError").Inc() -- MISSING

Compare with processAllNodesForRule in nodereadinessrule_controller.go,
lines 275-276, which correctly does both:

r.recordNodeFailure(rule, node.Name, "EvaluationError", err.Error())
metrics.Failures.WithLabelValues(rule.Name, "EvaluationError").Inc()
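To make the divergence concrete, here is a stdlib-only Go model of the two call sites. It is a sketch, not the project's code: failureCounter is a hypothetical stand-in for the Prometheus CounterVec behind metrics.Failures, and nodeReconcilerPath / ruleReconcilerPath are hypothetical, simplified versions of processNodeAgainstAllRules and processAllNodesForRule.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// failureCounter is a hypothetical stand-in for a Prometheus CounterVec
// labelled by (rule, reason). It only models WithLabelValues(...).Inc().
type failureCounter struct {
	mu     sync.Mutex
	counts map[[2]string]float64
}

func newFailureCounter() *failureCounter {
	return &failureCounter{counts: make(map[[2]string]float64)}
}

// inc increments the series for the given rule and reason.
func (c *failureCounter) inc(rule, reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[[2]string{rule, reason}]++
}

// value returns the current count for the given rule and reason.
func (c *failureCounter) value(rule, reason string) float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[[2]string{rule, reason}]
}

// recorder models recordNodeFailure: it only stores an event-like record.
type recorder struct{ events []string }

func (r *recorder) recordNodeFailure(rule, node, reason, msg string) {
	r.events = append(r.events, fmt.Sprintf("%s/%s: %s: %s", rule, node, reason, msg))
}

var errEval = errors.New("evaluation failed")

// nodeReconcilerPath mirrors the buggy call site: failure recorded, counter untouched.
func nodeReconcilerPath(r *recorder, c *failureCounter, rule, node string) {
	r.recordNodeFailure(rule, node, "EvaluationError", errEval.Error())
}

// ruleReconcilerPath mirrors the correct call site: failure recorded and counted.
func ruleReconcilerPath(r *recorder, c *failureCounter, rule, node string) {
	r.recordNodeFailure(rule, node, "EvaluationError", errEval.Error())
	c.inc(rule, "EvaluationError")
}

func main() {
	r, c := &recorder{}, newFailureCounter()
	nodeReconcilerPath(r, c, "my-rule", "node-a")
	ruleReconcilerPath(r, c, "my-rule", "node-b")
	// Two failures were recorded, but only one was counted.
	fmt.Println(len(r.events), c.value("my-rule", "EvaluationError")) // prints "2 1"
}
```

In this model, both paths produce a failure record, but only the rule-reconciler path moves the counter, which is the gap described above.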

Impact

Failures triggered via the NodeReconciler path (node condition changes)
are silently missing from the node_readiness_failures_total Prometheus
metric, so operators monitoring cluster health get an incomplete
picture of evaluation failures.

Expected behavior

metrics.Failures.WithLabelValues(rule.Name, "EvaluationError").Inc()
should be called alongside recordNodeFailure in node_controller.go.

File

internal/controller/node_controller.go line 155

Are you able to fix this issue?

Yes (I will propose a PR)
