SREP-3733: Add ClusterPullSecretInvalidSRE PrometheusRule for pull secret health #2667

Merged: openshift-merge-bot[bot] merged 1 commit into openshift:master from MateSaary:srep-3733-promrule on Apr 8, 2026

Conversation

@MateSaary MateSaary commented Mar 12, 2026

What type of PR is this?

feature

What this PR does / why we need it?

Adds a ClusterPullSecretInvalidSRE PrometheusRule that alerts when the pull_secret_valid metric (openshift/osd-metrics-exporter#284) reports the cluster pull secret is invalid. The alert message includes the reason label for actionable context (e.g. MissingRegistry, MalformedJSON, EmptyCredential).

Which Jira/Github issue(s) this PR fixes?

Fixes SREP-3733

Special notes for your reviewer:

Pre-checks (if applicable):

  • Tested latest changes against a cluster

  • Included documentation changes with PR

  • If this is a new object that is not intended for the FedRAMP environment (if unsure, please reach out to team FedRAMP), please exclude it with:

    matchExpressions:
    - key: api.openshift.com/fedramp
      operator: NotIn
      values: ["true"]

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 12, 2026

openshift-ci-robot commented Mar 12, 2026

@MateSaary: This pull request references SREP-3733 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

@openshift-ci openshift-ci Bot requested a review from bergmannf March 12, 2026 17:00

coderabbitai Bot commented Mar 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 076ac653-9663-40f7-be20-2b1b2112d58c

📥 Commits

Reviewing files that changed from the base of the PR and between a6b37a9 and db6eed1.

📒 Files selected for processing (4)
  • deploy/sre-prometheus/100-pull-secret-health.PrometheusRule.yaml
  • hack/00-osd-managed-cluster-config-integration.yaml.tmpl
  • hack/00-osd-managed-cluster-config-production.yaml.tmpl
  • hack/00-osd-managed-cluster-config-stage.yaml.tmpl
✅ Files skipped from review due to trivial changes (1)
  • deploy/sre-prometheus/100-pull-secret-health.PrometheusRule.yaml

Walkthrough

Adds a new PrometheusRule named sre-pull-secret-health-alerts containing alert ClusterPullSecretInvalidSRE which fires when pull_secret_valid{name="osd_exporter"} == 0 for 15m. The alert carries labels severity: warning, namespace: openshift-monitoring, and includes summary/description annotations referencing {{ $labels.reason }}.
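For orientation, the rule described in the walkthrough might look roughly like the manifest below. This is a sketch reconstructed from the summary, not the merged file: the resource name, alert name, expression, `for` duration, and the two labels come from the walkthrough, while the rule group name and the exact wording of the `summary` and `description` annotations are assumptions.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sre-pull-secret-health-alerts
  namespace: openshift-monitoring
spec:
  groups:
    # Group name is an assumption; the walkthrough does not state it.
    - name: sre-pull-secret-health-alerts
      rules:
        - alert: ClusterPullSecretInvalidSRE
          # Fires when the exporter reports the pull secret as invalid (value 0)
          # continuously for 15 minutes.
          expr: pull_secret_valid{name="osd_exporter"} == 0
          for: 15m
          labels:
            severity: warning
            namespace: openshift-monitoring
          annotations:
            # Annotation wording is hypothetical; only the use of
            # {{ $labels.reason }} is taken from the walkthrough.
            summary: "Cluster pull secret is invalid"
            description: "Pull secret validation failed with reason: {{ $labels.reason }}."
```

The `for: 15m` clause keeps the alert pending until the condition has held for the full window, so a brief invalid reading should not fire it.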

Changes

| Cohort / File(s) | Summary |
|---|---|
| Deployed alert manifest: deploy/sre-prometheus/100-pull-secret-health.PrometheusRule.yaml | Added PrometheusRule resource sre-pull-secret-health-alerts in openshift-monitoring with alert ClusterPullSecretInvalidSRE (expr: pull_secret_valid{name="osd_exporter"} == 0, for: 15m), labels (severity: warning, namespace: openshift-monitoring) and annotations (summary, description using {{ $labels.reason }}). |
| Integration / stage / production templates: hack/00-osd-managed-cluster-config-integration.yaml.tmpl, hack/00-osd-managed-cluster-config-stage.yaml.tmpl, hack/00-osd-managed-cluster-config-production.yaml.tmpl | Inserted the same PrometheusRule/alert into the three environment templates, mirroring the deployed manifest and its annotations for integration, stage, and production. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a PrometheusRule named ClusterPullSecretInvalidSRE for pull secret health monitoring. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Stable And Deterministic Test Names | ✅ Passed | PR adds only a PrometheusRule YAML manifest file with no Ginkgo test files, making this check not applicable. |
| Test Structure And Quality | ✅ Passed | The custom check for Ginkgo test code quality is not applicable to this PR. The PR adds a Prometheus alerting rule manifest, which is a Kubernetes operational configuration file, not test code. |

@openshift-ci openshift-ci Bot requested a review from Tof1973 March 12, 2026 17:00

Summary by CodeRabbit

  • New Features
      • Added monitoring alert for pull secret health validation. The alert detects invalid pull secrets and notifies when image pull failures or telemetry issues may occur.

MateSaary commented Mar 12, 2026

/hold

...pending merge of openshift/osd-metrics-exporter#284

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 12, 2026

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
deploy/sre-prometheus/100-pull-secret-health.PrometheusRule.yaml (1)

19-21: Consider adding a link label for SOP documentation.

The similar pull secret alerts in 100-ocm-agent-operator.PrometheusRule.yaml include a link label pointing to SOP documentation for responding to the alert. Adding this would help SREs quickly access runbook procedures when responding to this alert.

📖 Proposed fix to add SOP link
       labels:
         severity: warning
         namespace: openshift-monitoring
+        link: "https://github.com/openshift/ops-sop/blob/master/v4/alerts/OCMAgentResponseFailureServiceLogsSRE.md#verify-cluster-pull-secrets"

Note: If a dedicated SOP for this specific alert exists or will be created, use that link instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/sre-prometheus/100-pull-secret-health.PrometheusRule.yaml` around
lines 19 - 21, Add a `link` label to the alert metadata so SREs can quickly
access the runbook; locate the labels block (currently containing `severity:
warning` and `namespace: openshift-monitoring`) and add a `link: "<SOP_URL>"`
entry (use the existing SOP URL used in
100-ocm-agent-operator.PrometheusRule.yaml or the specific runbook URL for this
alert) alongside `severity` and `namespace` to ensure the alert includes the SOP
link.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4987b8f4-6731-4709-a07d-69439f7cad91

📥 Commits

Reviewing files that changed from the base of the PR and between 545114f and 44293bd.

📒 Files selected for processing (1)
  • deploy/sre-prometheus/100-pull-secret-health.PrometheusRule.yaml

Comment thread deploy/sre-prometheus/100-pull-secret-health.PrometheusRule.yaml
@MateSaary force-pushed the srep-3733-promrule branch from 44293bd to a6b37a9 on March 24, 2026 13:46

openshift-ci-robot commented Mar 24, 2026

@MateSaary: This pull request references SREP-3733 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

@MateSaary force-pushed the srep-3733-promrule branch from a6b37a9 to db6eed1 on March 27, 2026 12:45
@MateSaary (Member, Author) commented

/test checklinks-pr

@MateSaary MateSaary requested a review from bmeng March 27, 2026 14:51
@MateSaary (Member, Author) commented

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 30, 2026

openshift-ci Bot commented Apr 6, 2026

@MateSaary: all tests passed!

@bergmannf (Contributor) commented

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 8, 2026

openshift-ci Bot commented Apr 8, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bergmannf, MateSaary

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 8, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit c25b4ca into openshift:master Apr 8, 2026
6 checks passed
