MON-4036: Add TelemeterClientConfig to ClusterMonitoring API #2755
danielmellado wants to merge 1 commit into openshift:master from
Conversation
Pipeline controller notification: This repository is configured in LGTM mode. For optional jobs, comment
@danielmellado: This pull request references MON-4036 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "4.22." or "openshift-4.22.", but it targets "openshift-5.0" instead.

Details

In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Hello @danielmellado! Some important instructions when contributing to openshift/api:
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the corresponding setting. Use the following commands to manage reviews:
📝 Walkthrough: A new optional

🚥 Pre-merge checks: ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented
Review Summary by Qodo

Add TelemeterClientConfig to ClusterMonitoring API for pod scheduling and resources
Walkthrough

Description:
- Add TelemeterClientConfig struct to ClusterMonitoringSpec API
- Support pod scheduling via nodeSelector, tolerations, and topologySpreadConstraints
- Enable resource management with resources field for CPU/memory constraints
- Include comprehensive validation rules and test coverage for new configuration

Diagram:

```mermaid
flowchart LR
    A["ClusterMonitoringSpec"] -->|adds field| B["TelemeterClientConfig"]
    B -->|supports| C["nodeSelector"]
    B -->|supports| D["resources"]
    B -->|supports| E["tolerations"]
    B -->|supports| F["topologySpreadConstraints"]
    C -->|controls| G["Pod Scheduling"]
    D -->|manages| H["Resource Limits"]
    E -->|handles| I["Taint Tolerance"]
    F -->|distributes| J["Topology Domains"]
```
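For context, a ClusterMonitoring object using the new field would look roughly like this. This is a sketch assembled from the fields listed above; the exact schema (field shapes, allowed values) is defined in the PR diff, and the concrete values here are illustrative:

```yaml
apiVersion: config.openshift.io/v1alpha1
kind: ClusterMonitoring
metadata:
  name: cluster
spec:
  telemeterClientConfig:
    nodeSelector:
      node-role.kubernetes.io/infra: ""
    resources:
      - name: cpu
        request: 100m
      - name: memory
        request: 64Mi
    tolerations:
      - key: node-role.kubernetes.io/infra
        operator: Exists
        effect: NoSchedule
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
```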
File Changes

1. config/v1alpha1/types_cluster_monitoring.go
Code Review by Qodo
1. TelemeterClientConfig MinProperties undocumented
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
```go
// TelemeterClientConfig provides configuration options for the Telemeter Client component
// that runs in the `openshift-monitoring` namespace. The Telemeter Client collects selected
// monitoring metrics and forwards them to Red Hat for telemetry purposes.
// Use this configuration to control pod scheduling and resource allocation.
// +kubebuilder:validation:MinProperties=1
type TelemeterClientConfig struct {
	// nodeSelector defines the nodes on which the Pods are scheduled.
```
1. TelemeterClientConfig MinProperties undocumented (📘 Rule violation, ✓ Correctness)
TelemeterClientConfig has +kubebuilder:validation:MinProperties=1, but the type comment does not document that the object must not be empty (i.e., at least one field must be set). This violates the requirement that validation markers be fully documented, potentially confusing API consumers about valid/invalid configurations.
Agent Prompt
## Issue description
`TelemeterClientConfig` includes the validation marker `+kubebuilder:validation:MinProperties=1`, but the type-level comment does not document the resulting constraint that the object must not be empty.
## Issue Context
Compliance requires that every validation marker applied to an API field/type be fully documented in comments so users understand constraints without inspecting generated schema.
## Fix Focus Areas
- config/v1alpha1/types_cluster_monitoring.go[575-581]
🧹 Nitpick comments (1)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)
580-832: Add failure cases for `nodeSelector` and `tolerations` bounds.

The new schema also enforces min/maxProperties on `nodeSelector` and min/maxItems on `tolerations`, but this block only exercises negative paths for `resources` and `topologySpreadConstraints`. A couple of explicit failures here would make regressions in those two branches much harder to miss.

Example additions:
```diff
+ - name: Should reject TelemeterClientConfig with empty nodeSelector
+   initial: |
+     apiVersion: config.openshift.io/v1alpha1
+     kind: ClusterMonitoring
+     spec:
+       telemeterClientConfig:
+         nodeSelector: {}
+   expectedError: 'spec.telemeterClientConfig.nodeSelector: Invalid value: 0: spec.telemeterClientConfig.nodeSelector in body should have at least 1 properties'
+
+ - name: Should reject TelemeterClientConfig with empty tolerations array
+   initial: |
+     apiVersion: config.openshift.io/v1alpha1
+     kind: ClusterMonitoring
+     spec:
+       telemeterClientConfig:
+         tolerations: []
+   expectedError: 'spec.telemeterClientConfig.tolerations: Invalid value: 0: spec.telemeterClientConfig.tolerations in body should have at least 1 items'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml` around lines 580 - 832, Add negative test cases exercising nodeSelector and tolerations bounds: include tests named e.g. "Should reject TelemeterClientConfig with empty nodeSelector" and "Should reject TelemeterClientConfig with too many nodeSelector properties" that set spec.telemeterClientConfig.nodeSelector to {} and to an object exceeding the schema maxProperties respectively, and include tests "Should reject TelemeterClientConfig with empty tolerations array" (set tolerations: []) and "Should reject TelemeterClientConfig with too many tolerations" (provide more than the allowed maxItems). Ensure each test uses the same ClusterMonitoring YAML structure as other cases and sets expectedError strings matching the schema validation messages for min/maxProperties and min/maxItems on nodeSelector and tolerations so failures are asserted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml`:
- Around line 580-832: Add negative test cases exercising nodeSelector and
tolerations bounds: include tests named e.g. "Should reject
TelemeterClientConfig with empty nodeSelector" and "Should reject
TelemeterClientConfig with too many nodeSelector properties" that set
spec.telemeterClientConfig.nodeSelector to {} and to an object exceeding the
schema maxProperties respectively, and include tests "Should reject
TelemeterClientConfig with empty tolerations array" (set tolerations: []) and
"Should reject TelemeterClientConfig with too many tolerations" (provide more
than the allowed maxItems). Ensure each test uses the same ClusterMonitoring
YAML structure as other cases and sets expectedError strings matching the schema
validation messages for min/maxProperties and min/maxItems on nodeSelector and
tolerations so failures are asserted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 897b5361-c81f-468e-bb44-621465732eb1
⛔ Files ignored due to path filters (5)
- config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
- config/v1alpha1/zz_generated.deepcopy.go is excluded by !**/zz_generated*
- config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
- config/v1alpha1/zz_generated.swagger_doc_generated.go is excluded by !**/zz_generated*
- openapi/generated_openapi/zz_generated.openapi.go is excluded by !openapi/**, !**/zz_generated*
📒 Files selected for processing (3)
- config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
- config/v1alpha1/types_cluster_monitoring.go
- payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml
saschagrunert left a comment
The new TelemeterClientConfig struct follows the established pattern from other component configs in this file. Markers and godoc are mostly well done.
Main issue: the PR removes OpenShiftStateMetricsConfig (merged to master one day prior via MON-4033, 4cf5fe1) and replaces it with TelemeterClientConfig. After rebasing on current master, both fields need to coexist.
Also needs additional test coverage for nodeSelector and tolerations boundary validation.
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
While rebasing, the CRD hit the Kubernetes CEL validation cost budget (StaticEstimatedCRDCostLimit). The quantity() CEL function has a high fixed estimated cost per invocation, and with 6 structs in ClusterMonitoringSpec each embedding []ContainerResource, the cumulative cost exceeded the limit by ~13%. Fixed by reducing MaxItems on all resources []ContainerResource fields from 10 to 5 (sufficient for practical use — cpu, memory, hugepages variants). I also added an explanatory comment on the ContainerResource type.
Force-pushed 40f8860 to 2f6d3d1 (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (1)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)
37-60: Add the missing `prometheusOperatorConfig.resources` max-items regression test.

These cases cover the new MaxItems=5 boundary for the other touched `resources` lists, but `spec.prometheusOperatorConfig.resources` was tightened in the API too and is still untested here. One rejection case for 6 items would keep all changed validation paths covered.

Proposed test case:
```diff
+ - name: Should reject PrometheusOperatorConfig with more than 5 resource items
+   initial: |
+     apiVersion: config.openshift.io/v1alpha1
+     kind: ClusterMonitoring
+     spec:
+       prometheusOperatorConfig:
+         resources:
+           - name: "cpu"
+             request: "100m"
+           - name: "memory"
+             request: "64Mi"
+           - name: "hugepages-2Mi"
+             request: "32Mi"
+           - name: "hugepages-1Gi"
+             request: "1Gi"
+           - name: "ephemeral-storage"
+             request: "1Gi"
+           - name: "nvidia.com/gpu"
+             request: "1"
+   expectedError: 'spec.prometheusOperatorConfig.resources: Too many: 6: must have at most 5 items'
```

Also applies to: 312-333, 462-481, 664-683, 933-952
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml` around lines 37 - 60, Add a regression test that mirrors the alertmanagerConfig case but targets spec.prometheusOperatorConfig.resources to assert MaxItems=5 enforcement: create a new test entry named like "Should reject PrometheusOperatorConfig with more than 5 items" with initial YAML containing spec.prometheusOperatorConfig.resources listing 6 distinct resource objects (e.g., cpu, memory, hugepages-2Mi, hugepages-1Gi, ephemeral-storage, nvidia.com/gpu) and set expectedError to 'spec.prometheusOperatorConfig.resources: Too many: 6: must have at most 5 items' so the tightened validation path for prometheusOperatorConfig.resources is covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/v1alpha1/types_cluster_monitoring.go`:
- Around line 211-218: The MaxItems for several []ContainerResource "resources"
lists was tightened from 10 to 5, which is schema-breaking; revert the
+kubebuilder:validation:MaxItems annotation (or change it back to 10) for the
"resources" list annotations so existing ClusterMonitoring CRs with 6–10 entries
continue to validate, specifically update the +kubebuilder:validation:MaxItems
on the "resources" list annotations associated with the ContainerResource lists
(the "resources" field) in types_cluster_monitoring.go so they match the
original limit (10) or remove the constraint until a migration is provided.
---
Nitpick comments:
In
`@config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml`:
- Around line 37-60: Add a regression test that mirrors the alertmanagerConfig
case but targets spec.prometheusOperatorConfig.resources to assert MaxItems=5
enforcement: create a new test entry named like "Should reject
PrometheusOperatorConfig with more than 5 items" with initial YAML containing
spec.prometheusOperatorConfig.resources listing 6 distinct resource objects
(e.g., cpu, memory, hugepages-2Mi, hugepages-1Gi, ephemeral-storage,
nvidia.com/gpu) and set expectedError to
'spec.prometheusOperatorConfig.resources: Too many: 6: must have at most 5
items' so the tightened validation path for prometheusOperatorConfig.resources
is covered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 0f07ec4c-1986-4bc9-917b-36a27aee9952
⛔ Files ignored due to path filters (5)
- config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
- config/v1alpha1/zz_generated.deepcopy.go is excluded by !**/zz_generated*
- config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
- config/v1alpha1/zz_generated.swagger_doc_generated.go is excluded by !**/zz_generated*
- openapi/generated_openapi/zz_generated.openapi.go is excluded by !openapi/**, !**/zz_generated*
📒 Files selected for processing (3)
- config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
- config/v1alpha1/types_cluster_monitoring.go
- payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml
```diff
  // Maximum length for this list is 5.
  // Minimum length for this list is 1.
  // Each resource name must be unique within this list.
  // +optional
  // +listType=map
  // +listMapKey=name
- // +kubebuilder:validation:MaxItems=10
+ // +kubebuilder:validation:MaxItems=5
  // +kubebuilder:validation:MinItems=1
```
Avoid tightening existing resources lists without a migration plan.
Lines 211-218, 422-429, 493-500, 557-564, and 617-624 all shrink already-shipped []ContainerResource fields from 10 items to 5. That is a schema-breaking contraction: any existing ClusterMonitoring object using 6-10 entries in one of those fields will stop passing validation on later updates once this CRD is installed. Please either preserve backward compatibility for the existing fields or document/implement a migration path before merging.
Also applies to: 422-429, 493-500, 557-564, 617-624, 690-697
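To make the compatibility concern concrete: a hypothetical already-stored object like the following (6 resource entries, valid under the shipped MaxItems=10) would fail validation on its next update once a CRD with MaxItems=5 is installed. Field names follow the test cases elsewhere in this review; the object is illustrative:

```yaml
spec:
  alertmanagerConfig:
    resources:
      - name: cpu
        request: 100m
      - name: memory
        request: 64Mi
      - name: hugepages-2Mi
        request: 32Mi
      - name: hugepages-1Gi
        request: 1Gi
      - name: ephemeral-storage
        request: 1Gi
      - name: nvidia.com/gpu
        request: "1"
```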
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@config/v1alpha1/types_cluster_monitoring.go` around lines 211 - 218, The
MaxItems for several []ContainerResource "resources" lists was tightened from 10
to 5, which is schema-breaking; revert the +kubebuilder:validation:MaxItems
annotation (or change it back to 10) for the "resources" list annotations so
existing ClusterMonitoring CRs with 6–10 entries continue to validate,
specifically update the +kubebuilder:validation:MaxItems on the "resources" list
annotations associated with the ContainerResource lists (the "resources" field)
in types_cluster_monitoring.go so they match the original limit (10) or remove
the constraint until a migration is provided.
```go
	//
	// When omitted, this means the user has no opinion and the platform is left
	// to choose reasonable defaults. These defaults are subject to change over time.
	// The current default is an empty list.
```
Another consistency nit:
```diff
- // The current default is an empty list.
+ // Default is empty list.
```
Same as above, thanks!
Force-pushed 2f6d3d1 to 371ee72 (Compare)
The failure on
Thanks, I'm committing this, rebasing and I'll push the updated version ;)
Migrate the telemeter-client configmap settings to a CRD field within ClusterMonitoringSpec in config/v1alpha1. The new TelemeterClientConfig struct supports:

- nodeSelector: pod scheduling to specific nodes
- resources: compute resource requests and limits
- tolerations: pod tolerations for scheduling
- topologySpreadConstraints: pod distribution across topology domains

Signed-off-by: Daniel Mellado <dmellado@fedoraproject.org>
Force-pushed 49b7786 to 4fb7908 (Compare)
🧹 Nitpick comments (1)
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml (1)
3237-3247: Consider de-duplicating the Prometheus resources size description. Line 3237 already states the 1..5 constraint, and Lines 3245-3247 repeat the same rule in different wording. Keeping one canonical phrasing would reduce noise.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml` around lines 3237 - 3247, The Prometheus resources list description repeats the 1..5 constraint twice; edit the YAML comment/description for the "resources" block in the Clustermonitorings CRD to remove the duplicated phrasing (remove the second occurrence "Maximum length for this list is 5. Minimum length for this list is 1.") so only a single canonical sentence about the minimum of 1 and maximum of 5 resource entries remains; ensure the example default entries (cpu, memory) and the note about unique resource names are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml`:
- Around line 3237-3247: The Prometheus resources list description repeats the
1..5 constraint twice; edit the YAML comment/description for the "resources"
block in the Clustermonitorings CRD to remove the duplicated phrasing (remove
the second occurrence "Maximum length for this list is 5. Minimum length for
this list is 1.") so only a single canonical sentence about the minimum of 1 and
maximum of 5 resource entries remains; ensure the example default entries (cpu,
memory) and the note about unique resource names are preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 8229a1e7-3d19-467b-8bae-bced8348a26f
⛔ Files ignored due to path filters (5)
- config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
- config/v1alpha1/zz_generated.deepcopy.go is excluded by !**/zz_generated*
- config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
- config/v1alpha1/zz_generated.swagger_doc_generated.go is excluded by !**/zz_generated*
- openapi/generated_openapi/zz_generated.openapi.go is excluded by !openapi/**, !**/zz_generated*
📒 Files selected for processing (3)
- config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
- config/v1alpha1/types_cluster_monitoring.go
- payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml
@danielmellado: The following test failed, say
Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.