feat(stack): allow controller metrics scraping #95
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -193,6 +193,7 @@ prometheus: | |
| kubernetes_sd_configs: | ||
| - role: endpointslice | ||
| relabel_configs: | ||
| # Scheme annotation overrides the job's default scrape protocol when a target serves HTTPS. | ||
| - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme] | ||
| action: replace | ||
| target_label: __scheme__ | ||
|
|
@@ -209,25 +210,64 @@ prometheus: | |
| - action: labelmap | ||
| regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+) | ||
| replacement: __param_$1 | ||
| - action: labelmap | ||
| - &kubexEndpointsliceServiceLabels | ||
| action: labelmap | ||
| regex: __meta_kubernetes_service_label_(.+) | ||
| - source_labels: [__meta_kubernetes_namespace] | ||
| - &kubexEndpointsliceNamespace | ||
|
Contributor
CONTENT OF THIS REVIEW IS AI GENERATED
[Severity: Minor] [Confidence: High]
Location: charts/kubex-automation-stack/values.yaml:219
Issue: The [...]
Why it matters: Double-scraping the same endpoint wastes resources; the shared job's allowlist then drops all controller metrics that don't match the [...]
Suggested fix: Add a negative lookahead (or a separate [...]
    - source_labels: [__meta_kubernetes_endpointslice_name]
      action: keep
      regex: '((kubex|densify)-(kube-state-metrics|prometheus-node-exporter|ephemeral-storage-collector)|.*dcgm|k8s-ephemeral-storage-metrics).*'
    # Optionally, add a drop to prevent accidental overlap with the dedicated job:
    - source_labels: [__meta_kubernetes_service_name]
      action: drop
      regex: '.+-kubex-automation-engine-metrics-service$' |
||
| source_labels: [__meta_kubernetes_namespace] | ||
|
gasarekubex marked this conversation as resolved.
|
||
| action: replace | ||
| target_label: namespace | ||
| - source_labels: [__meta_kubernetes_service_name] | ||
| action: drop | ||
| regex: '.+-kubex-automation-engine-metrics-service$' | ||
| - source_labels: [__meta_kubernetes_endpointslice_name] | ||
| action: keep | ||
| regex: '((kubex|densify)-(kube-state-metrics|prometheus-node-exporter|ephemeral-storage-collector)|.*dcgm|k8s-ephemeral-storage-metrics).*' | ||
| - source_labels: [__meta_kubernetes_service_name] | ||
| - &kubexEndpointsliceServiceName | ||
| source_labels: [__meta_kubernetes_service_name] | ||
| action: replace | ||
|
gasarekubex marked this conversation as resolved.
|
||
| target_label: service | ||
| - source_labels: [__meta_kubernetes_pod_node_name] | ||
| - &kubexEndpointsliceNodeName | ||
|
gasarekubex marked this conversation as resolved.
|
||
| source_labels: [__meta_kubernetes_pod_node_name] | ||
|
gasarekubex marked this conversation as resolved.
|
||
| action: replace | ||
| target_label: node | ||
| metric_relabel_configs: | ||
| - source_labels: [__name__] | ||
|
gasarekubex marked this conversation as resolved.
|
||
| regex: '^(DCGM_FI_(DEV_(FB_(FREE|USED)|GPU_UTIL|POWER_USAGE)|PROF_(DRAM_ACTIVE|GR_ENGINE_ACTIVE|PIPE_TENSOR_ACTIVE))|ephemeral_storage_.*|kube_(cronjob_(created|info|labels|next_schedule_time|status_(active|last_schedule_time))|daemonset_(created|labels|status_number_available)|deployment_(created|labels|metadata_generation|spec_strategy_rollingupdate_max_(surge|unavailable))|horizontalpodautoscaler_(info|labels|spec_(max_replicas|min_replicas|target_metric)|status_(condition|current_replicas|target_metric))|job_(created|info|labels|owner|spec_(completions|parallelism)|status_(active|completion_time|start_time))|namespace_(annotations|labels)|node_(info|labels|role|spec_taint|status_(allocatable|capacity))|pod_(container_(info|resource_(limits|requests)|status_(last_terminated_(exitcode|timestamp)|restarts_total|terminated(?:_reason)?))|created|info|labels|owner|status_(phase|qos_class))|replicaset_(created|labels|owner|spec_replicas)|replicationcontroller_(created|spec_replicas)|resourcequota(?:_created)?|statefulset_(created|labels|replicas))|node_(cpu_(core_throttles_total|seconds_total)|disk_(read_bytes_total|reads_completed_total|writes_completed_total|written_bytes_total)|memory_(Buffers_bytes|Cached_bytes|MemFree_bytes|MemTotal_bytes|SReclaimable_bytes)|network_(receive_(bytes_total|packets_total)|speed_bytes|transmit_(bytes_total|packets_total))|vmstat_oom_kill)|openshift_clusterresourcequota_(created|labels|namespace_usage|selector|usage))$' | ||
| action: keep | ||
|
gasarekubex marked this conversation as resolved.
|
||
|
|
||
|
gasarekubex marked this conversation as resolved.
|
||
| - job_name: 'kubex-automation-engine-metrics-endpointslice' | ||
| # The controller chart exposes unauthenticated metrics on port 8080 over plain HTTP. | ||
| # Unlike the shared endpointslice job above, this scrape does not honor scheme annotations or bearer tokens. | ||
| # The job is intentionally fixed to the validated controller metrics defaults. | ||
| # The shared annotation-driven path/port/param rules stay with the shared job; this job keeps its own explicit endpoint. | ||
| # The target is scraped by service-name suffix so multiple controller releases can be collected when needed. | ||
| # No bearer token is required for this /metrics endpoint. | ||
| scheme: http | ||
|
gasarekubex marked this conversation as resolved.
|
||
| kubernetes_sd_configs: | ||
|
Contributor
CONTENT OF THIS REVIEW IS AI GENERATED
[Severity: Major] [Confidence: High]
Location: charts/kubex-automation-stack/values.yaml:247
Issue: The [...]
Why it matters: The asymmetric use of anchors (some aliased, some re-inlined) creates a maintenance trap. A future author completing the pattern by adding the missing aliases will silently break the port override.
Suggested fix: Add a guard comment immediately before the first alias in the new job:
    relabel_configs:
      # Fixed path/port: do NOT add *kubexEndpointsliceAddress or *kubexEndpointsliceMetricsPath here;
      # this job intentionally hard-codes port 8080 and path /metrics.
      - target_label: __metrics_path__
        replacement: /metrics
      - *kubexEndpointsliceServiceLabels
      ... |
||
| - role: endpointslice | ||
| relabel_configs: | ||
|
gasarekubex marked this conversation as resolved.
|
||
| # This job uses a fixed controller metrics path/port instead of the shared annotation-driven overrides. | ||
| - target_label: __metrics_path__ | ||
| replacement: /metrics | ||
|
gasarekubex marked this conversation as resolved.
|
||
| - *kubexEndpointsliceServiceLabels | ||
| - *kubexEndpointsliceNamespace | ||
|
Contributor
CONTENT OF THIS REVIEW IS AI GENERATED
[Severity: Major] [Confidence: High]
Location: charts/kubex-automation-stack/values.yaml:254
Issue: The address-rewrite regex [...]
Why it matters: The rewritten [...]
Suggested fix: Anchor the regex to the full string:
    - source_labels: [__address__]
      action: replace
      target_label: __address__
      regex: '(.+?)(?::\d+)?$'
      replacement: '$1:8080'
Or use a non-lazy quantifier with the anchor:
      regex: '([^:]+)(?::\d+)?$'
      replacement: '$1:8080' |
||
| # Matches any Helm release name prefix (for example 'controller-' or 'prod-'). | ||
| # This intentionally supports scraping multiple controller releases when their service names share this suffix. | ||
|
Contributor
CONTENT OF THIS REVIEW IS AI GENERATED
[Severity: Major] [Confidence: High]
Location: charts/kubex-automation-stack/values.yaml:261
Issue: The [...]
Why it matters: The address-rewrite [...]
Suggested fix: Re-order
    relabel_configs:
      - target_label: __metrics_path__
        replacement: /metrics
      - *kubexEndpointsliceServiceLabels
      - *kubexEndpointsliceNamespace
      # Keep filter FIRST, before any address mutation
      - source_labels: [__meta_kubernetes_service_name]
        action: keep
        regex: '.+-kubex-automation-engine-metrics-service$'
      # Port rewrite only runs for already-selected targets
      - source_labels: [__address__]
        action: replace
        target_label: __address__
        regex: '(.+?)(?::\d+)?'
        replacement: '$1:8080'
      - *kubexEndpointsliceServiceName
      - *kubexEndpointsliceNodeName |
||
| - source_labels: [__meta_kubernetes_service_name] | ||
| action: keep | ||
| regex: '.+-kubex-automation-engine-metrics-service$' | ||
| - source_labels: [__address__] | ||
| action: replace | ||
| target_label: __address__ | ||
| regex: '(.+?)(?::\d+)?' | ||
| replacement: '$1:8080' | ||
| - *kubexEndpointsliceServiceName | ||
| - *kubexEndpointsliceNodeName | ||
| # Stores all scraped metrics as-is (no metric_relabel_configs filter). | ||
| # Expected families: controller_runtime_*, go_*, process_*, workqueue_*, rest_client_*, automation_controller_* | ||
| # Add a metric_relabel_configs allowlist here if cardinality becomes a concern. | ||
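A minimal sketch of the allowlist that the comment above leaves as a follow-up, assuming the controller only needs the metric families named under "Expected families". The regex is an illustration assembled from those prefixes and is not part of this PR; the block would sit at the same level as `relabel_configs` inside the `kubex-automation-engine-metrics-endpointslice` job.

```yaml
# Hypothetical allowlist; the prefixes are taken from the "Expected families" comment,
# and the regex itself is an assumption to adjust against real scrape output.
metric_relabel_configs:
  - source_labels: [__name__]
    action: keep
    regex: '(controller_runtime|go|process|workqueue|rest_client|automation_controller)_.*'
```

Prometheus anchors relabel regexes at both ends, so the trailing `.*` is what lets each prefix match a full metric name. Any family outside this list would be dropped at ingestion, so the list should be checked against the controller's `/metrics` output before enabling it.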
|
gasarekubex marked this conversation as resolved.
|
||
|
|
||
| ################################################################# | ||
| # Ephemeral Storage Metrics Exporter | ||
|
gasarekubex marked this conversation as resolved.
|
||
| # Collects and exposes ephemeral storage metrics via a DaemonSet | ||
|
gasarekubex marked this conversation as resolved.
|
||
|
|
||