diff --git a/charts/kubex-automation-engine/docs/Cluster-Proactive-Policies.md b/charts/kubex-automation-engine/docs/Cluster-Proactive-Policies.md
index 1d8a4cf..6d2fcc4 100644
--- a/charts/kubex-automation-engine/docs/Cluster-Proactive-Policies.md
+++ b/charts/kubex-automation-engine/docs/Cluster-Proactive-Policies.md
@@ -19,7 +19,7 @@ For the namespaced variant, see [Proactive Policies](./Proactive-Policies.md). F
 | `spec.scope.labelSelector` | none | Kubernetes label selector for matching workloads. |
 | `spec.scope.workloadTypes` | `[Deployment, StatefulSet, CronJob, Rollout, Job, AnalysisRun, DaemonSet]` | Workload kinds this policy applies to. |
 | `spec.scope.namespaceSelector.operator` | none | Namespace selector operator: `In` or `NotIn`. |
-| `spec.scope.namespaceSelector.values` | none | Namespace patterns to include or exclude. |
+| `spec.scope.namespaceSelector.values` | none | Namespace patterns to include or exclude (supports `*` wildcards, e.g. `prod-*`). |
 | `spec.automationStrategyRef.name` | none | Required cluster strategy name. |
 | `spec.weight` | `0` | Higher weight wins when multiple proactive policies match. |
 | `spec.safetyChecks.maxAnalysisAgeDays` | `5` | Rejects old recommendations. |
diff --git a/charts/kubex-automation-engine/docs/Policy-Configuration.md b/charts/kubex-automation-engine/docs/Policy-Configuration.md
index 152eeb3..824bb3b 100644
--- a/charts/kubex-automation-engine/docs/Policy-Configuration.md
+++ b/charts/kubex-automation-engine/docs/Policy-Configuration.md
@@ -72,6 +72,18 @@ Those resources remain fully supported by the controller and can be managed outs
 | `policy.policies..safetyChecks.maxAnalysisAgeDays` | `ClusterProactivePolicy.spec.safetyChecks.maxAnalysisAgeDays` | Per-policy value wins over top-level `policy.safetyChecks.maxAnalysisAgeDays`. |
 | `policy.safetyChecks.maxAnalysisAgeDays` | `ClusterProactivePolicy.spec.safetyChecks.maxAnalysisAgeDays` | Backward-compatible fallback when not set per policy. |
 
+### Namespace wildcards in `scope[].namespaces.values`
+
+`scope[].namespaces.values` supports shell-style `*` wildcards when matching namespace names (for example: `prod-*`).
+
+```yaml
+scope:
+  - name: platform
+    namespaces:
+      operator: In
+      values: ["prod-*", "staging"]
+```
+
 Important:
 
 - `maxAnalysisAgeDays` is written to generated `ClusterProactivePolicy` resources, not to generated strategies.
diff --git a/charts/kubex-automation-engine/docs/Troubleshooting.md b/charts/kubex-automation-engine/docs/Troubleshooting.md
index c50015f..7d44169 100644
--- a/charts/kubex-automation-engine/docs/Troubleshooting.md
+++ b/charts/kubex-automation-engine/docs/Troubleshooting.md
@@ -4,6 +4,34 @@ Use this sequence when rightsizing does not happen as expected.
 
 For a consolidated map of the controller's safety gates, see [Safety Controls](./Safety-Controls.md).
 
+## 0. Temporarily Enable Debug Logging (and Revert)
+
+Most of the time you only want debug logs briefly. The quickest way is to update the live Deployment args (this triggers a rollout and will be overwritten by the next `helm upgrade`).
+
+Enable debug (temporary):
+
+```bash
+kubectl -n kubex patch deploy/$(kubectl -n kubex get deploy -l app.kubernetes.io/name=kubex-automation-engine -o jsonpath='{.items[0].metadata.name}') --type='json' -p='[{"op":"replace","path":"/spec/template/spec/containers/0/args/3","value":"--zap-log-level=debug"}]'
+```
+
+Revert back to info:
+
+```bash
+kubectl -n kubex patch deploy/$(kubectl -n kubex get deploy -l app.kubernetes.io/name=kubex-automation-engine -o jsonpath='{.items[0].metadata.name}') --type='json' -p='[{"op":"replace","path":"/spec/template/spec/containers/0/args/3","value":"--zap-log-level=info"}]'
+```
+
+If you want the setting to persist across upgrades, use Helm instead:
+
+```bash
+helm upgrade kubex-automation kubex/kubex-automation-engine -n kubex --reuse-values --set 'controllerManager.extraArgs[0]=--zap-log-level=debug'
+```
+
+Revert with Helm:
+
+```bash
+helm upgrade kubex-automation kubex/kubex-automation-engine -n kubex --reuse-values --set 'controllerManager.extraArgs[0]=--zap-log-level=info'
+```
+
 ## 1. Interpret `rightsizing summary` Logs
 
 ```bash
diff --git a/charts/kubex-automation-engine/templates/deployment.yaml b/charts/kubex-automation-engine/templates/deployment.yaml
index 7ffe4a6..879122a 100644
--- a/charts/kubex-automation-engine/templates/deployment.yaml
+++ b/charts/kubex-automation-engine/templates/deployment.yaml
@@ -10,6 +10,9 @@ kind: Deployment
 metadata:
   name: {{ include "kubex-automation-engine.fullname" . }}
   namespace: {{ include "kubex-automation-engine.namespace" . }}
+  annotations:
+    # This annotation is set by default so that the automation doesn't attempt to automate itself
+    rightsizing.kubex.ai/pause-until: infinite
   labels:
     {{- include "kubex-automation-engine.labels" . | nindent 4 }}
 spec:
diff --git a/charts/kubex-automation-engine/templates/role.yaml b/charts/kubex-automation-engine/templates/role.yaml
index 410776c..00c86d7 100644
--- a/charts/kubex-automation-engine/templates/role.yaml
+++ b/charts/kubex-automation-engine/templates/role.yaml
@@ -130,6 +130,7 @@ rules:
   - globalconfigurations
   - gpuconsolidationpolicies
   - gpurebalancingpolicies
+  - podaffinitypolicies
   - policyevaluations
   - proactivepolicies
   - staticpolicies
@@ -150,6 +151,7 @@ rules:
   - globalconfigurations/finalizers
   - gpuconsolidationpolicies/finalizers
   - gpurebalancingpolicies/finalizers
+  - podaffinitypolicies/finalizers
   - policyevaluations/finalizers
   - proactivepolicies/finalizers
   - staticpolicies/finalizers
@@ -164,6 +166,7 @@ rules:
   - globalconfigurations/status
   - gpuconsolidationpolicies/status
   - gpurebalancingpolicies/status
+  - podaffinitypolicies/status
   - policyevaluations/status
   - proactivepolicies/status
   - staticpolicies/status
diff --git a/charts/kubex-automation-engine/values.yaml b/charts/kubex-automation-engine/values.yaml
index 1045d00..c6f2fa2 100644
--- a/charts/kubex-automation-engine/values.yaml
+++ b/charts/kubex-automation-engine/values.yaml
@@ -4,7 +4,7 @@ crdCheck:
   skip: false
   # Minimum version of the kubex-crds chart required before this chart can be installed/upgraded.
   # Bump this whenever a new CRD field or schema change is required.
-  minCRDsHelmVersion: "1.0.0"
+  minCRDsHelmVersion: "1.0.1"
 
 # -- Whether to create the kubex-gateway-config Secret automatically
 createSecrets: true
diff --git a/charts/kubex-crds/templates/rightsizing.kubex.ai_podaffinitypolicies.yaml b/charts/kubex-crds/templates/rightsizing.kubex.ai_podaffinitypolicies.yaml
new file mode 100644
index 0000000..8a0a363
--- /dev/null
+++ b/charts/kubex-crds/templates/rightsizing.kubex.ai_podaffinitypolicies.yaml
@@ -0,0 +1,247 @@
+---
+apiVersion: apiextensions.k8s.io/v1
+kind: CustomResourceDefinition
+metadata:
+  annotations:
+    controller-gen.kubebuilder.io/version: v0.19.0
+  name: podaffinitypolicies.rightsizing.kubex.ai
+spec:
+  group: rightsizing.kubex.ai
+  names:
+    kind: PodAffinityPolicy
+    listKind: PodAffinityPolicyList
+    plural: podaffinitypolicies
+    singular: podaffinitypolicy
+  scope: Cluster
+  versions:
+  - name: v1alpha1
+    schema:
+      openAPIV3Schema:
+        description: PodAffinityPolicy is the Schema for the podaffinitypolicies API.
+        properties:
+          apiVersion:
+            description: |-
+              APIVersion defines the versioned schema of this representation of an object.
+              Servers should convert recognized schemas to the latest internal value, and
+              may reject unrecognized values.
+              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
+            type: string
+          kind:
+            description: |-
+              Kind is a string value representing the REST resource this object represents.
+              Servers may infer this from the endpoint the client submits requests to.
+              Cannot be updated.
+              In CamelCase.
+              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
+            type: string
+          metadata:
+            type: object
+          spec:
+            description: spec defines the desired state of PodAffinityPolicy
+            properties:
+              affinity:
+                description: affinity describes the preferred node affinity to inject
+                  at pod admission time.
+                properties:
+                  nodes:
+                    description: nodes lists hostname label values to prefer on replacement
+                      pods.
+                    items:
+                      type: string
+                    minItems: 1
+                    type: array
+                required:
+                - nodes
+                type: object
+              scope:
+                description: scope narrows the workloads and namespaces this policy
+                  applies to.
+                properties:
+                  labelSelector:
+                    description: labelSelector limits the workload objects (e.g.,
+                      Deployments, CronJobs) this policy applies to.
+                    properties:
+                      matchExpressions:
+                        description: matchExpressions is a list of label selector
+                          requirements. The requirements are ANDed.
+                        items:
+                          description: |-
+                            A label selector requirement is a selector that contains values, a key, and an operator that
+                            relates the key and values.
+                          properties:
+                            key:
+                              description: key is the label key that the selector
+                                applies to.
+                              type: string
+                            operator:
+                              description: |-
+                                operator represents a key's relationship to a set of values.
+                                Valid operators are In, NotIn, Exists and DoesNotExist.
+                              type: string
+                            values:
+                              description: |-
+                                values is an array of string values. If the operator is In or NotIn,
+                                the values array must be non-empty. If the operator is Exists or DoesNotExist,
+                                the values array must be empty. This array is replaced during a strategic
+                                merge patch.
+                              items:
+                                type: string
+                              type: array
+                              x-kubernetes-list-type: atomic
+                          required:
+                          - key
+                          - operator
+                          type: object
+                        type: array
+                        x-kubernetes-list-type: atomic
+                      matchLabels:
+                        additionalProperties:
+                          type: string
+                        description: |-
+                          matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
+                          map is equivalent to an element of matchExpressions, whose key field is "key", the
+                          operator is "In", and the values array contains only "value". The requirements are ANDed.
+                        type: object
+                    type: object
+                    x-kubernetes-map-type: atomic
+                  namespaceSelector:
+                    description: namespaceSelector restricts the namespaces this policy
+                      applies to.
+                    properties:
+                      operator:
+                        description: operator determines how the listed values are
+                          evaluated.
+                        enum:
+                        - In
+                        - NotIn
+                        type: string
+                      values:
+                        description: values contains the namespace name patterns to
+                          match.
+                        items:
+                          type: string
+                        minItems: 1
+                        type: array
+                    required:
+                    - operator
+                    - values
+                    type: object
+                  workloadTypes:
+                    default:
+                    - Deployment
+                    - StatefulSet
+                    - CronJob
+                    - Rollout
+                    - Job
+                    - AnalysisRun
+                    - DaemonSet
+                    description: workloadTypes limits the workload kinds this policy
+                      applies to. When omitted, all supported workload types are targeted.
+                    items:
+                      description: WorkloadType enumerates the workload kinds a policy
+                        can target.
+                      enum:
+                      - Deployment
+                      - StatefulSet
+                      - DaemonSet
+                      - CronJob
+                      - Rollout
+                      - Job
+                      - AnalysisRun
+                      type: string
+                    type: array
+                required:
+                - namespaceSelector
+                type: object
+              weight:
+                default: 0
+                description: |-
+                  weight determines which policy wins when multiple PodAffinityPolicy policies match.
+                  Higher weights take precedence. When weights are equal, older policies win.
+                format: int32
+                minimum: 0
+                type: integer
+            required:
+            - affinity
+            - scope
+            type: object
+          status:
+            description: status defines the observed state of PodAffinityPolicy
+            properties:
+              conditions:
+                description: |-
+                  conditions represent the current state of the PodAffinityPolicy resource.
+                  Each condition has a unique type and reflects the status of a specific aspect of the resource.
+
+                  Standard condition types include:
+                  - "Available": the resource is fully functional
+                  - "Progressing": the resource is being created or updated
+                  - "Degraded": the resource failed to reach or maintain its desired state
+
+                  The status of each condition is one of True, False, or Unknown.
+                items:
+                  description: Condition contains details for one aspect of the current
+                    state of this API Resource.
+                  properties:
+                    lastTransitionTime:
+                      description: |-
+                        lastTransitionTime is the last time the condition transitioned from one status to another.
+                        This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
+                      format: date-time
+                      type: string
+                    message:
+                      description: |-
+                        message is a human readable message indicating details about the transition.
+                        This may be an empty string.
+                      maxLength: 32768
+                      type: string
+                    observedGeneration:
+                      description: |-
+                        observedGeneration represents the .metadata.generation that the condition was set based upon.
+                        For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
+                        with respect to the current state of the instance.
+                      format: int64
+                      minimum: 0
+                      type: integer
+                    reason:
+                      description: |-
+                        reason contains a programmatic identifier indicating the reason for the condition's last transition.
+                        Producers of specific condition types may define expected values and meanings for this field,
+                        and whether the values are considered a guaranteed API.
+                        The value should be a CamelCase string.
+                        This field may not be empty.
+                      maxLength: 1024
+                      minLength: 1
+                      pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
+                      type: string
+                    status:
+                      description: status of the condition, one of True, False, Unknown.
+                      enum:
+                      - "True"
+                      - "False"
+                      - Unknown
+                      type: string
+                    type:
+                      description: type of condition in CamelCase or in foo.example.com/CamelCase.
+                      maxLength: 316
+                      pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
+                      type: string
+                  required:
+                  - lastTransitionTime
+                  - message
+                  - reason
+                  - status
+                  - type
+                  type: object
+                type: array
+                x-kubernetes-list-map-keys:
+                - type
+                x-kubernetes-list-type: map
+            type: object
+        required:
+        - spec
+        type: object
+    served: true
+    storage: true
+    subresources:
+      status: {}
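
For reviewers: the new `PodAffinityPolicy` CRD ships without an example manifest in this change, so the following is a minimal sketch of a resource that should satisfy the schema above. The policy name, label, namespace, and node hostnames are hypothetical placeholders, and the behavior the controller attaches to these fields is not shown in this diff.

```yaml
# Hypothetical example built only from the fields declared in the CRD schema.
apiVersion: rightsizing.kubex.ai/v1alpha1
kind: PodAffinityPolicy
metadata:
  name: prefer-example-nodes        # cluster-scoped, so no namespace is set
spec:
  weight: 10                        # optional; defaults to 0, higher weight wins
  scope:
    namespaceSelector:              # required by the schema
      operator: In                  # In or NotIn
      values: ["staging"]           # at least one entry is required
    workloadTypes: [Deployment, StatefulSet]   # optional; defaults to all supported kinds
    labelSelector:                  # optional standard Kubernetes label selector
      matchLabels:
        app.kubernetes.io/part-of: example-app
  affinity:
    nodes:                          # required; hostname label values to prefer
      - node-a.example.internal
      - node-b.example.internal
```

Because `scope.namespaceSelector` and `affinity.nodes` are both required and carry `minItems: 1`, a manifest that omits either or passes an empty list should be rejected at admission by the API server's schema validation.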