7 changes: 5 additions & 2 deletions api/v1alpha1/cluster_types.go
@@ -97,8 +97,12 @@ type ClusterTLSConfig struct {

// ClusterStatus defines the observed state of Cluster
type ClusterStatus struct {
// The conditions of the cluster
Conditions []metav1.Condition `json:"conditions,omitempty"`
// The number of ready replicas
ReadyReplicas int32 `json:"readyReplicas"`
// The selector for the cluster statefulset
Selector string `json:"selector"`
// The number of pipelines referencing this cluster
PipelinesCount int32 `json:"pipelinesCount"`
// The number of targets referenced by the pipelines
@@ -109,13 +113,12 @@ type ClusterStatus struct {
InputsCount int32 `json:"inputsCount"`
// The number of outputs referenced by the pipelines
OutputsCount int32 `json:"outputsCount"`
// The conditions of the cluster
Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:storageversion
// +kubebuilder:subresource:status
// +kubebuilder:subresource:scale:specpath=.spec.replicas,statuspath=.status.readyReplicas,selectorpath=.status.selector
// +kubebuilder:printcolumn:name="Image",type=string,JSONPath=`.spec.image`
// +kubebuilder:printcolumn:name="Replicas",type=integer,JSONPath=`.spec.replicas`
// +kubebuilder:printcolumn:name="Ready",type=integer,JSONPath=`.status.readyReplicas`
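With the `scale` subresource in place, standard tooling can drive replicas through the Cluster resource itself rather than the underlying StatefulSet — for example (assuming a Cluster named `c1`; a sketch, not part of this PR):

```bash
# Use the fully qualified resource name to avoid clashing with
# other CRDs that are also named "cluster".
kubectl scale clusters.operator.gnmic.dev c1 --replicas=5
```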
8 changes: 8 additions & 0 deletions config/crd/bases/operator.gnmic.dev_clusters.yaml
@@ -475,6 +475,9 @@ spec:
description: The number of ready replicas
format: int32
type: integer
selector:
description: The selector for the cluster statefulset
type: string
subscriptionsCount:
description: The number of subscriptions referenced by the pipelines
format: int32
@@ -488,11 +491,16 @@
- outputsCount
- pipelinesCount
- readyReplicas
- selector
- subscriptionsCount
- targetsCount
type: object
type: object
served: true
storage: true
subresources:
scale:
labelSelectorPath: .status.selector
specReplicasPath: .spec.replicas
statusReplicasPath: .status.readyReplicas
status: {}
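The same subresource can also be read directly from the API server, which is useful for verifying the three paths declared above (assuming a Cluster `c1` in the `default` namespace):

```bash
# Returns a Scale object built from .spec.replicas,
# .status.readyReplicas, and .status.selector.
kubectl get --raw /apis/operator.gnmic.dev/v1alpha1/namespaces/default/clusters/c1/scale
```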
108 changes: 87 additions & 21 deletions docs/content/docs/advanced/scaling.md
@@ -1,7 +1,7 @@
---
title: "Scaling"
linkTitle: "Scaling"
weight: 1
weight: 2
description: >
Scaling gNMIc clusters horizontally
---
@@ -75,7 +75,7 @@ Estimate based on:

### Use Resource Limits

Ensure pods have appropriate resources:
Ensure clusters (pods) have appropriate resources:

```yaml
spec:
@@ -103,34 +103,23 @@ container_memory_usage_bytes{pod=~"gnmic-.*"}
gnmic_target_status{cluster="my-cluster"}
```

### Scale Gradually
## Horizontal Pod Autoscaler

For large changes, scale gradually:
The operator's Cluster resource supports the `scale` subresource, allowing you to enable automatic scaling using the Horizontal Pod Autoscaler (HPA).

```bash
# Instead of 3 → 10
kubectl patch cluster my-cluster -p '{"spec":{"replicas":5}}'
# Wait for stabilization
kubectl patch cluster my-cluster -p '{"spec":{"replicas":7}}'
# Wait for stabilization
kubectl patch cluster my-cluster -p '{"spec":{"replicas":10}}'
```

## Horizontal Pod Autoscaler (Comming Soon)

You can use HPA for automatic scaling:
To set up autoscaling, create an HPA resource that targets the Cluster resource. Specify the desired minimum and maximum number of replicas, as well as the metrics that will determine when scaling occurs:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: gnmic-cluster-hpa
name: gnmic-c1-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: StatefulSet
name: gnmic-my-cluster
minReplicas: 2
apiVersion: operator.gnmic.dev/v1alpha1
kind: Cluster
name: c1
minReplicas: 1
maxReplicas: 10
metrics:
- type: Resource
@@ -141,6 +130,83 @@
averageUtilization: 70
```

> **Note:** You must install the Kubernetes metrics server to enable HPA based on CPU or Memory:
>
> ```shell
> kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
> ```

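With the metrics server installed, you can confirm that resource metrics are flowing before relying on the HPA (the HPA name follows the example above):

```shell
kubectl top pods
kubectl get hpa gnmic-c1-hpa
```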
### Autoscaling based on custom metrics

gNMIc pods provide various Prometheus metrics that can be leveraged by an HPA resource for autoscaling.

One common use case is to scale based on the number of targets assigned to each Pod.
The gNMIc pods export metrics like:

```
gnmic_target_up{name="default/leaf1"} 0
gnmic_target_up{name="default/leaf2"} 0
gnmic_target_up{name="default/spine1"} 1
```

Here, a value of `1` indicates that the target is present, while `0` denotes it is absent.

With [Prometheus Adapter](https://github.com/kubernetes-sigs/prometheus-adapter), this metric can be exposed to the HPA as a per-pod custom metric, e.g. `gnmic_targets_present{namespace="default", pod="gnmic-c1-0"}` = 1.
You can use the following PromQL to aggregate these into a “targets per pod” metric: `sum(gnmic_target_up == 1) by (namespace, pod)`.

> You can assign `namespace` and `pod` labels to metrics using scrape configurations or relabeling.

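For reference, a minimal Prometheus scrape job that attaches these labels via relabeling might look like this (job name and discovery role are assumptions; adapt to your deployment):

```yaml
scrape_configs:
  - job_name: gnmic
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Copy the pod's namespace and name into the `namespace`
      # and `pod` labels expected by the adapter rule below.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

A Prometheus Adapter rule that exposes the aggregated series as `gnmic_targets_present` could then look like this: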
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-adapter-rules
namespace: monitoring
data:
config.yaml: |
rules:
default: false
custom:
- seriesQuery: 'gnmic_target_up{namespace!="",pod!=""}'
resources:
overrides:
namespace:
resource: namespace
pod:
resource: pod
name:
matches: "^gnmic_target_up$"
as: "gnmic_targets_present"
metricsQuery: |
sum(gnmic_target_up{<<.LabelMatchers>>} == 1) by (namespace, pod)
```

The corresponding HPA resource would look like this. In other words: scale **Cluster** `c1` to a maximum of `10` replicas if the average number of targets present per pod rises above `30`:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: gnmic-c1-hpa
spec:
scaleTargetRef:
apiVersion: operator.gnmic.dev/v1alpha1
kind: Cluster
name: c1
minReplicas: 1
maxReplicas: 10
metrics:
- type: Pods
pods:
metric:
name: gnmic_targets_present
target:
type: AverageValue
averageValue: "30"
```
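Once both resources are applied, you can watch the autoscaler's decisions (resource names follow the example above):

```bash
kubectl get hpa gnmic-c1-hpa -w
kubectl describe hpa gnmic-c1-hpa
```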

## Considerations

### Output Connections
61 changes: 6 additions & 55 deletions docs/content/docs/advanced/target-distribution.md
@@ -1,12 +1,15 @@
---
title: "Target Distribution"
linkTitle: "Target Distribution"
weight: 2
weight: 1
description: >
How targets are distributed across pods
---

The gNMIc Operator uses a sophisticated algorithm to distribute targets across pods. This page explains the algorithm and its properties.
The gNMIc Operator uses a simple algorithm to distribute targets across pods.
More placement/distribution strategies will be implemented in the future.

This page explains the algorithm and its properties.

## Algorithm: Bounded Load Rendezvous Hashing

@@ -40,13 +43,6 @@ For each target:
2. Sort pods by score (highest first)
3. Assign to highest-scoring pod that has capacity

```
Target: "router1"
Scores: pod0=892341, pod1=234567, pod2=567890
Order: pod0, pod2, pod1
pod0 has capacity → assign to pod0
```

### Step 4: Track Load

After each assignment, increment the pod's load count. When a pod reaches capacity, it's skipped for future assignments.
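Taken together, the four steps amount to only a few dozen lines. The following is a minimal, self-contained Go sketch of the assignment loop (names like `assign` and `score` are illustrative, not the operator's actual API; the real logic lives in `internal/gnmic/distribute.go`):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"sort"
)

// score computes a rendezvous-hash score for a (target, pod) pair.
// Illustrative only: the operator may use a different hash function.
func score(target string, pod int) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%s/%d", target, pod)
	return h.Sum64()
}

// assign distributes targets over numPods with a per-pod capacity
// of ceil(n/p), following the four steps described above.
func assign(targets []string, numPods int) map[string]int {
	capacity := int(math.Ceil(float64(len(targets)) / float64(numPods)))
	load := make([]int, numPods)
	result := make(map[string]int, len(targets))

	// Process targets in a stable order so the result is deterministic.
	sort.Strings(targets)

	for _, t := range targets {
		// Rank all pods by descending score for this target.
		pods := make([]int, numPods)
		for i := range pods {
			pods[i] = i
		}
		sort.Slice(pods, func(a, b int) bool {
			return score(t, pods[a]) > score(t, pods[b])
		})
		// Assign to the highest-scoring pod that still has capacity.
		for _, p := range pods {
			if load[p] < capacity {
				result[t] = p
				load[p]++
				break
			}
		}
	}
	return result
}

func main() {
	fmt.Println(assign([]string{"leaf1", "leaf2", "spine1", "router1"}, 2))
}
```

With capacity set to `ceil(n/p)`, the inner loop always terminates: at least one pod still has spare capacity when each target is placed.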
@@ -79,7 +75,7 @@ With capacity = ceil(n/p), no pod can have more than `capacity` targets:

When scaling:

**Adding a pod**: Only targets that score highest for the new pod AND are on an over-capacity pod will move. Typically ~1/(N+1) targets move.
**Adding a pod**: Only targets that score highest for the new pod will move.

**Removing a pod**: Only targets on the removed pod redistribute. Targets on remaining pods stay put.

@@ -120,48 +116,3 @@ Targets moved: 3 out of 10 (30%)
| Rendezvous hash | High | Variable | Medium |
| **Bounded load rendezvous** | **Good** | **Good** | **Medium** |

## Implementation Details

The distribution logic is in `internal/gnmic/distribute.go`:

```go
func DistributeTargets(plan *ApplyPlan, podIndex, numPods int) *ApplyPlan {
// Get assignments using bounded rendezvous hashing
assignments := boundedRendezvousAssign(plan.Targets, numPods)

// Filter to only targets for this pod
for targetNN, assignedPod := range assignments {
if assignedPod == podIndex {
distributed.Targets[targetNN] = plan.Targets[targetNN]
}
}
return distributed
}
```

## Debugging Distribution

To see how targets are distributed:

```bash
# Check targets per pod
for i in 0 1 2; do
echo "Pod $i:"
kubectl exec gnmic-my-cluster-$i -- curl -s localhost:7890/api/v1/config/targets | jq 'keys'
done
```

Or check the operator logs:

```bash
kubectl logs -n gnmic-operator-system deployment/gnmic-operator-controller-manager | grep "config applied"
```

Output shows target count per pod:

```
config applied to pod pod=0 targets=34
config applied to pod pod=1 targets=33
config applied to pod pod=2 targets=33
```
