Experimental multicluster operator prototype.
Helm chart rollout is now centrally configured by process flags instead of per-Tenant spec fields. All Tenants share the same chart (repository, name, version) and the operator installs/upgrades that chart into each Tenant namespace on every engaged cluster.
The multicluster entrypoint (RunMulticlusterExample) supports these flags:
--chart-repo Central Helm chart repository URL (default: https://charts.jetstack.io)
--chart-name Central Helm chart name (default: cert-manager)
--chart-version Central Helm chart version (default: 1.19.1)
--namespace Namespace containing kubeconfig secrets (default: default)
--kubeconfig-label Label selecting kubeconfig secrets (default: sigs.k8s.io/multicluster-runtime-kubeconfig)
--kubeconfig-key Data key for kubeconfig content in secret (default: kubeconfig)
--watch-kubeconfig Path to kubeconfig for the watch/home cluster (optional)
--watch-context Kubeconfig context name for the watch cluster (optional)
--ensure-watch-crds Ensure required CRDs exist on the watch cluster at startup (default: false)
--watch-kubeconfig-secret Name of Secret containing kubeconfig for the watch cluster (preferred)
--watch-kubeconfig-secret-namespace Namespace of the Secret (defaults to --namespace if empty)
--watch-kubeconfig-secret-key Data key in the Secret containing kubeconfig bytes (default: kubeconfig)
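A stdlib-only sketch of how these flags could be declared is shown below. The flag names and defaults are copied from the list above; `parseChartFlags` and the `chartFlags` struct are hypothetical helpers, not the actual `RunMulticlusterExample` code.

```go
package main

import (
	"flag"
	"fmt"
)

// chartFlags is a hypothetical holder for the central chart configuration.
type chartFlags struct {
	Repo, Name, Version string
	Namespace           string
}

// parseChartFlags declares a subset of the documented flags with their
// documented defaults and parses the given arguments.
func parseChartFlags(args []string) (chartFlags, error) {
	fs := flag.NewFlagSet("multicluster", flag.ContinueOnError)
	cf := chartFlags{}
	fs.StringVar(&cf.Repo, "chart-repo", "https://charts.jetstack.io", "Central Helm chart repository URL")
	fs.StringVar(&cf.Name, "chart-name", "cert-manager", "Central Helm chart name")
	fs.StringVar(&cf.Version, "chart-version", "1.19.1", "Central Helm chart version")
	fs.StringVar(&cf.Namespace, "namespace", "default", "Namespace containing kubeconfig secrets")
	if err := fs.Parse(args); err != nil {
		return chartFlags{}, err
	}
	return cf, nil
}

func main() {
	cf, _ := parseChartFlags([]string{"--chart-version=1.19.1"})
	fmt.Println(cf.Repo, cf.Name, cf.Version, cf.Namespace)
}
```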
Example run (local):
go run ./cmd/multicluster \
--chart-repo=https://charts.jetstack.io \
--chart-name=cert-manager \
--chart-version=1.19.1 \
--namespace=platform-system
Krypton Operator
To roll out a different chart or version across all Tenants, restart (or upgrade) the operator with new flag values. The controller computes a fingerprint from repo|name|version and only performs a Helm upgrade if the fingerprint changed.
Per-Tenant Helm values are currently disabled; values are an empty map. To introduce centralized values, add a new flag (e.g. --chart-values-file) and load + merge it before install/upgrade. Per-Tenant overrides would require a design update (e.g. referencing a ConfigMap or reintroducing a controlled subset of spec fields).
Older versions used spec.chart within the Tenant CRD. That field has been removed. Existing Tenant objects with a chart key in their spec must be deleted or re-applied without the field after installing the updated CRD:
This project is open to feature requests, suggestions, bug reports, etc. via GitHub issues. Contribution and feedback are encouraged and always welcome. For more information about how to contribute, the project structure, and additional contribution information, see our Contribution Guidelines.
If you find a bug that may be a security problem, please follow the instructions in our security policy on how to report it. Please do not create GitHub issues for security-related questions or problems.
kubectl delete tenant -A --all # if safe; or selectively recreate
kubectl apply -f config/crd/bases/mesh.openkcm.io_tenants.yaml
Then create new Tenants with only spec.clusterRef (optional). Chart selection is now purely an operator deployment concern.
Set ALLOW_CHART_SKIP=true in the operator environment to treat non-fatal chart load errors (e.g. temporary repo outage) as skippable, allowing the Tenant phase to progress to Ready if other conditions are satisfied.
Additional env vars for watch cluster selection:
- WATCH_KUBECONFIG (or WATCH_CLUSTER_KUBECONFIG): file path to the kubeconfig used for the watch/home cluster.
- WATCH_CONTEXT: optional context; defaults to the kubeconfig current-context.
- ENSURE_WATCH_CRDS: set to true to auto-ensure the KryptonDeployment CRD on the watch cluster.
- WATCH_KUBECONFIG_SECRET: Secret name containing the kubeconfig for the watch cluster.
- WATCH_KUBECONFIG_SECRET_NAMESPACE: Secret namespace (defaults to the discovery --namespace).
- WATCH_KUBECONFIG_SECRET_KEY: data key in the Secret (default kubeconfig).
Example (Helm chart): set values to reference a Secret the operator can read:
watchCluster:
  secretName: "home-kubeconfig"
  secretNamespace: "krypton-operator"
  secretKey: "kubeconfig"
  ensureCrds: true
Lifecycle Events emitted per Tenant per cluster:
- HelmInstallStart / HelmInstalled
- HelmUpgradeStart / HelmUpgraded
- HelmInstallFailed / HelmUpgradeFailed
- HelmSkip (fingerprint unchanged)
- ChartNotLoaded / ChartVersionNotFound / ChartVersionInvalid
- PhaseSet (aggregated phase transitions)
Status Conditions use cluster-scoped types (e.g. ClusterReady/<cluster> / ClusterError/<cluster> / ClusterProgress/<cluster>).
An annotation mesh.openkcm.io/fingerprint-<cluster> is set on the Tenant after successful install/upgrade. Update logic currently performs a direct object Update; future improvement will switch to a PATCH with retries and emit FingerprintUpdated / FingerprintUpdateFailed events.
- No install occurring? Verify the chart flags and that the repo is reachable from the operator pod.
- Continuous ChartNotLoaded warnings? Check network egress, repo URL correctness, or temporary repository outage.
- VersionNotFound / VersionInvalid events? Confirm the semantic version exists in the repository and is valid (SemVer compliant).
- Missing events entirely? Ensure RBAC includes create permissions for events (the bundled Role does).
See internal/multicluster/example.go for the reconciliation logic implementing these behaviors.
Multi-cluster Kubernetes operator that deploys a Helm chart for each Tenant custom resource present on a cluster. The operator ensures the tenant namespace exists and performs Helm install / upgrade with idempotent fingerprint skipping. Tenants are now stored directly on the cluster they target (no shadow propagation model).
- Kubeconfig Provider: discovers clusters via labeled Secrets (sigs.k8s.io/multicluster-runtime-kubeconfig). A self-cluster secret is auto-synthesized on startup (strategy C: validate, embed certs/keys, fall back to the full file).
- Multicluster Manager: one controller-runtime manager orchestrates dynamic cluster engagement; a single controller reconciles Tenant objects per cluster.
- Reconcile Steps (per cluster-local Tenant):
  - Ensure the Tenant CRD is present on that cluster (embedded manifest applied on demand).
  - Fetch the local Tenant object.
  - Optional cluster targeting: if spec.clusterRef.secretName is set and does not match the cluster name, reconciliation is skipped.
  - Ensure the tenant namespace exists (the tenant name becomes the namespace).
  - Resolve the Helm chart (repo URL + name + version) and values map.
  - Compute a fingerprint (sha256 of repo|name|version|sorted key=value pairs) per cluster; skip the Helm action if unchanged.
  - Install or upgrade the release (tenant-<name>-<cluster>) and update the annotation mesh.openkcm.io/fingerprint-<cluster>.
  - Upsert per-cluster status conditions and aggregate the top-level status.phase.
Defined in api/v1alpha1/tenant_types.go (embedded YAML in internal/multicluster/mesh.openkcm.io_tenants.yaml).
Spec example:
spec:
  clusterRef:
    secretName: remote-cluster-kubeconfig
Status:
status:
  phase: Ready|Pending|Error
  conditions:
    - type: ClusterReady/self-cluster
      status: "True"
      reason: Installed|Upgraded|NoChange
    - type: ClusterError/<cluster>
      status: "True"
      reason: InstallFailed|UpgradeFailed|ChartNotLoaded
Fingerprint stored per cluster in annotation: mesh.openkcm.io/fingerprint-<cluster>.
make tidy
make build
make run   # uses current KUBECONFIG as home cluster
Create a kubeconfig Secret with the discovery label:
kubectl -n default create secret generic demo-remote-1 \
  --from-file=kubeconfig=/path/to/remote.kubeconfig
kubectl -n default label secret demo-remote-1 \
  sigs.k8s.io/multicluster-runtime-kubeconfig=true
kubectl apply -f examples/tenant-acme.yaml
kubectl get tenant acme -o yaml
The release name tenant-<tenantName>-<clusterName> ensures uniqueness across clusters.
Avoids unnecessary Helm upgrades. Changing any chart field or values key/value mutates the sha256 hash, triggering an upgrade.
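The skip logic can be sketched as follows. `chartFingerprint` is a hypothetical name, but the hash input mirrors the documented scheme: repo|name|version plus sorted key=value pairs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// chartFingerprint hashes the chart coordinates plus the values map
// flattened into sorted key=value pairs, so any change to a chart field
// or values entry changes the hash and triggers an upgrade.
func chartFingerprint(repo, name, version string, values map[string]string) string {
	parts := []string{repo, name, version}
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order regardless of map iteration
	for _, k := range keys {
		parts = append(parts, k+"="+values[k])
	}
	sum := sha256.Sum256([]byte(strings.Join(parts, "|")))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(chartFingerprint("https://charts.jetstack.io", "cert-manager", "1.19.1", nil))
}
```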
- Load user kubeconfig file ($KUBECONFIG first path or $HOME/.kube/config).
- Inline referenced cert/key/CA files into data fields.
- Reduce to current context & validate by constructing rest.Config and probing /version.
- Fallback to full file if validation fails; otherwise synthesize minimal config when no file present.
- Error if any ClusterError/* condition is true.
- Ready if at least one ClusterReady/* condition is true and there are no errors.
- Pending otherwise.
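These aggregation rules can be sketched as a small pure function (the `Condition` struct and `aggregatePhase` name are simplified stand-ins for the real status types):

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a simplified stand-in for a Tenant status condition.
type Condition struct {
	Type   string // e.g. "ClusterReady/remote", "ClusterError/remote"
	Status string // "True" or "False"
}

// aggregatePhase applies the documented rules: Error if any ClusterError/*
// condition is true; Ready if at least one ClusterReady/* is true and no
// errors exist; Pending otherwise.
func aggregatePhase(conds []Condition) string {
	anyReady := false
	for _, c := range conds {
		if c.Status != "True" {
			continue
		}
		switch {
		case strings.HasPrefix(c.Type, "ClusterError/"):
			return "Error" // any error wins
		case strings.HasPrefix(c.Type, "ClusterReady/"):
			anyReady = true
		}
	}
	if anyReady {
		return "Ready"
	}
	return "Pending"
}

func main() {
	fmt.Println(aggregatePhase([]Condition{{Type: "ClusterReady/self-cluster", Status: "True"}}))
}
```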
| Symptom | Cause | Resolution |
|---|---|---|
| chart not loaded; skipping helm action | Repo unreachable or invalid chart/version | Check repo URL & version; network access. |
| helm install failed (auth) | Remote cluster credentials invalid | Recreate kubeconfig Secret with valid token or cert/key. |
| CRD ensure failed repeatedly | Insufficient RBAC on remote cluster | Grant create/update on CRDs. |
| Fingerprint not updating | Annotation update failed | Ensure Tenant update RBAC; inspect controller logs. |
A reproducible two-cluster test harness lives in hack/e2e-kind.sh and is wired to make e2e-kind.
Flow:
- Creates two kind clusters: home and remote.
- Installs the Tenant CRD on both clusters.
- Creates a kubeconfig Secret for the remote cluster in the home cluster (for discovery by the multicluster manager).
- Starts the operator against the home cluster (the manager discovers both clusters).
- Applies a Tenant directly on the remote cluster (no shadow object creation required).
- The operator ensures the workspace namespace on the remote cluster and performs the Helm install.
- Tenant status on the remote cluster transitions to Ready.
- The script validates readiness and tears everything down.
Run it:
make e2e-kind
Expected log snippets:
workspace namespace created
helm install success
tenant phase is Ready on remote cluster
e2e passed
The e2e harness starts the operator as a background process and now enforces a clean shutdown:
- Operator PID recorded in /tmp/krypton-operator.pid.
- On normal completion or trap (INT/TERM/EXIT) the script sends SIGTERM, waits up to ~5s, then SIGKILL if still running.
- A make stop target is available for manual cleanup if you start with make run-pid.
Manual usage:
make run-pid # starts operator, writes PID file
make stop      # terminates operator (TERM + optional KILL)
This prevents orphaned processes that would otherwise keep ports bound and interfere with subsequent test runs.
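The PID-file shutdown described above can be sketched in bash as follows. This is an illustrative reconstruction, not the actual contents of hack/e2e-kind.sh; `sleep 300` stands in for the operator binary.

```shell
#!/usr/bin/env bash
# Sketch of the PID-file shutdown: record the background PID, and on exit
# send TERM, wait briefly, then KILL if the process is still alive.
PID_FILE="${PID_FILE:-/tmp/krypton-operator.pid}"

start_operator() {
  sleep 300 &            # stand-in for the real operator process
  echo $! > "$PID_FILE"
}

stop_operator() {
  [ -f "$PID_FILE" ] || return 0
  local pid
  pid="$(cat "$PID_FILE")"
  if kill -TERM "$pid" 2>/dev/null; then
    local i
    for i in 1 2 3 4 5; do            # wait up to ~5s for graceful exit
      kill -0 "$pid" 2>/dev/null || break
      sleep 1
    done
    kill -KILL "$pid" 2>/dev/null || true
  fi
  wait "$pid" 2>/dev/null || true     # reap so no zombie lingers
  rm -f "$PID_FILE"
}

trap stop_operator INT TERM EXIT
```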
Troubleshooting:
| Symptom | Cause | Resolution |
|---|---|---|
| port bind error during kind remote creation | Docker reused API server port quickly | Re-run; script includes a small sleep to mitigate race |
| engage manager error (kind not registered) | Missing scheme in cluster options | Ensure ClusterOptions: []cluster.Option{func(o *cluster.Options){o.Scheme = scheme}} present |
| shadow tenant invalid | Shadow spec omitted required fields | Regenerate shadow with full spec (already implemented) |
- Finalizer for uninstall.
- (DONE) Removed shadow propagation: Tenants now reside locally on their target cluster.
- Tests (unit & extended e2e).
- Metrics (ready/error counts per cluster).
- Controller-gen integration to auto-generate CRDs.
Apache 2.0 (placeholder)
Multi-cluster operator that installs a Helm chart (e.g., cert-manager) into a tenant-specific namespace on a remote customer cluster when a Tenant custom resource is created in the home cluster.
MVP scaffold. Multi-cluster runtime integration and full Helm implementation marked as TODO.
Deprecated in current architecture; Tenant types removed from codebase.
make tidy
make build
make run   # runs against your current KUBECONFIG/home cluster
Apply example manifests:
kubectl apply -f examples/remote-kubeconfig-secret.yaml
kubectl apply -f examples/tenant-acme.yaml
The Secret must contain a key kubeconfig with the remote cluster kubeconfig content. See the helper script hack/dev-remote-secret.sh.
Release name format (proposed): ced-<deploymentName>.
- Implement Helm install/upgrade logic inside reconcile.
- Integrate multicluster-runtime once import path confirmed.
- Add finalizer for uninstall.
- Status conditions reflecting remote deployment state.
- Controller-gen + CRD generation.
- Proper unit/e2e tests.
Module path: github.com/openkcm/krypton-operator.
# (placeholder instructions)
kind create cluster --name home
kind create cluster --name remote
# Export remote kubeconfig to file
kind get kubeconfig --name remote > /tmp/remote.kubeconfig
./hack/dev-remote-secret.sh /tmp/remote.kubeconfig krypton-operator acme-remote-kubeconfig
kubectl apply -f examples/tenant-acme.yaml
Copyright (20xx-)20xx SAP SE or an SAP affiliate company and OpenKCM contributors. Please see our LICENSE for copyright and license information. Detailed information including third-party components and their licensing/copyright information is available via the REUSE tool.
To facilitate a nice environment for all, check out our Code of Conduct.
