[CASCL-1304] kubectl-datadog: enrich dd-cluster-info ConfigMap#2980
Draft
[CASCL-1304] kubectl-datadog: enrich dd-cluster-info ConfigMap#2980
kubectl-datadog: enrich dd-cluster-info ConfigMap#2980Conversation
The dd-cluster-info ConfigMap (introduced by #2945) now records: - the running Karpenter installation (version, namespace, ownership) under a new `autoscaling` parent that also groups the existing clusterAutoscaler entry and a new eksAutoMode entry, - a `managedByDatadog` flag per node-management entity (Fargate profile, Karpenter NodePool), so a future migration tool can distinguish Datadog-managed entities to keep from legacy ones to drain. Detection helpers `FindKarpenterInstallation` and `IsEKSAutoModeEnabled` move from `install/guess/` to new `common/karpenter/` and `common/eksautomode/` packages so the clusterinfo classifier can reuse them. A generic `commonk8s.FindFirstDeployment` factors out the shared pager+predicate scan, and `commonk8s.ExtractDeploymentVersion` factors out the controller-image-tag → label fallback used by both detectors. Karpenter NodePool ownership uses the broader `autoscaling.datadoghq.com/created` label only (vs. uninstall's AND-pair with `app.kubernetes.io/managed-by: kubectl-datadog`) so NodePools managed by the Datadog cluster agent are also preserved by the migration tool. Datadog-managed NodePools with no nodes yet (typical right after install) are seeded into the snapshot with an empty Nodes list so the migration tool sees the destination NodePools exist. Fargate profile ownership reads tags via EKS DescribeFargateProfile; the `managed-by: kubectl-datadog` tag is propagated automatically from the CloudFormation stack tags, so no infrastructure change is needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2980 +/- ##
==========================================
+ Coverage 41.39% 41.42% +0.02%
==========================================
Files 331 332 +1
Lines 28911 28984 +73
==========================================
+ Hits 11969 12007 +38
- Misses 16086 16118 +32
- Partials 856 859 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
This comment has been minimized.
This comment has been minimized.
kubectl-datadog: enrich dd-cluster-info ConfigMap
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Enriches the
dd-cluster-infoConfigMap (introduced by #2945) so a future migration tool can:managedByDatadogflag.autoscalingparent that also groups the existingclusterAutoscalerentry and a neweksAutoModeentry.As a side-effect, two detection helpers (
FindKarpenterInstallation,IsEKSAutoModeEnabled) move frominstall/guess/to newcommon/karpenter/andcommon/eksautomode/packages so theclusterinfoclassifier can reuse them. A genericcommonk8s.FindFirstDeploymentfactors out the shared pager+predicate scan, andcommonk8s.ExtractDeploymentVersionfactors out the controller-image-tag → label fallback used by both the Karpenter and Cluster Autoscaler detectors.Motivation
Follow-up to #2945 (CASCL-1304). The original snapshot only captured nodes grouped by their owning manager; the future migration tool also needs to tell Datadog-managed managers (to keep) from legacy ones (to drain), to know whether Karpenter is already running, and to know whether EKS auto-mode is active so it can short-circuit when there is no migration to drive.
Additional Notes
dd-cluster-infoexists yet (grep -r "dd-cluster-info\|ConfigMapDataKey" --include='*.go'returns only the writer).APIVersionstays atv1.autoscaling.datadoghq.com/createdlabel alone — notuninstall.go's AND-pair withapp.kubernetes.io/managed-by: kubectl-datadog. The cluster agent creates NodePools with only thecreatedlabel, and the migration tool must preserve them too. This divergence is deliberate.EKS.DescribeFargateProfile. The expectedmanaged-by: kubectl-datadogtag is propagated automatically from the CloudFormation stack tags written bycommon/aws/cloudformation.go, so no infrastructure change is needed.nodeslist so the migration tool sees the destination NodePools exist.DescribeFargateProfile, NodePool list, Discovery for auto-mode) is tolerated — transient errors / missing CRDs log a warning and leave entries unflagged rather than failing the snapshot. The call site (recordClusterInfo) was already best-effort before this PR.Minimum Agent Versions
Describe your test plan
Automated coverage added in this PR:
TestClassify_KarpenterNodePoolOwnership: kubectl-datadog (both labels) + cluster agent (createdlabel only) + Datadog NodePool with no nodes yet + foreign NodePool.TestClassify_KarpenterNodePoolOwnership_NoCRD: tolerant ofmeta.IsNoMatchErrorwhen the Karpenter CRD is not installed.TestClassify_FargateProfileOwnership/TestClassify_FargateProfileOwnership_DescribeError: tag-based detection + AWS API error fallback.TestClassify_KarpenterDetection: version extraction from controller image tag,ManagedByDatadog/InstallerVersionfrom sentinel labels.TestClassify_EKSAutoMode: discovery API exposesnodeclasses→Enabled: true.TestPersist_YAMLShape: pins lowerCamelCase wire keys against thegopkg.in/yaml.v3lower-case-by-default footgun.TestFindFirstDeployment_*: covers the new generic helper.Manual test plan on a sandbox EKS cluster:
kubectl datadog autoscaling cluster installsucceeds.kubectl get cm -n dd-karpenter dd-cluster-info -o yamlcontains:autoscaling.karpenter.{present, version, managedByDatadog, installerVersion}autoscaling.eksAutoMode.enablednodeManagement.fargate."dd-karpenter-<cluster>".managedByDatadog: truenodeManagement.karpenterwithmanagedByDatadog: true, even with no node landed on it yet.managedByDatadog: true.Checklist
enhancement,refactoringqa/skip-qalabel