[CASCL-1304] kubectl-datadog: enrich dd-cluster-info ConfigMap#2980

Draft
L3n41c wants to merge 1 commit into main from lenaic/CASCL-1304-enrich-dd-cluster-info

Conversation

@L3n41c (Member) commented May 6, 2026

What does this PR do?

Enriches the dd-cluster-info ConfigMap (introduced by #2945) so a future migration tool can:

  • Distinguish Datadog-managed node-management entities (Fargate profiles, Karpenter NodePools) from legacy ones to drain — via a per-entity managedByDatadog flag.
  • See the running Karpenter installation (version, namespace, ownership) under a new autoscaling parent that also groups the existing clusterAutoscaler entry and a new eksAutoMode entry.

As a side effect, two detection helpers (FindKarpenterInstallation, IsEKSAutoModeEnabled) move from install/guess/ to the new common/karpenter/ and common/eksautomode/ packages so the clusterinfo classifier can reuse them. A generic commonk8s.FindFirstDeployment factors out the shared pager+predicate scan, and commonk8s.ExtractDeploymentVersion factors out the controller-image-tag → label fallback used by both the Karpenter and Cluster Autoscaler detectors.
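To make the shape of the two new helpers concrete, here is a minimal, self-contained sketch. It is an illustration only: the real commonk8s.FindFirstDeployment and commonk8s.ExtractDeploymentVersion operate on client-go objects via a pager, and their exact signatures are not shown in this PR, so the Deployment struct and function signatures below are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Deployment is a minimal stand-in for appsv1.Deployment; the real
// helpers work on client-go objects fetched through a list pager.
type Deployment struct {
	Name   string
	Image  string // controller container image
	Labels map[string]string
}

// findFirstDeployment scans lazily fetched pages of deployments and
// returns the first one matching the predicate. nextPage returns nil
// when there are no more pages. Hypothetical simplification of the
// shared pager+predicate scan described above.
func findFirstDeployment(nextPage func() []Deployment, pred func(Deployment) bool) (Deployment, bool) {
	for page := nextPage(); page != nil; page = nextPage() {
		for _, d := range page {
			if pred(d) {
				return d, true
			}
		}
	}
	return Deployment{}, false
}

// extractDeploymentVersion mirrors the controller-image-tag → label
// fallback: prefer the image tag, and fall back to the standard
// app.kubernetes.io/version label when the image has no tag.
func extractDeploymentVersion(d Deployment) string {
	// A colon introduces a tag only if no "/" follows it (otherwise it
	// is a registry port, e.g. localhost:5000/img).
	if i := strings.LastIndex(d.Image, ":"); i >= 0 && !strings.Contains(d.Image[i:], "/") {
		return d.Image[i+1:]
	}
	return d.Labels["app.kubernetes.io/version"]
}

func main() {
	pages := [][]Deployment{
		{{Name: "coredns", Image: "registry.k8s.io/coredns:v1.11.1"}},
		{{Name: "karpenter", Image: "public.ecr.aws/karpenter/controller:1.3.2"}},
	}
	i := 0
	next := func() []Deployment {
		if i >= len(pages) {
			return nil
		}
		p := pages[i]
		i++
		return p
	}
	d, ok := findFirstDeployment(next, func(d Deployment) bool { return d.Name == "karpenter" })
	fmt.Println(ok, extractDeploymentVersion(d)) // true 1.3.2
}
```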

Motivation

Follow-up to #2945 (CASCL-1304). The original snapshot only captured nodes grouped by their owning manager. The future migration tool also needs to distinguish Datadog-managed managers (to keep) from legacy ones (to drain), to know whether Karpenter is already running, and to know whether EKS auto-mode is active so it can short-circuit when there is no migration to drive.

Additional Notes

  • Schema change is breaking, on purpose: no consumer of dd-cluster-info exists yet (grep -r "dd-cluster-info\|ConfigMapDataKey" --include='*.go' returns only the writer). APIVersion stays at v1.
  • NodePool ownership detection uses the broader autoscaling.datadoghq.com/created label alone — not uninstall.go's AND-pair with app.kubernetes.io/managed-by: kubectl-datadog. The cluster agent creates NodePools with only the created label, and the migration tool must preserve them too. This divergence is deliberate.
  • Fargate profile ownership detection reads tags via EKS.DescribeFargateProfile. The expected managed-by: kubectl-datadog tag is propagated automatically from the CloudFormation stack tags written by common/aws/cloudformation.go, so no infrastructure change is needed.
  • Datadog-managed NodePools with no nodes yet (typical right after install) are seeded into the snapshot with an empty nodes list so the migration tool sees the destination NodePools exist.
  • Best-effort detection: every new external API call (EKS DescribeFargateProfile, NodePool list, Discovery for auto-mode) is tolerated — transient errors / missing CRDs log a warning and leave entries unflagged rather than failing the snapshot. The call site (recordClusterInfo) was already best-effort before this PR.
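The deliberately broader ownership rule from the second note can be sketched as follows. This is a hypothetical illustration of the described rule, not the PR's actual code; the label key is taken from the note above, and the function name is invented.

```go
package main

import "fmt"

// createdLabel is the single label the PR uses for NodePool ownership,
// deliberately broader than uninstall.go's AND-pair with
// app.kubernetes.io/managed-by: kubectl-datadog.
const createdLabel = "autoscaling.datadoghq.com/created"

// managedByDatadog reports whether a Karpenter NodePool's labels mark it
// as Datadog-managed. Checking the created label alone means NodePools
// created by the cluster agent (which lack the managed-by label) are
// preserved by the migration tool too.
func managedByDatadog(labels map[string]string) bool {
	_, ok := labels[createdLabel]
	return ok
}

func main() {
	kubectlCreated := map[string]string{
		createdLabel:                   "true",
		"app.kubernetes.io/managed-by": "kubectl-datadog",
	}
	agentCreated := map[string]string{createdLabel: "true"}
	foreign := map[string]string{"team": "platform"}
	// kubectl-created and agent-created are both managed; foreign is not.
	fmt.Println(managedByDatadog(kubectlCreated), managedByDatadog(agentCreated), managedByDatadog(foreign))
}
```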

Minimum Agent Versions

  • Agent: N/A (kubectl-datadog plugin only)
  • Cluster Agent: N/A (kubectl-datadog plugin only)

Describe your test plan

Automated coverage added in this PR:

  • TestClassify_KarpenterNodePoolOwnership: kubectl-datadog (both labels) + cluster agent (created label only) + Datadog NodePool with no nodes yet + foreign NodePool.
  • TestClassify_KarpenterNodePoolOwnership_NoCRD: tolerant of meta.IsNoMatchError when the Karpenter CRD is not installed.
  • TestClassify_FargateProfileOwnership / TestClassify_FargateProfileOwnership_DescribeError: tag-based detection + AWS API error fallback.
  • TestClassify_KarpenterDetection: version extraction from controller image tag, ManagedByDatadog/InstallerVersion from sentinel labels.
  • TestClassify_EKSAutoMode: discovery API exposes nodeclassesEnabled: true.
  • TestPersist_YAMLShape: pins lowerCamelCase wire keys against the gopkg.in/yaml.v3 lower-case-by-default footgun.
  • TestFindFirstDeployment_*: covers the new generic helper.

Manual test plan on a sandbox EKS cluster:

  • kubectl datadog autoscaling cluster install succeeds.
  • kubectl get cm -n dd-karpenter dd-cluster-info -o yaml contains:
    • autoscaling.karpenter.{present, version, managedByDatadog, installerVersion}
    • autoscaling.eksAutoMode.enabled
    • nodeManagement.fargate."dd-karpenter-<cluster>".managedByDatadog: true
  • At least one NodePool created by kubectl-datadog appears under nodeManagement.karpenter with managedByDatadog: true, even before any node has landed on it.
  • A hand-created foreign NodePool (no Datadog labels) does NOT carry managedByDatadog: true.
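Assembled from the keys in the checks above, the enriched ConfigMap data might look roughly like this. This is an illustrative sketch only: the data key name, nesting of fields not listed above, and all values are assumptions, and `<cluster>` stays a placeholder for the EKS cluster name.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dd-cluster-info
  namespace: dd-karpenter
data:
  cluster-info: |            # data key name is illustrative, not confirmed by the PR
    apiVersion: v1
    autoscaling:
      karpenter:
        present: true
        version: "1.3.2"          # example value
        managedByDatadog: true
        installerVersion: "0.9.0" # example value
      clusterAutoscaler:
        present: false
      eksAutoMode:
        enabled: false
    nodeManagement:
      fargate:
        dd-karpenter-<cluster>:   # <cluster> is the EKS cluster name
          managedByDatadog: true
      karpenter:
        dd-nodepool-example:      # illustrative NodePool name
          managedByDatadog: true
          nodes: []               # seeded empty when no node has landed yet
```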

Checklist

  • PR has at least one valid label: enhancement, refactoring
  • PR has a milestone or the qa/skip-qa label
  • All commits are signed

The dd-cluster-info ConfigMap (introduced by #2945) now records:

- the running Karpenter installation (version, namespace, ownership)
  under a new `autoscaling` parent that also groups the existing
  clusterAutoscaler entry and a new eksAutoMode entry,
- a `managedByDatadog` flag per node-management entity (Fargate
  profile, Karpenter NodePool), so a future migration tool can
  distinguish Datadog-managed entities to keep from legacy ones to
  drain.

Detection helpers `FindKarpenterInstallation` and `IsEKSAutoModeEnabled`
move from `install/guess/` to new `common/karpenter/` and
`common/eksautomode/` packages so the clusterinfo classifier can reuse
them. A generic `commonk8s.FindFirstDeployment` factors out the shared
pager+predicate scan, and `commonk8s.ExtractDeploymentVersion` factors
out the controller-image-tag → label fallback used by both detectors.

Karpenter NodePool ownership uses the broader
`autoscaling.datadoghq.com/created` label only (vs. uninstall's AND-pair
with `app.kubernetes.io/managed-by: kubectl-datadog`) so NodePools
managed by the Datadog cluster agent are also preserved by the
migration tool. Datadog-managed NodePools with no nodes yet (typical
right after install) are seeded into the snapshot with an empty Nodes
list so the migration tool sees the destination NodePools exist.

Fargate profile ownership reads tags via EKS DescribeFargateProfile;
the `managed-by: kubectl-datadog` tag is propagated automatically from
the CloudFormation stack tags, so no infrastructure change is needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov-commenter commented May 6, 2026

Codecov Report

❌ Patch coverage is 72.15190% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.42%. Comparing base (d1d2b65) to head (992bd28).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...tadog/autoscaling/cluster/common/k8s/deployment.go 45.45% 18 Missing ⚠️
...autoscaling/cluster/common/clusterinfo/classify.go 84.94% 10 Missing and 4 partials ⚠️
...bectl-datadog/autoscaling/cluster/install/steps.go 20.00% 12 Missing ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #2980      +/-   ##
==========================================
+ Coverage   41.39%   41.42%   +0.02%     
==========================================
  Files         331      332       +1     
  Lines       28911    28984      +73     
==========================================
+ Hits        11969    12007      +38     
- Misses      16086    16118      +32     
- Partials      856      859       +3     
Flag Coverage Δ
unittests 41.42% <72.15%> (+0.02%) ⬆️


Files with missing lines Coverage Δ
...oscaling/cluster/common/eksautomode/eksautomode.go 88.88% <100.00%> (ø)
.../autoscaling/cluster/common/karpenter/karpenter.go 92.85% <100.00%> (ø)
...bectl-datadog/autoscaling/cluster/install/steps.go 19.51% <20.00%> (-0.49%) ⬇️
...autoscaling/cluster/common/clusterinfo/classify.go 86.95% <84.94%> (-3.96%) ⬇️
...tadog/autoscaling/cluster/common/k8s/deployment.go 45.45% <45.45%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
@L3n41c L3n41c changed the title [CASCL-1304] kubectl-datadog: enrich dd-cluster-info ConfigMap [CASCL-1304] kubectl-datadog: enrich dd-cluster-info ConfigMap May 6, 2026