Conversation
… + shared subpackages
Prepare ground for the upcoming `kubectl datadog autoscaling cluster
update` subcommand by extracting the convergence logic that install
will share with update into a new `apply` package, and by elevating
`guess` and `k8s` to siblings of the cobra commands now that several
callers depend on them.
* `cluster/install/{guess,k8s}/` → `cluster/{guess,k8s}/`. install,
uninstall and the new apply package all import them; keeping them
under install/ would create awkward `…/install/guess` import paths
in the unrelated callers.
* `cluster/install/steps.go` (and its test) → `cluster/apply/run.go`,
reshaped into a package-level `Run(ctx, streams, configFlags,
clientset, RunOptions) error` plus exported helpers
(`KarpenterStackName`, `DDKarpenterStackName`, `InstallModeTagKey`,
`DetectedInstallMode`). `RunOptions` replaces what used to be eight
package-level globals mutated by cobra flag binding. Display
helpers now take `genericclioptions.IOStreams` instead of
`*cobra.Command`.
* `cluster/install/{create_karpenter_resources,inference_method}.go`
(with their tests) → `cluster/apply/`. Both flags are bound by
install today and by update tomorrow, so their pflag.Value methods
belong where both callers can reach them.
* `cluster/install/install_mode.go` is split: the bare type plus
`DetectedInstallMode` go to `cluster/apply/install_mode.go`; the
pflag.Value wrapper stays in `cluster/install/install_mode.go` as
a package-private type around `apply.InstallMode`. Only install
binds `--install-mode`, so the wrapper has a single consumer.
* `cluster/install/install.go` becomes a thin cobra wrapper: flags
bind to fields on the local `options` struct (no more globals),
`RunE` builds an `apply.RunOptions{ActionLabel: "Installing"}` and
delegates to `apply.Run`.
* `cluster/uninstall/uninstall.go` migrates inline stack-name
concatenations to `apply.KarpenterStackName` and
`apply.DDKarpenterStackName`, eliminating the duplication.
* `cluster/common/clients/clients.go` gains `ResolveClusterName`,
collapsing the explicit-flag-or-kubeconfig fallback that install,
update and uninstall would otherwise repeat.
Behaviour preserved end to end; tests cover the moved code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cluster` Refresh a previously-installed kubectl-datadog Karpenter deployment without forcing the user to repeat the immutable flags they passed to `install`. `update` adds two values on top of an idempotent `install` re-run: 1. **Safety guard**: refuses if no kubectl-datadog Karpenter is present, or if a Karpenter installed by another tool is found. We never mutate an installation we did not create. The check runs before any AWS API call so a flaky AWS does not mask the "run install first" signal. 2. **Auto-detection of immutable parameters**: reads the dd-karpenter CFN stack and pulls KarpenterNamespace (parameter), InstallMode (`install-mode` tag) and FargateSubnets (parameter) from it. None of those are exposed as flags on `update` — they are CFN CreateOnly properties, so accepting them and then rejecting on mismatch would be UX noise. Changing one is now a documented `uninstall` + `install` operation. Mutable knobs only on the cobra command: * `--cluster-name` * `--karpenter-version` * `--create-karpenter-resources` (default `none` — preserves any hand-edits to EC2NodeClass / NodePool; pass `=all` to regenerate) * `--inference-method` * `--debug` A drift check between the running Deployment's namespace and the namespace recorded in the CFN stack errors out before reaching `apply.Run`, so `update` surfaces the inconsistency loudly instead of falling into apply's "foreign install" no-op success path. The convergence work itself is delegated to `apply.Run` with `ActionLabel: "Updating"`, so install and update share the CFN / Helm / aws-auth / NodePool logic — no behavior duplication. Unit tests cover `resolveOptions` (auto-detection happy paths, legacy stack without install-mode tag, corrupt tag, missing KarpenterNamespace, fargate without FargateSubnets) and `validate`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two new tests join the existing autoscaling suite: * `TestAutoscalingUpdate` walks through install (fargate) → update (auto-detect) → idempotent re-update → update with `--create-karpenter-resources=all` → unknown-flag rejection → uninstall, asserting Karpenter remains installed and pods schedule throughout. The unknown-flag sub-test feeds `--install-mode=existing-nodes` to verify pflag rejects the immutable flag with "unknown flag" rather than silently re-applying the install. * `TestAutoscalingUpdateRefusesWhenNoInstall` runs `update` against a cluster with no Karpenter and asserts the command fails with "no Karpenter installation found" — the error path pointing the user at `install` first. A small `testUpdate` helper mirrors `testInstall` so subtests stay terse. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🛑 Gate Violations
ℹ️ Info🎯 Code Coverage (details) Useful? React with 👍 / 👎 This comment will be updated automatically if new data arrives.🔗 Commit SHA: 2a48e4f | Docs | Datadog PR Page | Give us feedback! |
Codecov Report❌ Patch coverage is ❌ Your patch status has failed because the patch coverage (19.49%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #2976 +/- ##
==========================================
- Coverage 41.39% 41.33% -0.07%
==========================================
Files 331 332 +1
Lines 28911 29042 +131
==========================================
+ Hits 11969 12005 +36
- Misses 16086 16181 +95
Partials 856 856
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
…pflag types The earlier split of `install_mode.go` into a bare type in `apply/` and a pflag wrapper in `install/` added boilerplate (a wrapper type, a conversion at every call site) without buying clarity. Move the whole file to `apply/` like `create_karpenter_resources.go` and `inference_method.go` already are: the pflag methods live where the type lives, and the `install` cobra command binds `apply.InstallMode` directly. `update` does not bind `--install-mode`, so the pflag methods just go unused there — same situation as `apply.InstallMode` itself before this commit. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`InstallModeTagKey` and `DetectedInstallMode` are CloudFormation-stack concerns, not pflag.Value plumbing. Putting them in `install_mode.go` made that file diverge from `inference_method.go` and `create_karpenter_resources.go`, which both moved over from `install` with only a package rename. Move them to `run.go` next to `KarpenterStackName` / `DDKarpenterStackName`, where the rest of the CFN logic lives. The three pflag-type files in `apply/` are now structurally symmetric: data + pflag methods, nothing else. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
update subcommand to kubectl-datadog autoscaling cluster
`commonk8s.CreateOrUpdate` does a Get-then-Update, snapshotting resourceVersion in between. That races against any controller that reconciles the same resource — Karpenter mutates a NodePool's .status (and an EC2NodeClass's status conditions) every time a NodeClaim transitions, which bumps resourceVersion under us. The window is harmless during `install` (the resource was just created and Karpenter is not running yet) but it is the normal state during `update --create-karpenter-resources=all`, and it surfaced as a real e2e failure on the new `TestAutoscalingUpdate` sub-test: failed to update NodePool dd-karpenter-e2mz4: ... the object has been modified; please apply your changes to the latest version and try again Wrapping the Get/Update in `retry.RetryOnConflict(retry.DefaultRetry, …)` is the standard k8s pattern: re-fetch, re-apply, retry. The default backoff caps at five attempts spread over ~2s, which is plenty for the small status-only writes Karpenter performs and keeps the failure mode loud if something else is genuinely wrong. The fix also benefits install's NodePool / EC2NodeClass writes and the `dd-cluster-info` ConfigMap snapshot — same call path, same race once a controller starts touching the namespace. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c3f919d to
067505c
Compare
`browser.Stdout` and `browser.Stderr` accept any `io.Writer`. The shim only mattered if `streams.Out` / `streams.ErrOut` could be nil, but every other call site in this file (`display.PrintBox`, `fmt.Fprintf`) already assumes they are not — `genericclioptions.IOStreams` is constructed with valid writers by the cobra runtime, so the shim was dead defensive code. Drop it and pass the streams through directly, matching the rest of the package. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The merge of main into the feature branch (commit 87c6e46) brought in PR #2976's `update.go`, which referenced `guess.FindKarpenterInstallation` — the symbol our PR moved to `karpenter.FindInstallation`. The resolution updated all other call sites but missed this one, so the build broke on the merge commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What does this PR do?
Adds a new
kubectl datadog autoscaling cluster updatesubcommand that refreshes a Karpenter installation previously deployed bykubectl datadog autoscaling cluster install. The command:KarpenterNamespace,InstallMode,FargateSubnets) from the dd-karpenter CloudFormation stack so the user does not have to repeat them on every refresh.--karpenter-version,--create-karpenter-resources,--inference-method,--debug). Trying to pass an immutable flag (--install-mode,--karpenter-namespace,--fargate-subnets) produces a clearunknown flagerror from cobra/pflag — to change those, the documented path isuninstall+install.Motivation
Today, "upgrading" a
kubectl-datadogKarpenter install means re-runninginstallwith exactly the same flags (mode, namespace, fargate subnets). If the user forgets, the install command's immutability check raises a technical error, or worse the user accidentally tweaks an immutable parameter. There was also no way to require that an existing kubectl-datadog Karpenter be present —installhappily treats a foreign Karpenter as a no-op success, which is the wrong UX for an explicit upgrade.This is tracked under Jira CASCL-1323.
Additional Notes
The PR is split into three commits to ease review:
cluster/install/{guess,k8s}/are elevated tocluster/{guess,k8s}/(now used by install, update and uninstall);cluster/install/steps.gobecomescluster/apply/run.goexposing a package-levelRun(ctx, streams, configFlags, clientset, RunOptions) errorplus stack-name helpers (KarpenterStackName,DDKarpenterStackName,InstallModeTagKey,DetectedInstallMode); pflag typesCreateKarpenterResourcesandInferenceMethodmove tocluster/apply/;InstallModedata moves tocluster/apply/install_mode.gowhile the pflag wrapper stays incluster/install/install_mode.go(only install binds--install-mode);cluster/uninstall/andcluster/install/migrate to the new helpers;clients.ResolveClusterNameextracts the explicit-or-kubeconfig fallback used by all three commands.cluster/update/cobra command + unit tests coveringresolveOptions(auto-detection, legacy stack without install-mode tag, corrupt tag, missing namespace, fargate without subnets) andvalidate.TestAutoscalingUpdatewalks install (fargate) → update (auto-detect) → idempotent re-update →--create-karpenter-resources=all→ unknown-flag rejection → uninstall, asserting Karpenter remains installed and pods schedule throughout.TestAutoscalingUpdateRefusesWhenNoInstallasserts the "no Karpenter installation found" error path.Minimum Agent Versions
No agent dependency.
Describe your test plan
Unit tests on
cmd/kubectl-datadog/autoscaling/cluster/...:E2E (the existing autoscaling e2e suite on EKS):
Manual smoke (against a dev EKS cluster — prefix with the AWS env vars):
Checklist
enhancementqa/skip-qalabel🤖 Generated with Claude Code