Conversation
SplitN(s, ".", 1) returns the original string unsplit since n=1 returns a single-element slice containing the whole string. Changed to n=2 to correctly extract the short hostname (everything before the first "."). This fixes hypervisor hostname comparisons that were incorrectly comparing full FQDNs against short hostnames, which could block onboarding or cause false "could not find exact match" errors.
The validation checks for certificate-issuer-name and certificate-namespace were running before flag.Parse(), meaning user-provided values were not yet parsed. This made the validation ineffective - it would always pass since the variables had non-empty defaults. Moving the validation after flag.Parse() ensures that if a user passes an empty value (e.g., --certificate-issuer-name=""), it will be caught.
The HA service HTTP call was using the default HTTP client with no timeout, which could hang indefinitely and block the reconciliation goroutine. Added a 30-second timeout to prevent this.
Unlike the Hypervisor status patches which use PatchHypervisorStatusWithRetry with exponential backoff retries, the Eviction status patches were using a direct optimistic-lock patch with no retry. Under contention this would fail with a conflict and require a full re-reconciliation. Added retry logic using the same StatusPatchBackoff configuration used by the Hypervisor controller. Also added PatchEvictionStatusWithRetry helper for future use.
The transferLabels slice is a package-level variable that was being mutated during SetupWithManager by appending custom label selector keys. While this is called only once during initialization, it could cause data races if tests run in parallel with -race flag. Now we copy the slice before appending to avoid mutating the original.
PeriodSeconds of 0 is technically invalid in the Kubernetes API - the minimum is 1 second. Kubernetes would silently default it to 10 seconds, which may not be the intended behavior. Changed to explicit 1 second.
If KVM_HA_SERVICE_URL doesn't end with a "/", the URL would be malformed (e.g., "https://exampleapi/hypervisors/host" instead of "https://example/api/hypervisors/host"). Now we ensure a trailing slash exists before appending the path.
The error message said "could not list networks" but it was about finding the test image. Changed to "could not find test image".
Changed "could create certificate" to "could not create certificate". Also added missing colon before the error wrapping placeholder.
Both internal/controller/utils.go and internal/openstack/aggregates.go had their own implementations of a set-difference function. The controller version used generics while the openstack version was string-specific. Moved the generic version to internal/utils/slices.go so it can be shared by both packages without circular imports.
Several SetupWithManager functions were fetching a logger from context and then immediately discarding it with `_`. Removed these unnecessary lines.
The updateStatus method used a stale base object across retry attempts, meaning retries would always fail with the same conflict. Now re-fetches the latest eviction on each retry attempt, matching the pattern used by PatchHypervisorStatusWithRetry.
…r typo - Add explicit return after the 'deleting' branch so control flow does not fall through to the final return statement. Previously it worked by coincidence (err was always nil), but was fragile for future changes. - Fix typo: 'could update status' -> 'could not update status' - Replace dangling err reference at final return with explicit nil - Simplify if/else blocks by removing unnecessary else after return
…ller Previously, when the Hypervisor was not found, the controller continued with a zero-value struct and relied on LifecycleEnabled defaulting to false to exit early. Now explicitly returns when the Hypervisor is not found, making the intent clear and avoiding fragile zero-value reliance.
The --version flag previously printed nothing and was checked after required flag validation, so running with --version and missing flags would error instead of printing the version.
Remove the following unused symbols: - ErrRetry variable (utils.go) - HasStatusCondition function (utils.go) - IsNodeConditionTrue function (utils.go) - labelEvictionRequired constant (constants.go) - labelEvictionApproved constant (constants.go) - PatchEvictionStatusWithRetry function (status_patch.go)
📝 WalkthroughWalkthroughThe PR refactors RBAC permissions to restrict Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~35 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
The Clone() method was not copying the cfg field, violating the Clone contract by returning an encoder with a zero-valued EncoderConfig.
Previously returned a pointer to the range loop variable (a copy), meaning mutations through the pointer would not affect the original slice. Now indexes directly into the slice to return a stable pointer.
Status subresources don't support create, delete, list, or watch verbs. Remove over-broad permissions from hypervisor/status RBAC annotations across all controllers.
…Found After the updateStatus refactor to re-fetch resources on retry, the base parameter is no longer used. Remove it along with the now-unnecessary DeepCopy calls at each call site. Also remove unused gosec nolint directive in utils.go.
a37a3d5 to
2fc05c6
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
internal/controller/node_certificate_controller.go (2)
147-150: Minor: use structured logging instead of fmt.Sprintf (optional).In
ensureCertificate,log.Info(fmt.Sprintf("Certificate %s %s", certName, update))could be replaced with structured fields (e.g.,"certName","operation"). This makes log filtering/aggregation easier and is idiomatic for controller-runtime loggers.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/node_certificate_controller.go` around lines 147 - 150, Replace the fmt.Sprintf usage in ensureCertificate with structured logging: instead of log.Info(fmt.Sprintf("Certificate %s %s", certName, update)), call log.Info with message and key/value fields (e.g., log.Info("Certificate operation", "certName", certName, "operation", update)) so the controller-runtime logger emits structured fields; ensure the conditional still checks controllerutil.OperationResultNone and retain the same message semantics.
169-171: Minor: consider including more reconciliation context in the wrapped error (optional).Right now the retry wrapper preserves the underlying
err, but the final reconciler error doesn’t include identifiers likenode.Name/req.Name(even thoughensureCertificatehas them locally). If this error ever shows up in cluster logs/events, adding those fields would speed up debugging.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/node_certificate_controller.go` around lines 169 - 171, The retry wrapper call to retry.RetryOnConflict currently returns the underlying error without identifying which node failed; capture the returned err and wrap it with context including the node identifier (e.g., node.Name or req.Name) before returning from the reconciler. Concretely, after calling retry.RetryOnConflict(func() error { return r.ensureCertificate(ctx, node, node.Name) }), check the error and return a wrapped error such as using fmt.Errorf("reconcile certificate for node %s: %w", node.Name, err) (or errors.Wrapf) so the final reconciler error includes the node name and aids debugging.internal/logger/logger.go (1)
25-27: Constructor rename is correct; consider named fields for resilience.Good fix. Optional: at Line 26, use named fields in the struct literal to avoid field-order coupling.
Optional tidy-up
func NewSanitizeReconcileErrorEncoder(cfg zapcore.EncoderConfig) zapcore.Encoder { - return &SanitizeReconcileErrorEncoder{zapcore.NewConsoleEncoder(cfg), cfg} + return &SanitizeReconcileErrorEncoder{ + Encoder: zapcore.NewConsoleEncoder(cfg), + cfg: cfg, + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/logger/logger.go` around lines 25 - 27, The constructor NewSanitizeReconcileErrorEncoder currently returns a SanitizeReconcileErrorEncoder using a positional struct literal; change it to use named fields to prevent field-order coupling by constructing &SanitizeReconcileErrorEncoder{Encoder: zapcore.NewConsoleEncoder(cfg), Config: cfg} (match the actual field names on SanitizeReconcileErrorEncoder) so future field reorders won't break this initializer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/eviction_controller.go`:
- Around line 172-185: The current updateStatus implementation captures a stale
full-status snapshot in desiredStatus and then overwrites freshEviction.Status
with that snapshot inside the retry, which can revert concurrent status updates;
modify EvictionReconciler.updateStatus so the mutation of freshEviction.Status
happens inside the RetryOnConflict closure and is derived from the freshly
fetched freshEviction (e.g., copy only the intended fields from the previously
computed desired state rather than assigning the whole desiredStatus), then
compute freshBase := freshEviction.DeepCopy() and call r.Status().Patch(ctx,
freshEviction, client.MergeFromWithOptions(freshBase,
client.MergeFromWithOptimisticLock{}),
client.FieldOwner(EvictionControllerName)) so each retry generates a patch
against the latest resourceVersion instead of reapplying a stale snapshot.
In `@internal/controller/hypervisor_controller.go`:
- Around line 198-202: The package-level slice transferLabels is being
reassigned and can leak state across repeated SetupWithManager runs; in the
SetupWithManager path rebuild a local labels slice from an immutable base (do
not reassign the package-level transferLabels), append requirement.Key() values
from requirements into that local slice, and deduplicate entries (e.g., via a
temporary map) before using it — update any code that currently does
transferLabels = append([]string{}, transferLabels...) to instead create a fresh
localLabels := append([]string{}, baseTransferLabels...) then append
requirement.Key() for each requirement and remove duplicates prior to using
localLabels.
---
Nitpick comments:
In `@internal/controller/node_certificate_controller.go`:
- Around line 147-150: Replace the fmt.Sprintf usage in ensureCertificate with
structured logging: instead of log.Info(fmt.Sprintf("Certificate %s %s",
certName, update)), call log.Info with message and key/value fields (e.g.,
log.Info("Certificate operation", "certName", certName, "operation", update)) so
the controller-runtime logger emits structured fields; ensure the conditional
still checks controllerutil.OperationResultNone and retain the same message
semantics.
- Around line 169-171: The retry wrapper call to retry.RetryOnConflict currently
returns the underlying error without identifying which node failed; capture the
returned err and wrap it with context including the node identifier (e.g.,
node.Name or req.Name) before returning from the reconciler. Concretely, after
calling retry.RetryOnConflict(func() error { return r.ensureCertificate(ctx,
node, node.Name) }), check the error and return a wrapped error such as using
fmt.Errorf("reconcile certificate for node %s: %w", node.Name, err) (or
errors.Wrapf) so the final reconciler error includes the node name and aids
debugging.
In `@internal/logger/logger.go`:
- Around line 25-27: The constructor NewSanitizeReconcileErrorEncoder currently
returns a SanitizeReconcileErrorEncoder using a positional struct literal;
change it to use named fields to prevent field-order coupling by constructing
&SanitizeReconcileErrorEncoder{Encoder: zapcore.NewConsoleEncoder(cfg), Config:
cfg} (match the actual field names on SanitizeReconcileErrorEncoder) so future
field reorders won't break this initializer.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a10f3535-cbaa-474a-8f3c-fe041149a1f7
📒 Files selected for processing (17)
cmd/main.gointernal/controller/aggregates_controller.gointernal/controller/constants.gointernal/controller/eviction_controller.gointernal/controller/gardener_node_lifecycle_controller.gointernal/controller/hypervisor_controller.gointernal/controller/hypervisor_instance_ha_controller.gointernal/controller/hypervisor_maintenance_controller.gointernal/controller/hypervisor_taint_controller.gointernal/controller/node_certificate_controller.gointernal/controller/onboarding_controller.gointernal/controller/traits_controller.gointernal/controller/utils.gointernal/controller/utils_test.gointernal/logger/logger.gointernal/openstack/aggregates.gointernal/utils/slices.go
💤 Files with no reviewable changes (1)
- internal/controller/hypervisor_instance_ha_controller.go
Prevents accumulation of stale label selector keys when SetupWithManager is called multiple times (e.g., in tests with multiple managers).
Apply desired status fields individually onto the freshly-fetched object in the retry closure instead of overwriting the entire status, preventing concurrent condition changes from being reverted. Add deduplication when appending label selector keys to transferLabels.
Merging this branch changes the coverage (1 decrease, 2 increase)
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
config/rbac/role.yaml (1)
73-82: 💤 Low valueConsider removing
listandwatchfromhypervisors/statusfor tighter least-privilege.The
/statussubresource endpoint is typically only used forget,update, andpatchoperations. Controllers watch/list the main resource (not the status subresource) and receive the status as part of the object. The context snippet fromaggregates_controller.goconfirms onlyget;update;patchis declared.If this file is generated via
controller-gen, verify that no controller annotation actually requestslistorwatchonhypervisors/status. If none do, these verbs may be safely removed.🔧 Suggested narrowing
- apiGroups: - kvm.cloud.sap resources: - hypervisors/status verbs: - get - - list - patch - update - - watch🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/rbac/role.yaml` around lines 73 - 82, The RBAC rule grants unnecessary verbs on the status subresource: remove "list" and "watch" from the resources entry for "hypervisors/status" in role.yaml to follow least-privilege (keep only get, patch, update); confirm no controller annotations request list/watch for the status subresource (check aggregates_controller.go and any +kubebuilder:rbac annotations) and regenerate the role if the manifest is produced by controller-gen.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@config/rbac/role.yaml`:
- Around line 73-82: The RBAC rule grants unnecessary verbs on the status
subresource: remove "list" and "watch" from the resources entry for
"hypervisors/status" in role.yaml to follow least-privilege (keep only get,
patch, update); confirm no controller annotations request list/watch for the
status subresource (check aggregates_controller.go and any +kubebuilder:rbac
annotations) and regenerate the role if the manifest is produced by
controller-gen.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ede49ea4-38e0-42ba-84a5-329f9abade20
📒 Files selected for processing (3)
config/rbac/role.yamlinternal/controller/eviction_controller.gointernal/controller/hypervisor_controller.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/controller/hypervisor_controller.go
fwiesel
left a comment
There was a problem hiding this comment.
Can you move the changes to the eviction controller to its own PR?
Most of the changes are clear cleanup improvements, but from my perspective the changes to the eviction controller have some side-effects that maybe could be discussed separately.
There was a problem hiding this comment.
Okay, I read up on it a bit.
Since
- only the eviction-controller is modifying the eviction status,
- and the controller-runtime ensures that even with multiple threads, only one reconciliation is running per object,
we are not at risk of having the same controller modifying the same object in an "ABBA" manner.
So, it should work, but I'd prefer then using Patch without Lock with a comment pointing that out why (i.e only one controller, per object locking in controller-runtime).
It is not a must from my side to change it that way. I am looking at SSA, and that might solve that for us quite elegantly.
Multiple reruns of agentic code issue inspections, paired with individual fixes.
Includes 2-3 serious fixes, but mostly minor stuff - still worthy.
Commits have been reviewed and are pretty small individually.
Summary by CodeRabbit
New Features
--versionflag to display application version and build information.Bug Fixes
Security
Chores