Skip to content

DNM/SPLAT-2587: aws/ccm configuration#7961

Draft
mtulio wants to merge 11 commits intoopenshift:mainfrom
mtulio:feat-ccm-nlb-sg-exp
Draft

DNM/SPLAT-2587: aws/ccm configuration#7961
mtulio wants to merge 11 commits intoopenshift:mainfrom
mtulio:feat-ccm-nlb-sg-exp

Conversation

@mtulio
Copy link
Contributor

@mtulio mtulio commented Mar 13, 2026

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 13, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 13, 2026

@mtulio: This pull request references SPLAT-2587 which is a valid jira issue.

Details

In response to this:

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 13, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Excluded labels (none allowed) (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 3fac8124-f076-443f-be8e-4414ba478830

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-area labels Mar 13, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 13, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@mtulio
Copy link
Contributor Author

mtulio commented Mar 13, 2026

/test e2e-aws-techpreview
/test e2e-aws

@openshift-ci openshift-ci bot added area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Mar 13, 2026
@mtulio mtulio force-pushed the feat-ccm-nlb-sg-exp branch from c57d2c2 to 48f51a0 Compare March 13, 2026 20:24
@mtulio
Copy link
Contributor Author

mtulio commented Mar 13, 2026

/test e2e-aws-techpreview
/test e2e-aws

@cwbotbot
Copy link

Test Results

e2e-aws

@mtulio mtulio force-pushed the feat-ccm-nlb-sg-exp branch from 48f51a0 to bff6cb6 Compare March 17, 2026 19:59
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 20, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mtulio
Once this PR has been reviewed and has the lgtm label, please assign csrwng for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mtulio added 10 commits March 25, 2026 19:31
This commit updates IAM permissions to support AWS CCM managing security groups
for Network Load Balancers, following least-privilege security principles.

Changes:
- cmd/infra/aws/iam.go:
* Added 19 specific ELBv2 permissions required for NLB lifecycle management:
  - CreateLoadBalancer, DeleteLoadBalancer, DescribeLoadBalancers
  - ModifyLoadBalancerAttributes
  - CreateTargetGroup, DeleteTargetGroup, DescribeTargetGroups (critical - was missing)
  - DescribeTargetGroupAttributes, ModifyTargetGroupAttributes
  - DescribeTargetHealth, RegisterTargets, DeregisterTargets
  - CreateListener, DeleteListener, DescribeListeners
  - SetSecurityGroups (required for managed SG feature)
  - DescribeTags, AddTags, RemoveTags
* Added 10 specific EC2 permissions for security group management:
  - CreateSecurityGroup, DeleteSecurityGroup, DescribeSecurityGroups
  - AuthorizeSecurityGroupIngress, RevokeSecurityGroupIngress
  - DescribeSubnets, DescribeVpcs, DescribeInstances
  - CreateTags
* Added comprehensive documentation explaining each permission category and
  why it's needed for the CCM's NLB management functionality.

Root cause: CI job failures showed "AccessDenied" for elasticloadbalancing:CreateTargetGroup.
Analysis of CCM controller logs revealed the exact permissions needed for the complete
NLB provisioning workflow with managed security groups.

This change ensures the CCM has sufficient permissions to manage NLBs with security
groups while maintaining security best practices by granting only necessary permissions.
This commit implements feature gate support for AWS CCM NLBSecurityGroupMode
configuration, enabling managed security groups for Network Load Balancers.

This enables the AWS Cloud Controller Manager to automatically create and manage
security groups for NLBs when the feature gate is enabled, reducing manual
infrastructure management overhead.

Changes:
* Modified adaptConfig() to dynamically set the CCM configuration
  "NLBSecurityGroupMode" to "Managed" value to when the feature is enabled.
This commit adds comprehensive E2E testing for the AWSServiceLBNetworkSecurityGroup
feature gate and fixes critical test issues.

Changes:
* Added featuregates.ConfigureFeatureSet(string(featureSet)) call to synchronize
  the global feature gate state with the cluster's feature set. Without this,
  tests checking featuregates.Gate().Enabled() were incorrectly using the default
  feature set instead of TechPreviewNoUpgrade.
* Added test case: "When NLBSecurityGroupMode is enabled it must have config
  NLBSecurityGroupMode=Managed entry in cloud-config configmap" to validate
  the CCM configuration contains the required setting when the feature gate
  is enabled.
* Added test case: "When AWSServiceLBNetworkSecurityGroup is enabled it must
  create a LoadBalancer NLB with managed security group attached" to validate
  end-to-end NLB creation with managed security groups.
* Fixed TestOnCreateAPIUX GCP validation by adding missing CloudController
  field to ServiceAccountsEmails struct (was causing "Required value" error).
…e evaluation

Updates AWS CCM configuration tests to explicitly verify that the config
adapter reads feature gates from HCP.Spec.Configuration rather than relying
on the global operator feature set.

Changes:
- Set global feature gate to Default at test start to prove the adapter
  correctly reads from HCP configuration, not the global gate
- Added explanatory comments documenting this proves the fix for the issue
  where global gate was incorrectly used instead of per-cluster configuration

This change is critical for validating the fix to PR openshift#7460 e2e test failures
where:
- E2E tests create clusters with TechPreviewNoUpgrade feature set
- HypershiftOperator runs with Default feature set (HYPERSHIFT_FEATURESET env)
- Tests were failing because adapter checked global gate (Default) instead of
  cluster's configuration (TechPreviewNoUpgrade)
- Result: NLBSecurityGroupMode was missing from aws-cloud-config even though
  cluster's feature set should enable it

With this test change, we verify all scenarios work correctly:
- Default feature set: NLBSecurityGroupMode NOT added
- TechPreviewNoUpgrade: NLBSecurityGroupMode = Managed added
- CustomNoUpgrade with explicit enable: NLBSecurityGroupMode = Managed added

All tests pass, proving the adapter now correctly evaluates per-cluster
feature configuration regardless of the global operator feature set.

Signed-off-by: Marco Braga <mrbraga@redhat.com>
Assisted-by: Claude Sonnet 4.5 (via Claude Code)
Fixes AWS CCM config adapter to read feature gates from per-cluster
HostedControlPlane configuration instead of the global operator feature set.

Problem:
The AWS CCM configuration adapter was incorrectly using the global feature
gate (configured from HYPERSHIFT_FEATURESET environment variable) instead of
reading the per-cluster configuration from HCP.Spec.Configuration.FeatureGate.

This caused e2e test failures in PR openshift#7460 where:
1. Tests create clusters with TechPreviewNoUpgrade feature set (version >= 4.18)
2. hypershift-operator runs with HYPERSHIFT_FEATURESET=Default (or unset)
3. control-plane-operator inherits Default feature set via env var
4. AWS CCM adapter checked global gate (Default) → AWSServiceLBNetworkSecurityGroup NOT enabled
5. Result: NLBSecurityGroupMode missing from aws-cloud-config even though cluster's feature set enables it

Evidence: Test artifacts showed feature-gate.yaml with TechPreviewNoUpgrade (correct),
but aws-cloud-config.yaml missing NLBSecurityGroupMode (incorrect).

Solution:
1. Added IsFeatureEnabledInFeatureGateSpec() helper in featuregates package to
   evaluate feature gates based on FeatureGateSpec rather than global gate
2. Modified AWS CCM config adapter to use this helper with HCP.Spec.Configuration.FeatureGate

Why a new helper function instead of reusing existing ones:
- Existing FeatureGatesForFeatureSet() only handles fixed feature sets by name
  (Default, TechPreviewNoUpgrade, DevPreviewNoUpgrade)
- We need to evaluate complete FeatureGateSpec including CustomNoUpgrade with
  explicit enabled/disabled feature lists
- This matches the pattern used by CVO, KAS, and other components that correctly
  read from HCP.Spec.Configuration

Changes:
- control-plane-operator/featuregates/featuregates.go: New helper function
- control-plane-operator/.../aws/config.go: Use helper to read per-cluster config

Testing: Unit tests pass for all scenarios (Default, TechPreviewNoUpgrade, CustomNoUpgrade)

Signed-off-by: Marco Braga <mrbraga@redhat.com>
Assisted-by: Claude Sonnet 4.5 (via Claude Code)
@mtulio mtulio force-pushed the feat-ccm-nlb-sg-exp branch from 80ae47e to 4079af6 Compare March 25, 2026 22:32
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 25, 2026
@mtulio
Copy link
Contributor Author

mtulio commented Mar 25, 2026

/test e2e-aws-techpreview

@mtulio
Copy link
Contributor Author

mtulio commented Mar 25, 2026

/test e2e-aws

@mtulio
Copy link
Contributor Author

mtulio commented Mar 26, 2026

/test ?

@mtulio
Copy link
Contributor Author

mtulio commented Mar 26, 2026

/test e2e-aws
/test e2e-aws-techpreview
/test e2e-conformance

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 26, 2026

@mtulio: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws 8b49dcd link true /test e2e-aws
ci/prow/e2e-aws-techpreview 8b49dcd link false /test e2e-aws-techpreview
ci/prow/e2e-conformance 8b49dcd link false /test e2e-conformance

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform area/testing Indicates the PR includes changes for e2e testing do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants