
CORS-4334: Konnectivity#10344

Open
patrickdillon wants to merge 3 commits into openshift:main from patrickdillon:konnectivity

Conversation

@patrickdillon
Contributor

@patrickdillon patrickdillon commented Mar 2, 2026

Continuation of #10280:

  • Refactored to reduce in-lining in bootkube.sh
  • Added some gating (needs port opening on some or all platforms)

Will break the API vendoring into a separate PR to get that merged sooner rather than later.

Not tested. Opening this now as a /WIP to continue discussion of #10280 with #9628
/cc @JoelSpeed @mdbooth

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 2, 2026
@openshift-ci openshift-ci bot requested a review from JoelSpeed March 2, 2026 04:08
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 2, 2026

@patrickdillon: This pull request references CORS-4334 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Continuation of #10280:

  • Refactored to reduce in-lining in bootkube.sh
  • Added some gating (needs port opening on some or all platforms)

Will break the API changes into a separate PR.

Not tested. Opening this now as a /WIP to continue discussion of #10280 with #9628
/cc @JoelSpeed @mdbooth

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from mdbooth March 2, 2026 04:08
@patrickdillon
Contributor Author

patrickdillon commented Mar 2, 2026

/wip
/hold

@openshift-ci
Contributor

openshift-ci bot commented Mar 2, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tthvo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 2, 2026

@patrickdillon: This pull request references CORS-4334 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Continuation of #10280:

  • Refactored to reduce in-lining in bootkube.sh
  • Added some gating (needs port opening on some or all platforms)

Will break the API vendoring into a separate PR to get that merged sooner rather than later.

Not tested. Opening this now as a /WIP to continue discussion of #10280 with #9628
/cc @JoelSpeed @mdbooth

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

patrickdillon and others added 3 commits March 2, 2026 17:13
Enables kube-apiserver running on the bootstrap node to access the pod network,
specifically to enable access to webhooks running in the cluster.

Changes:

* Adds a new static Konnectivity server pod running on the bootstrap node
* Configures the bootstrap KAS to use its local Konnectivity server for
outbound cluster traffic
* Add a daemonset deployed into the cluster to run Konnectivity agent on every
cluster node
* Removes daemonset automatically in bootstrap teardown

Co-authored-by: Matthew Booth <mbooth@redhat.com>
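The bootstrap KAS wiring described above follows the upstream apiserver-network-proxy pattern: the API server reads an EgressSelectorConfiguration that routes "cluster" egress through the local Konnectivity server. A minimal sketch of such a configuration, assuming a local server listening on port 8090 and illustrative certificate paths (both are assumptions, not taken from this PR):

```yaml
apiVersion: apiserver.k8s.io/v1beta1
kind: EgressSelectorConfiguration
egressSelections:
- name: cluster
  connection:
    proxyProtocol: HTTPConnect
    transport:
      tcp:
        # Hypothetical local Konnectivity server endpoint on the bootstrap node.
        url: "https://127.0.0.1:8090"
        tlsConfig:
          # Illustrative paths; the PR's actual certificate layout may differ.
          caBundle: /etc/kubernetes/konnectivity/ca.crt
          clientKey: /etc/kubernetes/konnectivity/client.key
          clientCert: /etc/kubernetes/konnectivity/client.crt
```

The file is passed to kube-apiserver via --egress-selector-config-file; traffic for the "cluster" egress selection (webhooks, aggregated APIs) then flows through the proxy instead of the host network.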
Adds error handling to report konnectivity specific failures
when running gather bootstrap or analyze.
This updates all platforms to open the konnectivity port. Baremetal
and on-prem platforms have user-provisioned networks, so those
will need to be handled up front.
@patrickdillon
Contributor Author

/test e2e-vsphere-ovn e2e-nutanix-ovn
/test ?

@patrickdillon
Contributor Author

/test e2e-metal-ipi-ovn
/test e2e-agent-compact-ipv4

@patrickdillon
Contributor Author

We probably want to clean up the konnectivity ports on bootstrap destroy as well.

@patrickdillon
Contributor Author

I have experimented with adding a feature gate to control this and it is possible.

@patrickdillon
Contributor Author

Need to not deploy this on a true single node cluster.
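One way to gate this, sketched below as a hypothetical helper (not from the PR): decide whether to deploy Konnectivity based on the control-plane replica count. The `CONTROL_PLANE_REPLICAS` variable and the function name are illustrative assumptions; the real gating would presumably be rendered into bootkube.sh from the install config.

```shell
#!/usr/bin/env bash
# Hypothetical gating sketch: on a true single-node cluster there are no
# separate nodes for agents to connect back from, so skip deployment.
should_deploy_konnectivity() {
  local replicas="${1:-3}"
  if [ "${replicas}" -le 1 ]; then
    echo "skip"    # single node: bootstrap KAS can reach pods directly
  else
    echo "deploy"  # multi-node: agents on cluster nodes dial the bootstrap server
  fi
}

should_deploy_konnectivity "${CONTROL_PLANE_REPLICAS:-3}"
```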

@JoelSpeed
Contributor

Have read through the changes and the scripts all seem reasonable to me. I'll open a PR to CAPIO that switches us back to Fail webhook policy to test this with

@patrickdillon
Contributor Author

/retest-required

2 similar comments
@patrickdillon
Contributor Author

/retest-required

@patrickdillon
Contributor Author

/retest-required

@tthvo
Member

tthvo commented Mar 12, 2026

/cc @sadasu @jhixson74

@openshift-ci openshift-ci bot requested review from jhixson74 and sadasu March 12, 2026 20:47
@tthvo
Member

tthvo commented Mar 12, 2026

/retest

@openshift-ci
Contributor

openshift-ci bot commented Mar 13, 2026

@patrickdillon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-custom-endpoints 836e8d2 link false /test e2e-gcp-custom-endpoints
ci/prow/e2e-aws-ovn-shared-vpc-edge-zones 836e8d2 link false /test e2e-aws-ovn-shared-vpc-edge-zones
ci/prow/e2e-aws-ovn-heterogeneous 836e8d2 link false /test e2e-aws-ovn-heterogeneous
ci/prow/e2e-aws-byo-subnet-role-security-groups 836e8d2 link false /test e2e-aws-byo-subnet-role-security-groups
ci/prow/gcp-custom-endpoints-proxy-wif 836e8d2 link false /test gcp-custom-endpoints-proxy-wif
ci/prow/e2e-openstack-ovn 836e8d2 link true /test e2e-openstack-ovn
ci/prow/e2e-gcp-custom-dns 836e8d2 link false /test e2e-gcp-custom-dns
ci/prow/e2e-openstack-proxy 836e8d2 link false /test e2e-openstack-proxy
ci/prow/e2e-azurestack 836e8d2 link false /test e2e-azurestack
ci/prow/e2e-gcp-xpn-custom-dns 836e8d2 link false /test e2e-gcp-xpn-custom-dns

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Member

@tthvo tthvo left a comment


This is pretty cool 😎🔥! I just have some questions and comments while learning/reading about this :D

ToPort: 8091,
SourceSecurityGroupRoles: []capa.SecurityGroupRole{"controlplane", "node"},
},
{
Member

We need to remove this rule when destroying bootstrap, right? This probably means patching the awscluster CR and waiting for the rule to disappear...

💡 Another idea: since this is scoped to only bootstrap node, the installer may pre-create a security group specifically for bootstrap with this rule? This SG can be attached via AdditionalSecurityGroups.

Member

Whoops, I just saw #10344 (comment) so we do need to clean up the rule :D

egressSelections:
- name: "cluster"
connection:
proxyProtocol: "HTTPConnect"
Member

Just curious: Any reason to choose HTTPConnect over gRPC, which should be (theoretically) faster? From the docs 👇

# This controls the protocol between the API Server and the Konnectivity
# server. Supported values are "GRPC" and "HTTPConnect". There is no
# end user visible difference between the two modes. You need to set the
# Konnectivity server to work in the same mode.
proxyProtocol: GRPC

My guess is that it's fine to use gRPC? If so, we need to adjust --mode=grpc in konnectivity-server-pod.yaml

Comment on lines +64 to +66
oc delete namespace openshift-bootstrap-konnectivity \
--kubeconfig=/opt/openshift/auth/kubeconfig \
--ignore-not-found=true || true
Member

Suggested change
oc delete namespace openshift-bootstrap-konnectivity \
--kubeconfig=/opt/openshift/auth/kubeconfig \
--ignore-not-found=true || true
oc delete namespace openshift-bootstrap-konnectivity \
--kubeconfig=/opt/openshift/auth/kubeconfig \
--ignore-not-found=true

I guess we should fail if the cleanup somehow fails (except for not-found), right? Otherwise, resources will be left behind and could potentially "break" the resulting OpenShift cluster?

Comment on lines +10 to +14
{{- if .UseIPv6ForNodeIP }}
BOOTSTRAP_NODE_IP=$(ip -6 -j route get 2001:4860:4860::8888 | jq -r '.[0].prefsrc')
{{- else }}
BOOTSTRAP_NODE_IP=$(ip -j route get 1.1.1.1 | jq -r '.[0].prefsrc')
{{- end }}
Member

@tthvo tthvo Mar 14, 2026

We should also honour the field .BootstrapNodeIP if set via the environment variable OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP, right?

Tracing back to the commit, it may be necessary for assisted installer 🤔?

bootstrapNodeIP := os.Getenv("OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP")
if bootstrapNodeIP != "" && net.ParseIP(bootstrapNodeIP) == nil {
logrus.Warnf("OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP must have valid ip address, given %s. Skipping it", bootstrapNodeIP)
bootstrapNodeIP = ""
}

BootstrapNodeIP string

Member

Maybe we can do something like 👇 WDYT?

{{- if .BootstrapNodeIP }}
      # Use explicitly configured bootstrap node IP
      BOOTSTRAP_NODE_IP="{{.BootstrapNodeIP}}"
      echo "Using configured bootstrap node IP: ${BOOTSTRAP_NODE_IP}"
{{- else }}
      # Detect bootstrap node IP at runtime using the default route source address.
      # Konnectivity agents use this to connect back to the bootstrap server.
  {{- if .UseIPv6ForNodeIP }}
      BOOTSTRAP_NODE_IP=$(ip -6 -j route get 2001:4860:4860::8888 | jq -r '.[0].prefsrc')
  {{- else }}
      BOOTSTRAP_NODE_IP=$(ip -j route get 1.1.1.1 | jq -r '.[0].prefsrc')
  {{- end }}
      echo "Detected bootstrap node IP: ${BOOTSTRAP_NODE_IP}"
{{- end }}

Comment on lines +27 to +31
containers:
- name: konnectivity-agent
image: ${KONNECTIVITY_IMAGE}
command:
- /usr/bin/proxy-agent
Member

nit: we should give this agent container a resource request so that it won't be the first to get evicted if the node is under pressure (theoretically).

As reference, Hypershift sets the following values 👀

Comment on lines +12 to +15
- name: konnectivity-server
image: ${KONNECTIVITY_IMAGE}
command:
- /usr/bin/proxy-server
Member

nit: we should give this server container a resource request so that it won't be the first to get evicted if the node is under pressure (theoretically).

As reference, Hypershift sets the following values 👀

Comment on lines +13 to +16
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 10%
Member

nit: these pods only run during bootstrap and will "never?" get updated, so we can just ignore this setting, right 🤔?

Besides, I guess 10% of 3 control plane nodes is ~1 node; thus it is equivalent to maxUnavailable: 1, which is already the default that k8s sets (according to the docs).


Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
