Conversation
|
@patrickdillon: This pull request references CORS-4334 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/wip |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing /approve in a comment |
|
Enables the kube-apiserver running on the bootstrap node to access the pod network, specifically to enable access to webhooks running in the cluster. Changes:
* Adds a new static Konnectivity server pod running on the bootstrap node
* Configures the bootstrap KAS to use its local Konnectivity server for outbound cluster traffic
* Adds a daemonset deployed into the cluster to run the Konnectivity agent on every cluster node
* Removes the daemonset automatically in bootstrap teardown

Co-authored-by: Matthew Booth <mbooth@redhat.com>
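The bootstrap KAS is pointed at a local Konnectivity server through an egress selector configuration passed via --egress-selector-config-file. A minimal sketch of what that wiring generally looks like (the port and certificate paths are illustrative assumptions, not values taken from this PR):

```yaml
apiVersion: apiserver.k8s.io/v1beta1
kind: EgressSelectorConfiguration
egressSelections:
- name: cluster
  connection:
    proxyProtocol: HTTPConnect
    transport:
      tcp:
        # Local Konnectivity server on the bootstrap node (port assumed)
        url: https://127.0.0.1:8090
        tlsConfig:
          caBundle: /etc/kubernetes/konnectivity/ca.crt
          clientCert: /etc/kubernetes/konnectivity/client.crt
          clientKey: /etc/kubernetes/konnectivity/client.key
```

With this in place, KAS traffic for the "cluster" egress (e.g. webhook calls) is tunneled through the Konnectivity server to an agent running on a cluster node.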
Adds error handling to report konnectivity-specific failures when running gather bootstrap or analyze.
This updates all platforms to open the konnectivity port. Baremetal and other on-prem platforms have user-provisioned networks, so that will need to be handled up front.
Force-pushed from 1d992e7 to 836e8d2
|
/test e2e-vsphere-ovn e2e-nutanix-ovn |
|
/test e2e-metal-ipi-ovn |
|
We probably want to clean up the konnectivity ports on bootstrap destroy as well. |
|
I have experimented with adding a feature gate to control this and it is possible. |
|
Need to not deploy this on a true single node cluster. |
|
Have read through the changes and the scripts all seem reasonable to me. I'll open a PR to CAPIO that switches us back to the Fail webhook policy to test this with. |
|
/retest-required |
|
/retest-required |
|
/retest-required |
|
/cc @sadasu @jhixson74 |
|
/retest |
|
@patrickdillon: The following tests failed:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
tthvo left a comment
This is pretty cool 😎🔥! I just have a few questions and comments while learning/reading about this :D
ToPort: 8091,
SourceSecurityGroupRoles: []capa.SecurityGroupRole{"controlplane", "node"},
},
{
We need to remove this rule when destroying bootstrap, right? This probably means patching the awscluster CR and waiting for the rule to disappear...
💡 Another idea: since this is scoped to only bootstrap node, the installer may pre-create a security group specifically for bootstrap with this rule? This SG can be attached via AdditionalSecurityGroups.
Whoops, I just saw #10344 (comment) so we do need to clean up the rule :D
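If the patch route is taken, it could be expressed as a JSON patch against the AWSCluster spec. This is only a hypothetical sketch: the field path and namespace below are assumptions, not taken from the PR.

```yaml
# Hypothetical JSON patch (in YAML form) removing the konnectivity ingress rule.
# Could be applied with something like:
#   oc patch awscluster <infra-id> -n openshift-cluster-api-guests \
#     --type=json --patch-file rule-patch.yaml
- op: remove
  path: /spec/network/additionalControlPlaneIngressRules/0
```

CAPA would then reconcile the security group and revoke the rule; the destroy code would still need to wait for that to converge.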
egressSelections:
- name: "cluster"
  connection:
    proxyProtocol: "HTTPConnect"
Just curious: any reason to choose HTTPConnect over gRPC, which should (theoretically) be faster? From the docs, it says 👇
# This controls the protocol between the API Server and the Konnectivity
# server. Supported values are "GRPC" and "HTTPConnect". There is no
# end user visible difference between the two modes. You need to set the
# Konnectivity server to work in the same mode.
proxyProtocol: GRPC

My guess is that it's fine to use gRPC? If so, we need to adjust --mode=grpc in konnectivity-server-pod.yaml.
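For what switching would involve, here is a sketch based on the upstream apiserver-network-proxy docs, not taken from this PR: gRPC mode is normally paired with a Unix-domain-socket transport between the KAS and the local konnectivity-server, so the egress selector and the server args would change together. The socket path and port below are assumptions.

```yaml
# KAS side: egress selector pointing at a UDS (path is an assumption)
apiVersion: apiserver.k8s.io/v1beta1
kind: EgressSelectorConfiguration
egressSelections:
- name: cluster
  connection:
    proxyProtocol: GRPC
    transport:
      uds:
        udsName: /etc/kubernetes/konnectivity-server/konnectivity-server.socket
---
# konnectivity-server side: matching mode and socket (args are illustrative)
command:
- /usr/bin/proxy-server
args:
- --mode=grpc
- --uds-name=/etc/kubernetes/konnectivity-server/konnectivity-server.socket
- --agent-port=8091
```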
oc delete namespace openshift-bootstrap-konnectivity \
  --kubeconfig=/opt/openshift/auth/kubeconfig \
  --ignore-not-found=true || true

Suggested change:

oc delete namespace openshift-bootstrap-konnectivity \
  --kubeconfig=/opt/openshift/auth/kubeconfig \
  --ignore-not-found=true
I guess we should fail if the cleanup somehow fails (except for not-found), right? Otherwise, resources will be left behind and can potentially "break" the final openshift cluster?
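One way to make that failure visible without giving up on transient errors is a small retry wrapper. This is a hedged sketch, not the PR's script: the function name and attempt count are made up here.

```shell
# Retry a command a few times; propagate failure so bootstrap teardown
# reports it instead of silently leaving resources behind.
retry() {
  local attempts=$1; shift
  local i
  for i in $(seq 1 "$attempts"); do
    if "$@"; then
      return 0
    fi
    sleep 1
  done
  echo "command failed after ${attempts} attempts: $*" >&2
  return 1
}

# Usage in the teardown script would look like (oc invocation unchanged,
# note no trailing "|| true"):
# retry 3 oc delete namespace openshift-bootstrap-konnectivity \
#   --kubeconfig=/opt/openshift/auth/kubeconfig \
#   --ignore-not-found=true
```

Since --ignore-not-found=true already makes the not-found case succeed, dropping the `|| true` plus a retry covers both concerns.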
{{- if .UseIPv6ForNodeIP }}
BOOTSTRAP_NODE_IP=$(ip -6 -j route get 2001:4860:4860::8888 | jq -r '.[0].prefsrc')
{{- else }}
BOOTSTRAP_NODE_IP=$(ip -j route get 1.1.1.1 | jq -r '.[0].prefsrc')
{{- end }}
We should also honour the field .BootstrapNodeIP if set via the environment variable OPENSHIFT_INSTALL_BOOTSTRAP_NODE_IP, right?
Tracing back to the commit, it may be necessary for assisted installer 🤔?
installer/pkg/asset/ignition/bootstrap/common.go
Lines 347 to 351 in 30c4271
Maybe we can do something like 👇 WDYT?
{{- if .BootstrapNodeIP }}
# Use explicitly configured bootstrap node IP
BOOTSTRAP_NODE_IP="{{.BootstrapNodeIP}}"
echo "Using configured bootstrap node IP: ${BOOTSTRAP_NODE_IP}"
{{- else }}
# Detect bootstrap node IP at runtime using the default route source address.
# Konnectivity agents use this to connect back to the bootstrap server.
{{- if .UseIPv6ForNodeIP }}
BOOTSTRAP_NODE_IP=$(ip -6 -j route get 2001:4860:4860::8888 | jq -r '.[0].prefsrc')
{{- else }}
BOOTSTRAP_NODE_IP=$(ip -j route get 1.1.1.1 | jq -r '.[0].prefsrc')
{{- end }}
echo "Detected bootstrap node IP: ${BOOTSTRAP_NODE_IP}"
{{- end }}

containers:
- name: konnectivity-agent
  image: ${KONNECTIVITY_IMAGE}
  command:
  - /usr/bin/proxy-agent
nit: we should give this agent container a resource request so that it won't be the first to get evicted if the node is under pressure (theoretically).
As reference, Hypershift sets the following values 👀
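For example, a request block of this shape could be added to the container spec (the specific numbers here are illustrative assumptions, not Hypershift's actual values):

```yaml
resources:
  requests:
    cpu: 40m
    memory: 50Mi
```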
- name: konnectivity-server
  image: ${KONNECTIVITY_IMAGE}
  command:
  - /usr/bin/proxy-server
nit: we should give this server container a resource request so that it won't be the first to get evicted if the node is under pressure (theoretically).
As reference, Hypershift sets the following values 👀
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 10%
nit: these pods only run during bootstrap and will "never?" get updated, so we can just ignore this setting, right 🤔?
Besides, I guess 10% of 3 control-plane nodes is ~1 node; thus it is equivalent to maxUnavailable: 1, which is already the default that k8s sets (according to the docs).
Continuation of #10280:
Will break the API vendoring into a separate PR to get that merged sooner rather than later.
Not tested. Opening this now as a /WIP to continue discussion of #10280 with #9628
/cc @JoelSpeed @mdbooth