[BUGFIX] Fix ironic conductor for post-adoption baremetal provisioning #1302
imatza-rh wants to merge 1 commit into openstack-k8s-operators:downstream
Fix issues preventing post-adoption baremetal provisioning in the test-with-ironic scenario:

1. Missing conductor DHCP ranges (OSPRH-28474): ironicConductors had no dhcpRanges, so nodes couldn't PXE boot.
2. IPA heartbeat using unreachable K8s hostname (OSPRH-28474): Set endpoint_override to MetalLB VIP in conductor service.
3. Conductor dnsmasq missing UEFI boot directives (OSPRH-28474): ironic-operator conductor template lacks dhcp-boot lines. Workaround patches dnsmasq.conf at runtime, gated by toggle.
4. Conductor dnsmasq DHCP race (OSPRH-28474): deployed instances get wrong IP from conductor pool. Workaround adds dhcp-host entry and reboots instance.
5. Deploy images cleared with no fallback (OSPRH-28476): set httpboot IPA deploy images on nodes after clearing.

Additional fixes:
- Clean stale nova-compute services blocking FFU version convergence
- Wait for source Keystone availability before pre-launch tests
- Fix SSH quoting in pre_launch_ironic.bash wait functions
- Add IPv6 URL bracket wrapping (RFC 3986) and CONDUCTOR_IP guards
- Detect ERROR state in wait_server_active (early exit vs timeout)

OSPRH-27919

Assisted-By: Claude Code
Signed-off-by: Itay Matza <imatza@redhat.com>
- {{ ironic_network }}
provisionNetwork: {{ ironic_network }}
# OSPRH-28478: conductor dhcpRanges required for deploy/clean
dhcpRanges:
This DHCP is for the standalone ironic - not for full RHOSO.
Neutron is providing DHCP for ironic provisioning.
juliakreger left a comment
Instead of changes, I'd like to request we schedule a meeting with the hardprov team to discuss this. We are concerned features are getting confused, because DHCP for baremetal nodes should be provided by neutron, and standalone direct usage of the Ironic deployment is neither an expected nor a directly supported scenario outside of the integrated use case. For example, asking ironic to deploy a baremetal node with a VIF would work and be supported, but operating ironic without neutron would not be inherently supported.
Test branch: removes conductor dhcpRanges, dnsmasq UEFI patch tasks, and the toggle variable. Keeps all other PR openstack-k8s-operators#1302 fixes.

Purpose: verify whether OVN handles baremetal DHCP on the provisioning network without conductor dnsmasq involvement.

NOT FOR MERGE - test only.

Assisted-By: Claude Code
Signed-off-by: Itay Matza <imatza@redhat.com>
Closing - our CI pipeline passes without these workarounds (buildset 23765d65).
Fix issues preventing post-adoption baremetal provisioning in the
test-with-ironic scenario. Contains workarounds for ironic-operator
bugs and related fixes. Discovered during shiftstack adoption CI
(OSPRH-27919), but all fixes affect regular ironic adoption too.
Fixes:
Missing conductor DHCP ranges (OSPRH-28474):
ironicConductors had no dhcpRanges, so nodes couldn't PXE boot. Added 172.20.1.240-249 (non-overlapping with inspector 190-199).
IPA heartbeat using unreachable K8s hostname (OSPRH-28474):
Set endpoint_override to MetalLB VIP in conductor [service_catalog]. Uses ansible.utils.ipwrap for IPv6 bracket wrapping (RFC 3986).
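
The bracket wrapping mentioned above can be illustrated with a small Python sketch. This is not the Ansible filter itself, only an approximation of the behaviour described here: IPv6 literals are wrapped in brackets before being placed in a URL, while IPv4 addresses and hostnames are left alone. The function names, the example addresses, and the use of port 6385 are assumptions made for illustration.

```python
import ipaddress


def ipwrap(host: str) -> str:
    """Wrap a bare IPv6 literal in brackets; leave everything else unchanged."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a bare IP literal (a DNS name, or an already bracketed address).
        return host
    return f"[{host}]" if addr.version == 6 else host


def endpoint_url(host: str, port: int, scheme: str = "https") -> str:
    """Build an endpoint URL that stays valid for IPv6 hosts (RFC 3986)."""
    return f"{scheme}://{ipwrap(host)}:{port}"


if __name__ == "__main__":
    # Hypothetical VIP addresses from the documentation ranges.
    print(endpoint_url("192.0.2.10", 6385))
    print(endpoint_url("2001:db8::10", 6385))
```

Without the wrapping, an IPv6 host followed by `:port` is ambiguous, which is why the brackets matter for an endpoint_override value.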
Conductor dnsmasq missing UEFI boot directives (OSPRH-28474):
ironic-operator conductor template lacks dhcp-boot lines (inspector template has them). Workaround patches dnsmasq.conf at runtime, gated by ironic_conductor_dnsmasq_uefi_fix toggle (default: true). Applied in both ironic_adoption and nova_verify because operator reconciliation can wipe the first patch (OSPRH-28479).
Conductor dnsmasq DHCP race (OSPRH-28474):
Deployed instances get wrong IP from conductor pool (OVN doesn't serve vif_type=other ports). Workaround adds dhcp-host entry and reboots instance. Idempotent (skips if MAC already in config).
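
The idempotency check described above (skip if the MAC is already in the config) might look roughly like the following Python sketch. It is not the PR's actual task code: the function name, the MAC, and the IP are hypothetical, and only the `dhcp-host=<mac>,<ip>` line format is taken from dnsmasq itself. The reboot step is not shown.

```python
from __future__ import annotations


def ensure_dhcp_host(conf_text: str, mac: str, ip: str) -> tuple[str, bool]:
    """Return (config, changed); append a dhcp-host line only if the MAC is absent."""
    mac_l = mac.lower()
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore commented-out entries
        if mac_l in stripped.lower():
            return conf_text, False  # MAC already present: nothing to do
    # Make sure the new entry starts on its own line.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + f"dhcp-host={mac_l},{ip}\n", True


if __name__ == "__main__":
    conf = "port=0\n"
    conf, changed = ensure_dhcp_host(conf, "52:54:00:AA:BB:CC", "172.20.1.50")
    print(changed)
    conf, changed = ensure_dhcp_host(conf, "52:54:00:aa:bb:cc", "172.20.1.50")
    print(changed)
```

Returning a `changed` flag mirrors how such a step would decide whether a dnsmasq reload and instance reboot are needed at all.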
Deploy images cleared with no fallback (OSPRH-28476):
Sets httpboot IPA deploy images on nodes using conductor pod IP. Old code blanked deploy_ramdisk/deploy_kernel with no replacement.
Additional fixes:
- Clean stale nova-compute services blocking FFU version convergence (soft-delete via UPDATE services SET deleted=id where version < MAX)
- wait_node_state: use --provision-state server-side filter (replaces fragile grep -P with escaped column names)
- wait_image_active: use ${image_name} param (was hardcoded to Fedora-Cloud-Base-38)
- Detect ERROR state in wait_server_active (early exit vs timeout)
Tested - full adoption pipeline (7/7 stages, ~10h):
deploy-infra → deploy-osp → deploy-ocp → install-operators →
install-shiftstack → run-adoption → run-shiftstack-after
(buildset 9821f411)
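
The wait_server_active change listed under the additional fixes (exit early on ERROR instead of waiting for the timeout) can be sketched in Python as follows. The real helper lives in a bash script, so this is only an illustration of the control flow; the exception class, parameter names, and default values are assumptions.

```python
import time
from typing import Callable


class ServerErrorState(RuntimeError):
    """Raised when the server reports ERROR before becoming ACTIVE."""


def wait_server_active(
    get_status: Callable[[], str],
    timeout: float = 600.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until ACTIVE; fail immediately on ERROR; fail on timeout otherwise."""
    deadline = clock() + timeout
    while True:
        status = get_status().upper()
        if status == "ACTIVE":
            return
        if status == "ERROR":
            # Early exit: an ERROR server does not recover by waiting longer.
            raise ServerErrorState("server entered ERROR state")
        if clock() >= deadline:
            raise TimeoutError(f"server not ACTIVE after {timeout}s (last: {status})")
        sleep(interval)
```

Injecting `get_status`, `sleep`, and `clock` keeps the loop testable without a live cloud. In practice `get_status` would wrap whatever status query the test environment uses.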
Bugs filed:
Related-Issue: OSPRH-27919