
[BUGFIX] Fix ironic conductor for post-adoption baremetal provisioning #1302

Closed
imatza-rh wants to merge 1 commit into openstack-k8s-operators:downstream from imatza-rh:add-conductor-dhcp-ranges


@imatza-rh imatza-rh commented Mar 10, 2026

Fix issues preventing post-adoption baremetal provisioning in the
test-with-ironic scenario. Contains workarounds for ironic-operator
bugs and related fixes. Discovered during shiftstack adoption CI
(OSPRH-27919), but all fixes affect regular ironic adoption too.

Fixes:

  1. Missing conductor DHCP ranges (OSPRH-28474):
    ironicConductors had no dhcpRanges, nodes couldn't PXE boot.
    Added 172.20.1.240-249 (non-overlapping with inspector 190-199).

  2. IPA heartbeat using unreachable K8s hostname (OSPRH-28474):
    Set endpoint_override to MetalLB VIP in conductor [service_catalog].
    Uses ansible.utils.ipwrap for IPv6 bracket wrapping (RFC 3986).

  3. Conductor dnsmasq missing UEFI boot directives (OSPRH-28474):
    ironic-operator conductor template lacks dhcp-boot lines (inspector
    template has them). Workaround patches dnsmasq.conf at runtime,
    gated by ironic_conductor_dnsmasq_uefi_fix toggle (default: true).
    Applied in both ironic_adoption and nova_verify because operator
    reconciliation can wipe the first patch (OSPRH-28479).

  4. Conductor dnsmasq DHCP race (OSPRH-28474):
    Deployed instances get wrong IP from conductor pool (OVN doesn't serve
    vif_type=other ports). Workaround adds dhcp-host entry and reboots
    instance. Idempotent (skips if MAC already in config).

  5. Deploy images cleared with no fallback (OSPRH-28476):
    Sets httpboot IPA deploy images on nodes using conductor pod IP.
    Old code blanked deploy_ramdisk/deploy_kernel with no replacement.
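
Fix 1 relies on the conductor range staying clear of the inspector range. A minimal Python sketch of that non-overlap check follows. It assumes both ranges sit in the same 172.20.1.x network (the description only gives the inspector range as 190-199), and the helper name `ranges_overlap` is illustrative, not something from this PR:

```python
import ipaddress


def ranges_overlap(a_start, a_end, b_start, b_end):
    """Return True if two inclusive IP ranges share at least one address."""
    a_lo, a_hi = ipaddress.ip_address(a_start), ipaddress.ip_address(a_end)
    b_lo, b_hi = ipaddress.ip_address(b_start), ipaddress.ip_address(b_end)
    # Two inclusive intervals intersect when each one starts
    # no later than the other one ends.
    return a_lo <= b_hi and b_lo <= a_hi


# Conductor range from fix 1; inspector range with an assumed 172.20.1 prefix.
conductor = ("172.20.1.240", "172.20.1.249")
inspector = ("172.20.1.190", "172.20.1.199")

print(ranges_overlap(*conductor, *inspector))  # False
```

A check like this could run before the CR is rendered, so an accidental overlap fails early instead of surfacing as a DHCP conflict on the provisioning network.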

Additional fixes:

  • Clean stale nova-compute services blocking FFU version convergence
    (soft-delete via UPDATE services SET deleted=id where version < MAX)
  • Wait for source Keystone availability before pre-launch tests
  • Fix wait_node_state to use --provision-state server-side filter
    (replaces fragile grep -P with escaped column names)
  • Fix wait_image_active to use ${image_name} param (was hardcoded
    to Fedora-Cloud-Base-38)
  • Add ERROR state detection in wait_server_active (early exit vs timeout)
  • IPv6 URL bracket wrapping and CONDUCTOR_IP guards
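
The IPv6 bracket wrapping and the CONDUCTOR_IP guard can be illustrated with a short Python sketch. This is not the `ansible.utils.ipwrap` implementation or the PR's shell code; `wrap_host`, `httpboot_url`, the port 8088 and the file name are hypothetical stand-ins chosen for the example:

```python
import ipaddress


def wrap_host(host):
    """Bracket an IPv6 literal for use in a URL (RFC 3986, section 3.2.2).

    IPv4 addresses, hostnames and already-bracketed values are returned
    unchanged. An empty value is rejected, mirroring a CONDUCTOR_IP guard.
    """
    if not host:
        raise ValueError("conductor IP is empty")
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a bare IP literal (hostname or "[...]" form): leave as-is.
        return host
    return f"[{host}]" if addr.version == 6 else host


def httpboot_url(conductor_ip, port, path):
    """Build an HTTP URL for a file served from the conductor (hypothetical layout)."""
    return f"http://{wrap_host(conductor_ip)}:{port}/{path.lstrip('/')}"


# Port and file name are assumptions for illustration only.
print(httpboot_url("fd00::1", 8088, "ironic-python-agent.kernel"))
# http://[fd00::1]:8088/ironic-python-agent.kernel
print(httpboot_url("172.20.1.240", 8088, "ironic-python-agent.kernel"))
# http://172.20.1.240:8088/ironic-python-agent.kernel
```

Raising on an empty address makes the failure visible at the point where the URL is built, rather than later as a malformed deploy image URL on the node.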

Tested - full adoption pipeline (7/7 stages, ~10h):
deploy-infra → deploy-osp → deploy-ocp → install-operators →
install-shiftstack → run-adoption → run-shiftstack-after
(buildset 9821f411)

Bugs filed:

  • OSPRH-28474 - ironic-operator conductor dnsmasq template gaps (Critical)
  • OSPRH-28476 - ironic_adoption clears deploy images with no fallback
  • OSPRH-28478 - Ironic adoption doc missing ML2 and conductor dhcpRanges
  • OSPRH-28479 - RFE: CR field for custom conductor dnsmasq directives

Related-Issue: OSPRH-27919


openshift-ci Bot commented Mar 10, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sathlan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@imatza-rh imatza-rh marked this pull request as draft March 10, 2026 19:15
@imatza-rh imatza-rh changed the title from "[BUGFIX] Add dhcpRanges to ironicConductors" to "[BUGFIX] Fix ironic conductor for post-adoption baremetal provisioning" Mar 11, 2026
@imatza-rh imatza-rh force-pushed the add-conductor-dhcp-ranges branch 5 times, most recently from 32956ab to e6d940f Compare March 16, 2026 19:46
@imatza-rh imatza-rh force-pushed the add-conductor-dhcp-ranges branch from e6d940f to aed358b Compare March 16, 2026 20:47
@imatza-rh imatza-rh force-pushed the add-conductor-dhcp-ranges branch 6 times, most recently from 2e5ab58 to 30ba585 Compare March 23, 2026 12:43
@imatza-rh imatza-rh force-pushed the add-conductor-dhcp-ranges branch 2 times, most recently from 58d655e to ee48361 Compare March 29, 2026 18:53
Fix issues preventing post-adoption baremetal provisioning in the
test-with-ironic scenario:

1. Missing conductor DHCP ranges (OSPRH-28474): ironicConductors
   had no dhcpRanges, so nodes couldn't PXE boot.

2. IPA heartbeat using unreachable K8s hostname (OSPRH-28474):
   Set endpoint_override to MetalLB VIP in conductor service.

3. Conductor dnsmasq missing UEFI boot directives (OSPRH-28474):
   ironic-operator conductor template lacks dhcp-boot lines.
   Workaround patches dnsmasq.conf at runtime, gated by toggle.

4. Conductor dnsmasq DHCP race (OSPRH-28474): deployed instances
   get wrong IP from conductor pool. Workaround adds dhcp-host
   entry and reboots instance.

5. Deploy images cleared with no fallback (OSPRH-28476): set
   httpboot IPA deploy images on nodes after clearing.

Additional fixes:
- Clean stale nova-compute services blocking FFU version convergence
- Wait for source Keystone availability before pre-launch tests
- Fix SSH quoting in pre_launch_ironic.bash wait functions
- Add IPv6 URL bracket wrapping (RFC 3986) and CONDUCTOR_IP guards
- Detect ERROR state in wait_server_active (early exit vs timeout)

OSPRH-27919

Assisted-By: Claude Code
Signed-off-by: Itay Matza <imatza@redhat.com>
@imatza-rh imatza-rh force-pushed the add-conductor-dhcp-ranges branch from ee48361 to bab1446 Compare March 29, 2026 18:58
@imatza-rh imatza-rh marked this pull request as ready for review March 30, 2026 10:57
- {{ ironic_network }}
provisionNetwork: {{ ironic_network }}
# OSPRH-28478: conductor dhcpRanges required for deploy/clean
dhcpRanges:
Contributor

This DHCP is for standalone ironic, not for full RHOSO.
In RHOSO, Neutron provides DHCP for ironic provisioning.

Contributor

@juliakreger juliakreger left a comment

Instead of changes, I'd like to request we schedule a meeting with the hardprov team to discuss this. We are concerned features are getting confused, because DHCP for baremetal nodes should be provided by neutron, and standalone direct usage of the Ironic deployment is neither an expected nor a directly supported scenario outside of the integrated use case. That is, asking ironic to deploy a baremetal node with a VIF would work and be supported, but operating ironic without neutron would not be inherently supported.

imatza-rh added a commit to imatza-rh/data-plane-adoption that referenced this pull request Mar 30, 2026
Test branch: removes conductor dhcpRanges, dnsmasq UEFI patch
tasks, and the toggle variable. Keeps all other PR openstack-k8s-operators#1302 fixes.

Purpose: verify whether OVN handles baremetal DHCP on the
provisioning network without conductor dnsmasq involvement.

NOT FOR MERGE - test only.

Assisted-By: Claude Code
Signed-off-by: Itay Matza <imatza@redhat.com>
@imatza-rh
Author

Closing - our CI pipeline passes without these workarounds (buildset 23765d65).

@imatza-rh imatza-rh closed this Apr 14, 2026