Skip to content

Fix isolated network boot race and subnet conflict (issue #94)#207

Merged
kasturinarra merged 1 commit intomicroshift-io:mainfrom
kasturinarra:fix_issue_94
Mar 17, 2026
Merged

Fix isolated network boot race and subnet conflict (issue #94)#207
kasturinarra merged 1 commit intomicroshift-io:mainfrom
kasturinarra:fix_issue_94

Conversation

@kasturinarra
Copy link
Contributor

@kasturinarra kasturinarra commented Mar 5, 2026

Closes #94

Stop MicroShift and greenboot before network reconfiguration to prevent a race condition where greenboot health checks fail during the config change. Restart the container afterward so greenboot runs naturally on boot.

Move node IPs outside the service CIDR (10.43.0.0/16) to avoid an iptables conflict where the router-default LoadBalancer claims the same IP as the kubernetes ClusterIP, causing API server traffic to be routed to the router instead.

Summary by CodeRabbit

  • New Features

    • Adds persistent routing for the service CIDR, ensures NetworkManager restarts, enables MicroShift to start on boot, and triggers a container restart after successful reconfiguration to run health checks.
  • Bug Fixes

    • Stops health checks before applying network changes, performs safer execution and cleanup of configuration steps, replaces an immediate service restart with an instruction to reboot to start services, and explicitly propagates configuration failures with an error and leaves the container running for troubleshooting.

@kasturinarra kasturinarra requested a review from a team as a code owner March 5, 2026 08:13
@coderabbitai
Copy link

coderabbitai bot commented Mar 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Runs the isolated-network config script while capturing its exit code; stops greenboot and MicroShift before network changes; applies fixed 10.44 service-CIDR route (and 10.43 route when kindnet present) via nmcli, restarts NetworkManager, enables MicroShift on boot (no immediate restart), removes temp script, and restarts the node container after successful config. On config failure, prints an error, leaves the container running, and exits with the script's code.

Changes

Cohort / File(s) Summary
Cluster manager workflow
src/cluster_manager.sh
Run isolated-network config script capturing its exit code into a local config_rc, always remove the temp script, on non-zero config_rc echo error and leave container running then exit with that code; on success restart node container to trigger health checks.
Network configuration script
src/config_isolated_net.sh
Stop greenboot-healthcheck before network reconfiguration; use fixed 10.44 service CIDR and add persistent nmcli route(s); add persistent 10.43.0.0/16 route when kindnet present; restart NetworkManager; enable MicroShift on boot (remove immediate restart) and echo instruction to reboot container/system to start MicroShift/greenboot.

Sequence Diagram(s)

sequenceDiagram
    participant CM as ClusterManager (cluster_manager.sh)
    participant CI as ConfigScript (config_isolated_net.sh)
    participant NM as NetworkManager
    participant MS as MicroShift
    participant GB as greenboot-healthcheck
    participant Container as Node Container

    CM->>GB: stop (ignore errors)
    CM->>MS: stop (ignore errors)
    CM->>CI: execute temp script (capture config_rc)
    CI->>MS: perform cleanup (microshift-cleanup-data --all)
    CI->>NM: apply svc CIDR (10.44) & nmcli persistent route(s)
    CI->>NM: add persistent 10.43.0.0/16 route if kindnet present
    CI->>NM: restart NetworkManager
    CI->>MS: enable MicroShift on-boot (no immediate restart)
    CM->>CM: remove temp script
    alt config_rc != 0
        CM->>CM: echo error, leave container running
        CM-->>CM: exit with config_rc
    else config_rc == 0
        CM->>Container: restart node container
        Container->>GB: boot -> run greenboot health checks
        Container->>MS: boot -> MicroShift starts
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • ggiguash
  • pmtk
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main changes: fixing a boot race condition in isolated network setup and resolving a subnet conflict (issue #94).
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can enable review details to help with troubleshooting, context usage and more.

Enable the reviews.review_details setting to include review details such as the model used, the time taken for each step and more in the review comments.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cluster_manager.sh`:
- Around line 224-233: The isolated-network reconfiguration currently runs
/tmp/config_isolated_net.sh directly which, under set -euo pipefail, will abort
the script and skip cleanup/restart if it fails; change the sequence around the
podman exec of /tmp/config_isolated_net.sh so you capture its exit code (e.g.,
run podman exec -i "${node_name}" /tmp/config_isolated_net.sh and save $? into a
variable), always run the cleanup steps (podman exec rm -vf
/tmp/config_isolated_net.sh) and podman restart "${node_name}", and only after
restart exit the outer script with the captured code (return/exit the saved
status) so node services are restarted even on failure of config_isolated_net.sh
while preserving the original failure status.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a51fed0f-13c0-45ff-b1af-b82281f21e6e

📥 Commits

Reviewing files that changed from the base of the PR and between 066705a and ca56834.

📒 Files selected for processing (2)
  • src/cluster_manager.sh
  • src/config_isolated_net.sh

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/config_isolated_net.sh (1)

65-72: Both branches use the same subnet—consider simplifying.

The if/else branches both call configure_offline_network "10.44". The only difference is the route addition for kindnet. This could be cleaner:

♻️ Suggested simplification
 if rpm -q microshift-kindnet &>/dev/null; then
-  configure_offline_network "10.44"
   # Add a persistent route for the service CIDR via loopback so that
   # ClusterIP traffic reaches the kube-proxy iptables rules.
   nmcli conn modify stable-microshift +ipv4.routes "10.43.0.0/16"
-else
-  configure_offline_network "10.44"
 fi
+
+configure_offline_network "10.44"

Wait—that would break ordering since configure_offline_network creates the connection. Ignore if the route must be added after connection creation but before other steps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config_isolated_net.sh` around lines 65 - 72, Both branches call
configure_offline_network "10.44", so simplify by calling
configure_offline_network once unconditionally, then keep the rpm -q
microshift-kindnet check only for adding the route; specifically, call
configure_offline_network("10.44") first (this is the function that creates the
connection), then if rpm -q microshift-kindnet succeeds run the nmcli conn
modify stable-microshift +ipv4.routes "10.43.0.0/16" command to add the
persistent route so ordering is preserved (create connection before modifying
it).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/config_isolated_net.sh`:
- Around line 65-72: Both branches call configure_offline_network "10.44", so
simplify by calling configure_offline_network once unconditionally, then keep
the rpm -q microshift-kindnet check only for adding the route; specifically,
call configure_offline_network("10.44") first (this is the function that creates
the connection), then if rpm -q microshift-kindnet succeeds run the nmcli conn
modify stable-microshift +ipv4.routes "10.43.0.0/16" command to add the
persistent route so ordering is preserved (create connection before modifying
it).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bf8a5909-c97e-4d55-a275-fe8eb2d4377d

📥 Commits

Reviewing files that changed from the base of the PR and between c0d2f23 and 659a2c1.

📒 Files selected for processing (2)
  • src/cluster_manager.sh
  • src/config_isolated_net.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cluster_manager.sh

@kasturinarra kasturinarra force-pushed the fix_issue_94 branch 2 times, most recently from e297c7d to a0d2277 Compare March 11, 2026 09:41
@kasturinarra kasturinarra force-pushed the fix_issue_94 branch 4 times, most recently from b8a0805 to 765c1b4 Compare March 13, 2026 18:32
…io#94)

Stop MicroShift and greenboot before network reconfiguration to prevent
a race condition where greenboot health checks fail during the config
change. Restart the container afterward so greenboot runs naturally on
boot.

Move node IPs outside the service CIDR (10.43.0.0/16) to avoid an
iptables conflict where the router-default LoadBalancer claims the same
IP as the kubernetes ClusterIP, causing API server traffic to be routed
to the router instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kasturinarra kasturinarra merged commit 96ec1e9 into microshift-io:main Mar 17, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix occasional CI failures due to TopoLVM not initializing properly

4 participants