Fix isolated network boot race and subnet conflict (issue #94)#207
Fix isolated network boot race and subnet conflict (issue #94)#207kasturinarra merged 1 commit intomicroshift-io:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRuns the isolated-network config script while capturing its exit code; stops greenboot and MicroShift before network changes; applies fixed 10.44 service-CIDR route (and 10.43 route when kindnet present) via nmcli, restarts NetworkManager, enables MicroShift on boot (no immediate restart), removes temp script, and restarts the node container after successful config. On config failure, prints an error, leaves the container running, and exits with the script's code. Changes
Sequence Diagram(s)sequenceDiagram
participant CM as ClusterManager (cluster_manager.sh)
participant CI as ConfigScript (config_isolated_net.sh)
participant NM as NetworkManager
participant MS as MicroShift
participant GB as greenboot-healthcheck
participant Container as Node Container
CM->>GB: stop (ignore errors)
CM->>MS: stop (ignore errors)
CM->>CI: execute temp script (capture config_rc)
CI->>MS: perform cleanup (microshift-cleanup-data --all)
CI->>NM: apply svc CIDR (10.44) & nmcli persistent route(s)
CI->>NM: add persistent 10.43.0.0/16 route if kindnet present
CI->>NM: restart NetworkManager
CI->>MS: enable MicroShift on-boot (no immediate restart)
CM->>CM: remove temp script
alt config_rc != 0
CM->>CM: echo error, leave container running
CM-->>CM: exit with config_rc
else config_rc == 0
CM->>Container: restart node container
Container->>GB: boot -> run greenboot health checks
Container->>MS: boot -> MicroShift starts
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can enable review details to help with troubleshooting, context usage and more.Enable the |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cluster_manager.sh`:
- Around line 224-233: The isolated-network reconfiguration currently runs
/tmp/config_isolated_net.sh directly which, under set -euo pipefail, will abort
the script and skip cleanup/restart if it fails; change the sequence around the
podman exec of /tmp/config_isolated_net.sh so you capture its exit code (e.g.,
run podman exec -i "${node_name}" /tmp/config_isolated_net.sh and save $? into a
variable), always run the cleanup steps (podman exec rm -vf
/tmp/config_isolated_net.sh) and podman restart "${node_name}", and only after
restart exit the outer script with the captured code (return/exit the saved
status) so node services are restarted even on failure of config_isolated_net.sh
while preserving the original failure status.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a51fed0f-13c0-45ff-b1af-b82281f21e6e
📒 Files selected for processing (2)
src/cluster_manager.shsrc/config_isolated_net.sh
ca56834 to
c0d2f23
Compare
c0d2f23 to
0a1dccd
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/config_isolated_net.sh (1)
65-72: Both branches use the same subnet—consider simplifying.The
if/elsebranches both callconfigure_offline_network "10.44". The only difference is the route addition for kindnet. This could be cleaner:♻️ Suggested simplification
if rpm -q microshift-kindnet &>/dev/null; then - configure_offline_network "10.44" # Add a persistent route for the service CIDR via loopback so that # ClusterIP traffic reaches the kube-proxy iptables rules. nmcli conn modify stable-microshift +ipv4.routes "10.43.0.0/16" -else - configure_offline_network "10.44" fi + +configure_offline_network "10.44"Wait—that would break ordering since
configure_offline_networkcreates the connection. Ignore if the route must be added after connection creation but before other steps.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config_isolated_net.sh` around lines 65 - 72, Both branches call configure_offline_network "10.44", so simplify by calling configure_offline_network once unconditionally, then keep the rpm -q microshift-kindnet check only for adding the route; specifically, call configure_offline_network("10.44") first (this is the function that creates the connection), then if rpm -q microshift-kindnet succeeds run the nmcli conn modify stable-microshift +ipv4.routes "10.43.0.0/16" command to add the persistent route so ordering is preserved (create connection before modifying it).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/config_isolated_net.sh`:
- Around line 65-72: Both branches call configure_offline_network "10.44", so
simplify by calling configure_offline_network once unconditionally, then keep
the rpm -q microshift-kindnet check only for adding the route; specifically,
call configure_offline_network("10.44") first (this is the function that creates
the connection), then if rpm -q microshift-kindnet succeeds run the nmcli conn
modify stable-microshift +ipv4.routes "10.43.0.0/16" command to add the
persistent route so ordering is preserved (create connection before modifying
it).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bf8a5909-c97e-4d55-a275-fe8eb2d4377d
📒 Files selected for processing (2)
src/cluster_manager.shsrc/config_isolated_net.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cluster_manager.sh
659a2c1 to
94a7ac8
Compare
181874e to
ed9c634
Compare
e297c7d to
a0d2277
Compare
b8a0805 to
765c1b4
Compare
…io#94) Stop MicroShift and greenboot before network reconfiguration to prevent a race condition where greenboot health checks fail during the config change. Restart the container afterward so greenboot runs naturally on boot. Move node IPs outside the service CIDR (10.43.0.0/16) to avoid an iptables conflict where the router-default LoadBalancer claims the same IP as the kubernetes ClusterIP, causing API server traffic to be routed to the router instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
0cb5cbf to
d4d5e60
Compare
Closes #94
Stop MicroShift and greenboot before network reconfiguration to prevent a race condition where greenboot health checks fail during the config change. Restart the container afterward so greenboot runs naturally on boot.
Move node IPs outside the service CIDR (10.43.0.0/16) to avoid an iptables conflict where the router-default LoadBalancer claims the same IP as the kubernetes ClusterIP, causing API server traffic to be routed to the router instead.
Summary by CodeRabbit
New Features
Bug Fixes