✨ feat: Implement HA Router VRRP, Multi-WAN, and Conntrack Sync (Group 2) #9

Open
ProjectInitiative wants to merge 1 commit into feat/router from feat/masthead-ha-router-group-2-7255720421843975739

@ProjectInitiative
Owner

This pull request implements GROUP 2 of the HA Router architecture plan: VRRP-based high availability, multi-WAN failover, and conntrack state synchronization.

🎯 What

  • Creates modules/nixos/hosts/masthead/vrrp/default.nix, which integrates the custom Keepalived VRRP wrapper.
  • Implements MAC spoofing shell scripts in the VRRP module that set the WAN MAC address and interface administrative state when a router is promoted or demoted.
  • Creates modules/nixos/hosts/masthead/multi-wan/default.nix, which adds an automated ping-based health check that adjusts route metrics to provide multi-WAN failover similar to mwan3.
  • Creates modules/nixos/hosts/masthead/conntrack/default.nix, which confines state synchronization to a dedicated interface (vlan40) using conntrackd's FTFW sync mode.
  • Refactors modules/nixos/hosts/masthead/default.nix: removes the statically assigned lan0 VIP (which breaks HA logic) and instead assigns distinct "real" role-based IPs (e.g. .2 and .3) to all interfaces on the primary and backup nodes.
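
To illustrate the promotion and demotion steps listed above, here is a minimal Python sketch of the command sequence such notify scripts might run. The interface name `wan0`, the MAC address, and the function names are hypothetical. The PR's actual scripts are shell scripts inside the VRRP module and may differ.

```python
from typing import List


def notify_master_cmds(wan_if: str, shared_mac: str) -> List[List[str]]:
    """Commands a newly promoted router might run to take over the WAN link.

    Assumption: the MAC is changed while the link is down, then the link
    is brought back up so the upstream device sees the expected address.
    """
    return [
        ["ip", "link", "set", "dev", wan_if, "down"],
        ["ip", "link", "set", "dev", wan_if, "address", shared_mac],
        ["ip", "link", "set", "dev", wan_if, "up"],
    ]


def notify_backup_cmds(wan_if: str) -> List[List[str]]:
    """Commands a demoted router might run: take the WAN link down."""
    return [["ip", "link", "set", "dev", wan_if, "down"]]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in notify_master_cmds("wan0", "02:00:00:00:00:01"):
        print(" ".join(cmd))
```

Returning argument lists rather than running them keeps the sketch safe to try on a workstation; a real script would execute each command and check its exit status.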

⚠️ Risk

  • Medium-Low: Modifying network routing directly via systemd scripts requires careful timing. The multi-wan health check runs every 10 seconds; short network blips may cause route flapping.
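
One common way to damp the flapping described in this risk is to require several consecutive probe results before changing state. The Python sketch below shows that idea only; it is not what this PR implements, and the names `LinkState`, `update`, `fail_after`, and `recover_after` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    healthy: bool = True
    streak: int = 0  # consecutive probes that disagree with `healthy`


def update(state: LinkState, probe_ok: bool,
           fail_after: int = 3, recover_after: int = 2) -> LinkState:
    """Return the next state after one health-check probe.

    The link only flips after `fail_after` consecutive failures or
    `recover_after` consecutive successes; any agreeing probe resets
    the streak, so a single blip changes nothing.
    """
    if probe_ok == state.healthy:
        return LinkState(state.healthy, 0)
    streak = state.streak + 1
    threshold = recover_after if probe_ok else fail_after
    if streak >= threshold:
        return LinkState(probe_ok, 0)
    return LinkState(state.healthy, streak)
```

With the 10-second interval mentioned above and `fail_after=3`, an uplink would be demoted after roughly 30 seconds of continuous failure, so the thresholds trade failover speed against stability.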

🛡️ Solution

  • The scripts and states follow the architecture plan: distinct subnets, proper iproute2 references, metric-based failover (so gateways are never removed from the routing tables entirely), and secure multicast bindings.
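
The metric-based failover mentioned above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the metric values, the gateway `192.0.2.1`, the device `wan0`, and the function name are hypothetical, and the PR's own health check is a systemd-run script that may work differently. Because the metric is part of a route's identity in iproute2, the sketch installs the route with the new metric first and only then deletes the old one, so a default route via the gateway exists at every step.

```python
from typing import List

PREFERRED_METRIC = 100   # hypothetical metric for a healthy uplink
DEMOTED_METRIC = 1000    # hypothetical metric for a failing uplink


def shift_default_route(gateway: str, dev: str, healthy: bool) -> List[List[str]]:
    """Build iproute2 commands that move a default route between metrics.

    The first command installs the route with the target metric
    (`replace` also succeeds if it already exists). The second removes
    the route with the other metric; it fails harmlessly if that route
    is absent, so a caller should tolerate a non-zero exit status there.
    """
    if healthy:
        new, old = PREFERRED_METRIC, DEMOTED_METRIC
    else:
        new, old = DEMOTED_METRIC, PREFERRED_METRIC
    base = ["default", "via", gateway, "dev", dev, "metric"]
    return [
        ["ip", "route", "replace"] + base + [str(new)],
        ["ip", "route", "del"] + base + [str(old)],
    ]
```

For example, `shift_default_route("192.0.2.1", "wan0", healthy=False)` yields a `replace` at metric 1000 followed by a `del` at metric 100, leaving the gateway in the table but behind any healthier uplink.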

PR created automatically by Jules for task 7255720421843975739 started by @ProjectInitiative

This commit implements the requirements of GROUP 2 for the High Availability (HA) NixOS router plan (`0001-ha-router-setup-plan.md`):

1. **VRRP (`vrrp/default.nix`)**: Configures `Keepalived` VRRP instances for the local LAN/VLANs. It defines `notifyMaster` and `notifyBackup` scripts to handle dynamic WAN MAC spoofing and interface administrative state control on failover.
2. **Multi-WAN (`multi-wan/default.nix`)**: Implements a `multi-wan-healthcheck` systemd service that monitors external connectivity via ICMP. It dynamically adjusts the default route metric to shift traffic away from broken uplinks while preserving the gateway discovery capabilities for failback.
3. **Conntrack (`conntrack/default.nix`)**: Configures `conntrackd` FTFW (Fault-Tolerant Firewall) state synchronization over multicast, bound to a dedicated `vlan40` interface.
4. **Base Refactor (`masthead/default.nix`)**: Defines necessary `cfg` options for the MAC, VRRP Virtual IPs, and health check IPs. It properly provisions "real" dedicated IPs for all interfaces based on the specific node's `role` (primary/backup) to avoid IP conflicts and allow VRRP daemons to communicate.
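
As an illustration of the role-based addressing in item 4, the Python sketch below derives a node's "real" address from its role. The mapping of primary to `.2` and backup to `.3` is an assumption based on the "e.g. .2 and .3" wording in the PR description, and the subnet `10.0.40.0/24` and the function name are hypothetical; the actual module is written in Nix.

```python
import ipaddress

# Assumption: which role gets which host number is not stated in the PR.
ROLE_HOST_OFFSET = {"primary": 2, "backup": 3}


def real_address(subnet: str, role: str) -> str:
    """Return the node's own address in CIDR form for the given subnet.

    Each role gets a different host address, so the two routers never
    collide and the virtual IP can be managed separately by VRRP.
    """
    net = ipaddress.ip_network(subnet)
    host = net.network_address + ROLE_HOST_OFFSET[role]
    if host not in net:
        raise ValueError(f"{subnet} is too small for role {role!r}")
    return f"{host}/{net.prefixlen}"


if __name__ == "__main__":
    for role in ("primary", "backup"):
        print(role, real_address("10.0.40.0/24", role))
```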

Co-authored-by: ProjectInitiative <6314611+ProjectInitiative@users.noreply.github.com>
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.
