Recommendation for a network setup between three servers

Hello there,

I have three servers running NixOS 20.09. Each has a dual-port 10 Gbit card installed, and the links are patched in a ring between the three, so every server can reach the others over a direct link.

Now I want to use these links for fast connectivity between the hosts for the Ceph cluster running on them, and also have some VLAN-aware bridges to attach virtual machines to.

Here are my problems:

  • As all servers see each other, the topology is basically a loop, so when using layer 2 I need (R)STP to block one of the links.
    • (R)STP works fine with standard Linux bridges, but it is a pain to make them VLAN-aware and to configure the VLANs for my KVM virtual machines.
    • I’ve tried Open vSwitch for easier configuration, but its (R)STP implementation does not seem to be well tested and I ran into massive problems with it. The VLAN configuration for the VMs, on the other hand, was really convenient.

At the moment I am using a VLAN-aware bridge configured by systemd-networkd, and for each VM I create a veth pair on the host machine to add the VLAN ID.
It looks like this:

  # Ring bridge: VLAN-aware, with STP enabled to break the loop.
  systemd.network.netdevs."20-br-ring" = {
    netdevConfig = {
      Kind = "bridge";
      Name = "br-ring";
    };
    extraConfig = ''
      [Bridge]
      STP=true
      VLANFiltering=true
    '';
  };
  # Enslave both 10G ports to the ring bridge.
  systemd.network.networks."30-int-enp3s0f0" = {
    matchConfig = {
      Name = "enp3s0f0";
    };
    networkConfig = { Bridge = "br-ring"; };
  };
  systemd.network.networks."30-int-enp3s0f1" = {
    matchConfig = {
      Name = "enp3s0f1";
    };
    networkConfig = { Bridge = "br-ring"; };
  };
  # Host address on the bridge itself.
  systemd.network.networks."30-int-ring" = {
    matchConfig = {
      Name = "br-ring";
    };
    address = [
      "10.42.13.1/24"
    ];
  };

  # VM network mon0: veth pair, host end attached to br-ring with VLAN 100.
  systemd.network.netdevs."40-veth-vlan100-mon0" = {
    netdevConfig = {
      Kind = "veth";
      Name = "br-mon0-v100";
    };
    peerConfig = {
      Name = "pe-mon0-v100";
    };
  };
  systemd.network.networks."45-veth-mon0" = {
    matchConfig = {
      Name = "pe-mon0-v100";
    };
    networkConfig = { Bridge = "br-ring"; };
    linkConfig = { RequiredForOnline = false; };
    extraConfig = ''
      [BridgeVLAN]
      VLAN=100
    '';
  };

This configuration seems to work (when ignoring this problem: bridge is not working after fresh boot), but it has the following flaws:

  • A big chunk of configuration is needed for each VM (see the sketch after this list).
  • It is not very flexible, and it does not seem to do “atomic reloads” when adding new VMs to the bridge, so adding one VM affects the others.
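
One idea to at least reduce the boilerplate would be generating these units from a list of VMs with a small helper, roughly like this (the vmVlans list, the helper names and the extra example VMs are made up; the option values are the same as in the manual config above):

  { lib, ... }:
  let
    # Hypothetical map of VM name -> VLAN ID (example values only).
    vmVlans = { mon0 = 100; mon1 = 100; web0 = 200; };

    # Build the veth netdev for one VM, mirroring the manual config above.
    mkNetdev = name: vlan: {
      "40-veth-vlan${toString vlan}-${name}" = {
        netdevConfig = { Kind = "veth"; Name = "br-${name}-v${toString vlan}"; };
        peerConfig = { Name = "pe-${name}-v${toString vlan}"; };
      };
    };

    # Attach the host side of the pair to br-ring and tag the VLAN.
    mkNetwork = name: vlan: {
      "45-veth-${name}" = {
        matchConfig = { Name = "pe-${name}-v${toString vlan}"; };
        networkConfig = { Bridge = "br-ring"; };
        linkConfig = { RequiredForOnline = false; };
        extraConfig = ''
          [BridgeVLAN]
          VLAN=${toString vlan}
        '';
      };
    };
  in {
    systemd.network.netdevs = lib.mkMerge (lib.mapAttrsToList mkNetdev vmVlans);
    systemd.network.networks = lib.mkMerge (lib.mapAttrsToList mkNetwork vmVlans);
  }

This would not fix the missing “atomic reload” problem, of course; it only shrinks the amount of config text per VM.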

So finally my question: does someone have a setup that is kind of similar to mine, with a fancy network configuration I had not thought of?
Has someone got (R)STP working on Open vSwitch without problems?
Could it be that all my problems (with Open vSwitch) and also the issue linked above come from crappy drivers for my 10 Gbit cards (NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter (rev 42))?

Thanks a lot and sorry for the WoT
best regards,
Stefan

Is this all a static configuration? Do you need the VMs to see each other at L2 at all? My first instinct would be to give each server a subnet for its VMs, add a couple (or maybe four) routing entries on each server, and let each server route for its VMs. You also wouldn’t lose a link this way, I guess (at least for a constant jumbo-frame flow).
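
Roughly what I mean, as a sketch only (the VM subnets and neighbour addresses are placeholders, and I just bolt the routes onto your existing br-ring address to keep it short; in a fully routed setup you would probably give each 10G link its own small subnet instead of bridging them):

  # On server 1: the other two servers each own one VM subnet, reached via the
  # direct links; this host forwards between its own VMs and the ring.
  systemd.network.networks."30-int-ring" = {
    matchConfig = { Name = "br-ring"; };
    address = [ "10.42.13.1/24" ];
    routes = [
      { routeConfig = { Destination = "10.42.102.0/24"; Gateway = "10.42.13.2"; }; }
      { routeConfig = { Destination = "10.42.103.0/24"; Gateway = "10.42.13.3"; }; }
    ];
  };
  boot.kernel.sysctl."net.ipv4.ip_forward" = 1;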

Yes, I’ve also thought about that, but then I lose the ability to migrate VMs between the three nodes on the fly. The three nodes should provide software-defined storage for all VMs and should be able to take them over on the fly, so that one node after another can be maintained without stopping the VMs.

Another possible solution would be using routing daemons like BIRD or Quagga on the hosts and also inside every VM, and letting them do BGP magic. This solves the on-the-fly migration, but it does not work with most of my already existing VMs.
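
To illustrate what I mean, roughly (a minimal sketch only; I am not sure the bird2 module is even available on 20.09, and the ASNs and neighbour addresses here are made up):

  # Sketch of a minimal BGP speaker on one host; each host (and VM) would
  # announce its directly connected networks to its two neighbours.
  services.bird2 = {
    enable = true;
    config = ''
      router id 10.42.13.1;
      protocol device { }
      protocol direct { ipv4; }                 # announce connected networks
      protocol kernel { ipv4 { export all; }; } # install learned routes
      protocol bgp peer2 {
        local as 65001;
        neighbor 10.42.13.2 as 65002;
        ipv4 { import all; export all; };
      }
      protocol bgp peer3 {
        local as 65001;
        neighbor 10.42.13.3 as 65003;
        ipv4 { import all; export all; };
      }
    '';
  };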

I guess you could also give each VM two interfaces, with the same IP and the same subnet, and let ARP sort all that out. But that requires reconfiguring the VMs relative to the current state.

Maybe you could just have all three links up and firewall-block forwarding between the physical links?
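
Untested sketch of what I mean (interface names taken from your config; I am not sure ebtables is on the firewall script’s path by default, hence the extraPackages line):

  # Keep both ring ports in the bridge, but drop L2 forwarding between the two
  # physical ports so the loop is never closed through this host. Traffic
  # to/from the host itself (and the VM veths) is not affected.
  networking.firewall.extraPackages = [ pkgs.ebtables ];
  networking.firewall.extraCommands = ''
    ebtables -A FORWARD -i enp3s0f0 -o enp3s0f1 -j DROP
    ebtables -A FORWARD -i enp3s0f1 -o enp3s0f0 -j DROP
  '';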