Issues using nixos-container to set up an etcd cluster

Ma27 · August 4, 2020, 1:46pm

Yeah, this is actually a known issue with NixOS containers.
During a container’s startup, the container-side network (using veth pairs) is configured and network-online.target is reached, which is why systemd thinks it has a working connection. However, the uplink stays broken until the container has fully started, because the host-side interface is only configured in the ExecStartPost hook of the corresponding systemd unit:

https://github.com/NixOS/nixpkgs/blob/b421e80b744c8e9fd7c732eebbb9bcf43b478461/nixos/modules/virtualisation/nixos-containers.nix#L178-L229

          postStartScript = (cfg:
            let
              ipcall = cfg: ipcmd: variable: attribute:
                if cfg.${attribute} == null then
                  ''
                    if [ -n "${variable}" ]; then
                      ${ipcmd} add ${variable} dev $ifaceHost
                    fi
                  ''
                else
                  ''${ipcmd} add ${cfg.${attribute}} dev $ifaceHost'';
              renderExtraVeth = name: cfg:
                if cfg.hostBridge != null then
                  ''
                    # Add ${name} to bridge ${cfg.hostBridge}
                    ip link set dev ${name} master ${cfg.hostBridge} up
                  ''
                else
                  ''
                    echo "Bring ${name} up"

This file has been truncated.

I stumbled upon this several times in the past[1][2], and unless you resort to ugly workarounds (such as delaying the startup of etcd inside the unit until networking actually works), you probably want to use something other than nixos-containers.
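
To give an idea of what such a workaround could look like, here is a rough, untested sketch for the container's configuration (e.g. under `containers.<name>.config`). It is not taken from any of the linked issues. It assumes the host-side veth address answers pings once it is configured, and `hostAddr` is a made-up placeholder. Instead of blocking, each start attempt probes the host once and fails right away if there is no answer, and systemd keeps retrying:

```nix
{ lib, pkgs, ... }:

let
  # Assumption: placeholder for the host-side address of the veth pair
  # (the container's hostAddress); replace it with your own value.
  hostAddr = "10.233.1.1";
in
{
  systemd.services.etcd = {
    # Disable start rate limiting so the retries are never cut off.
    unitConfig.StartLimitIntervalSec = 0;

    serviceConfig = {
      # Probe the host once; a failed probe fails this start attempt
      # immediately. The "+" prefix runs ping with full privileges, so the
      # probe does not depend on the service user being allowed to ping.
      ExecStartPre = [ "+${pkgs.iputils}/bin/ping -c 1 -W 1 ${hostAddr}" ];

      # Retry after a short pause. mkForce is used in case the etcd module
      # already sets these values; that is a guess, not something verified.
      Restart = lib.mkForce "on-failure";
      RestartSec = lib.mkForce "2s";
    };
  };
}
```

The reason for failing fast rather than sleeping in a loop is that the uplink only comes up once the container has fully started, so a unit that blocks during startup may end up waiting on itself.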

About a year ago I tried to investigate this issue more thoroughly and realized that it can only be fixed for good by using systemd-networkd(8). An issue has been opened for that[3]. However, such a change will have a bigger impact, which is why I decided to write an RFC first (I’m already on it, but it isn’t published yet).

[1] Services depending on `keys.target` can cause hanging boots on NixOS containers · Issue #67265 · NixOS/nixpkgs · GitHub
[2] nixos/acme: don't depend on multi-user.target inside a container by Ma27 · Pull Request #83704 · NixOS/nixpkgs · GitHub / nixos/acme: renew after rebuild and on boot by mweinelt · Pull Request #81371 · NixOS/nixpkgs · GitHub
[3] Implement NixOS container networking with networkd · Issue #69414 · NixOS/nixpkgs · GitHub