Your experiences with NixOS and netbird/headscale?

I welcome all feedback.

I know little about either. I see two advantages for netbird:

  • netbird is EU based
  • The headscale control server depends on Tailscale’s goodwill to keep the official clients compatible.

But I don’t see active NixOS-netbird configs in the wild.

My setup: flake / NixOS / home-manager

Two main networks, plus two roaming devices:

  • home network:
    • router
    • desktop
    • home server
  • remote network with a few subnets:
    • dashboard kiosk 1
    • dashboard kiosk 2
    • dashboard kiosk 3
  • laptop: wifi/ethernet
  • phone: cellular/ethernet (grapheneos)

Deploy / nixos-rebuild switch:

Currently only the laptop can deploy; in the future I’d also deploy from the home server.
I always build on the laptop, then deploy to the other devices using --target-host.
All devices have mDNS (.local) enabled, which my deploy script uses to resolve them.
To update the dashboards, I currently have to go to the remote network and connect my laptop.
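The deploy step itself boils down to one command per host. As a sketch (the hostname "kiosk1" is a placeholder, and I assume the flake attribute matches the mDNS name), a dry-run helper that only prints the command:

```shell
#!/usr/bin/env bash
# Sketch of the per-host deploy command; "kiosk1" is a placeholder and the
# flake attribute is assumed to match the mDNS hostname.
deploy_cmd() {
  local host="$1"
  # Build locally, then copy the closure and activate it over SSH via mDNS.
  printf 'nixos-rebuild switch --flake .#%s --target-host root@%s.local' \
    "$host" "$host"
}
deploy_cmd kiosk1   # prints the command; run its output to actually deploy
```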

I’d like to run the netbird/headscale control service on the home router; all other devices would run a client.

Who is running a control server at home?

My fresh install procedure:

  • newdevice: boot from thumbdrive with custom iso (ssh keys, mdns enabled, …)
    • laptop: ssh into live iso, copy disk-id
    • laptop: add newdevice config to flake, set disk-id
    • laptop: run install script (extract hardware-configuration.nix, disko build-copy-execute, extract ssh keys, agenix rekey, system build-copy-install, unmount, reboot)
Part of my install script (run from the laptop against the remote device):
    version_in_iso=$(ssh_ '( grep "VERSION_ID=" < /etc/os-release | cut -d '=' -f2 | xargs )')
    exit_when_stateversion_between_iso_and_flake_system_mismatch "$hostname" "$version_in_iso"

    .log $L_5NOTICE " > Extract hardware-configuration.nix:"
    ssh_ '( nixos-generate-config --no-filesystems --root /mnt --show-hardware-config )' > "${LOCAL_CONFIG_PATH}/hosts/${hostname}/hardware-configuration.nix"
    nixfmt "${LOCAL_CONFIG_PATH}/hosts/${hostname}/hardware-configuration.nix"
    .log $L_5NOTICE " > Extract machine-id:"
    rsync -ah "${dest}:/etc/machine-id" "${LOCAL_CONFIG_PATH}/hosts/${hostname}/"

    .log $L_5NOTICE " > Build disko config:"
    disko_script=$(nix_build "${LOCAL_CONFIG_PATH}#nixosConfigurations.${hostname}.config.system.build.diskoScript")
    .log $L_5NOTICE " > Size disko config:"
    nix path-info -Sh "$disko_script"
    .log $L_5NOTICE " > Copy disko config to remote:"
    nix_copy --to "ssh://$dest?remote-store=local?root=/mnt" "$disko_script"
    nix_copy --to "ssh://$dest" "$disko_script"
    .log $L_5NOTICE " > Execute disko: partition, format, mount:"
    ssh_ "$disko_script"

    ssh_ '( mount | grep /mnt )'

    luks_remote_unlock_enabled="$(nix eval "${LOCAL_CONFIG_PATH}#nixosConfigurations.${hostname}.config.ncfg.disko.luks_remote_unlock" "${OVERRIDE_INPUT[@]}" | xargs)"
    if [ "$luks_remote_unlock_enabled" = true ]; then
      ## [ "/persist/etc/secrets/initrd/host_ed25519_key" ]
      init_hostkey_path="/mnt$(nix eval "${LOCAL_CONFIG_PATH}#nixosConfigurations.${hostname}.config.boot.initrd.network.ssh.hostKeys" "${OVERRIDE_INPUT[@]}" | tr -d '[]' | xargs)"
      ## /mnt/persist/etc/secrets/initrd/host_ed25519_key
      init_hostkey_dir="$(dirname "${init_hostkey_path}")"
      .log $L_5NOTICE " > ssh init: Execute mkdir ${init_hostkey_dir}:"
      ssh_ "( mkdir -p ${init_hostkey_dir} )"
      .log $L_5NOTICE " > ssh init: Execute ssh-keygen ${init_hostkey_path}:"
      ssh_ "( ssh-keygen -t ed25519 -N '' -f ${init_hostkey_path} )"
    fi

    .log $L_5NOTICE " > Execute mkdir /mnt/etc/ssh:"
    ssh_ "( mkdir -p /mnt/etc/ssh )"
    .log $L_5NOTICE " > Execute mkdir /mnt/persist/etc/ssh:"
    ssh_ "( mkdir -p /mnt/persist/etc/ssh )"
    .log $L_5NOTICE " > Extract ssh key.pub:"
    rsync -ah "${dest}:/etc/ssh/ssh_host_ed25519_key.pub" "${LOCAL_CONFIG_PATH}/hosts/${hostname}/key.pub"
    .log $L_5NOTICE " > Execute cp ssh keys from iso to /mnt:"
    ssh_ "( cp /etc/ssh/ssh_host_ed25519_key* /mnt/etc/ssh/ )"
    .log $L_5NOTICE " > Execute cp ssh keys from iso to /mnt/persist:"
    ssh_ "( cp /etc/ssh/ssh_host_ed25519_key* /mnt/persist/etc/ssh/ )"
    .log $L_5NOTICE " > Rekey agenix:"
    cd "${LOCAL_CONFIG_PATH}"
    agenix --rekey
    git add "${LOCAL_CONFIG_PATH}/secrets"
    git add "${LOCAL_CONFIG_PATH}/hosts/${hostname}"
    git commit -m "add host ${hostname}"

    .log $L_5NOTICE " > system eval"
    sys="$(nix eval --raw "${LOCAL_CONFIG_PATH}#nixosConfigurations.${hostname}.config.system.build.toplevel" "${OVERRIDE_INPUT[@]}")"
    .log $L_5NOTICE " > system build"
    nix build "${LOCAL_CONFIG_PATH}#nixosConfigurations.${hostname}.config.system.build.toplevel" --out-link "$(mktemp -d)/result" |& nom
    .log $L_5NOTICE " > system size"
    nix path-info -Sh "$sys"
    .log $L_5NOTICE " > system copy"
    nix copy --to "ssh://$dest?remote-store=local?root=/mnt" "$sys" |& nom
    .log $L_5NOTICE " > system nixos-install"
    ssh_ << SSH
        nixos-install --no-root-passwd --no-channel-copy --system "$sys"
SSH

    unmount_and_export_pool "$dest"

But is it possible to deploy to a fresh device declaratively?
I could adapt the install script, but I don’t want to manually accept new devices on the control server. Is it possible to automate this?

Future steps:

  • mullvad integration for everything, except home-server connections
  • LUKS unlock over netbird/tailscale (already functional on the local network)
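For the LUKS unlock, the initrd side is already covered by the host key generation in the script above; the unlock itself is a single SSH call into the initrd. A dry-run sketch (port 2222 and the hostname are assumptions; cryptsetup-askpass is the passphrase prompt NixOS ships in its initrd SSH environment — the helper only prints the command):

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the command to unlock LUKS in the initrd over SSH.
# Port 2222 and the hostname are placeholders for my setup.
unlock_cmd() {
  printf 'ssh -p %s root@%s cryptsetup-askpass' "$1" "$2"
}
unlock_cmd 2222 kiosk1.local
```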

I ran netbird for a bit, testing it out. It was fine until it wasn’t: I changed nothing, and my clients all fell off the management plane. I’ve since kicked it and am in the process of rolling out headscale. Netbird is much more feature-rich than I need, and that leads to some complexity.

A confounding factor for me is complex networking needs. I think headscale will be sufficient as it does all of what I want and nothing that I don’t.

You are right about potential rug pulls by Tailscale; if that happens, I’ll pin the working client I have until I move to some other solution.

RE: the automation, you could potentially add the CLI commands to register the nodes, but you’ll have to periodically update that key and protect it somehow. That said, getting devices onto netbird cloud with a simple CLI command was a cinch once I had the very basic configs to install netbird and enable the client service.
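For the record, the registration command I mean is netbird up with a setup key. A dry-run sketch (the key value is a placeholder you’d pull from your secrets store; the helper only prints the command):

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the node-registration command. The setup key is a
# placeholder; in practice read it from agenix/sops and rotate it on expiry.
register_cmd() {
  printf 'netbird up --setup-key %s' "$1"
}
register_cmd "example-setup-key"
```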


I do it exactly the same as you do. I use headscale.

  1. I create a new key in headscale: headscale preauthkeys create --user <USER_ID>
  2. I store the created key in my sops file
  3. The tricky part is creating/preserving the SSH keys and adding them to your sops configuration
  4. I read the auth key from sops: services.tailscale.authKeyFile = config.sops.secrets."services/headscale/preauthkey".path;

This is not fully automatic because you have to create the auth key manually, but I keep it like this for security reasons. You could, however, specify a really long lifetime for the auth key.
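The key-creation step above, as a dry-run sketch (the user name and lifetime are placeholders; the helper only prints the command):

```shell
#!/usr/bin/env bash
# Dry-run sketch of step 1: print the headscale pre-auth key command.
# User name and expiration are placeholders; a longer expiration trades
# security for convenience, as noted above.
preauthkey_cmd() {
  printf 'headscale preauthkeys create --user %s --expiration %s' "$1" "$2"
}
preauthkey_cmd myuser 24h
# The printed key then goes into the sops file (step 2).
```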
