NixOS on mirrored SSD: boot, swap, native encrypted ZFS

I’ve recently been working on this as well. Following a disk failure in which I lost my boot partition, I decided to attempt a mirrored boot setup.

My setup is now as follows:

  • NixOS 20.09
  • ZFS 2.0 (boot.zfs.enableUnstable = true)
  • 2 identical drives in a mirror configuration. Each drive has one large partition for ZFS and a 1 GB EFI boot partition.
  • The ZFS pool is a mirror that uses ZFS native encryption and contains my root and /home datasets.
  • GRUB bootloader

Here are the relevant lines from my NixOS configuration:

  boot.loader.grub = {
    enable = true;
    zfsSupport = true;
    efiSupport = true;
    mirroredBoots = [
      {
        devices = [ "nodev" ];
        path = "/boot1";
      }
      {
        devices = [ "nodev" ];
        path = "/boot2";
      }
    ];
  };
 
  boot.loader.efi.canTouchEfiVariables = true;

  boot.supportedFilesystems = [ "zfs" ];
  boot.zfs.enableUnstable = true;
  
  # prevents "multiple pools with same name" problem during boot
  boot.zfs.devNodes = "/dev/disk/by-partuuid";

  fileSystems."/" =
    { device = "rpool/safe/root/nixos";
      fsType = "zfs";
    };

  fileSystems."/home" =
    { device = "rpool/safe/home";
      fsType = "zfs";
    };

  fileSystems."/nix" =
    { device = "rpool/nix";
      fsType = "zfs";
    };

  fileSystems."/boot1" =
    { device = "/dev/disk/by-uuid/B80C-173D";
      fsType = "vfat";
    };

  fileSystems."/boot2" =
    { device = "/dev/disk/by-uuid/9E23-5760";
      fsType = "vfat";
    };

  swapDevices = [ ];

This works very well when both drives are installed. However, when I remove a drive and attempt to boot, I can’t get to a fully working system. Here’s what happens:

  1. GRUB works fine
  2. NixOS stage 1 works fine (rpool is imported)
  3. NixOS stage 2 hangs for a long time trying to mount the missing /boot2
  4. Eventually I get the option to enter the root password for emergency mode, or press Ctrl-D to continue

If I press ctrl-d, it basically hangs again for a long time, and eventually puts me back to step 4.

If instead I enter the root password, I get a shell and can poke around. But if I try to systemctl start display-manager.service, the system hangs again, and eventually I’m back to step 4.

Is there some way for me to tell systemd that a missing /boot2 is OK for now? That would be enough to make the mirrored boot setup worthwhile. In fact, having the system boot up as if everything were totally fine even though an entire drive is dead may not be desirable anyway.
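
One candidate worth trying, offered as an untested sketch rather than a confirmed fix: the standard nofail mount option. The systemd.mount documentation describes it as making the mount only wanted, not required, by local-fs.target, so boot should continue without waiting for it. In NixOS, mount options go in the options list of each fileSystems entry. Marking both boot partitions (since either drive could be the one that fails) and the 10s device timeout are my assumptions, not something taken from the configuration above.

```nix
  # Sketch only: let boot proceed when one of the two EFI partitions is absent.
  fileSystems."/boot1" =
    { device = "/dev/disk/by-uuid/B80C-173D";
      fsType = "vfat";
      # nofail: the mount is wanted, not required, by local-fs.target.
      # x-systemd.device-timeout: stop waiting for the device after 10s
      # (illustrative value).
      options = [ "nofail" "x-systemd.device-timeout=10s" ];
    };

  fileSystems."/boot2" =
    { device = "/dev/disk/by-uuid/9E23-5760";
      fsType = "vfat";
      options = [ "nofail" "x-systemd.device-timeout=10s" ];
    };
```

With this in place, a failed mount would show up as a failed unit (for example in systemctl --failed) rather than dropping to emergency mode, so the dead drive should still be noticeable. A later nixos-rebuild that needs to write GRUB files to the missing /boot path may still fail until the drive is replaced.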
