Unencrypted ZFS mirrored boot drive (and recovery)

What is working:

  • I can run an install script and create a mirrored ZFS boot volume in a virtual environment
  • I can then remove a volume, and re-create the ZFS mirrored boot volume

What I’m stuck on:

  • I don’t seem to be able to get both /boot and /boot-fallback to be persistent after recovery

The long story
I’m in the process of building out a new server, and while I’ve only lost one boot device in the last 20+ years, hardware is so cheap I thought why not go for a mirrored boot volume?

My choice of ZFS is more about sticking with as few filesystem types as I can, and I do plan for my main drive array to be ZFS based (RAIDZ).

Thus, I’m trying to build out a NixOS installation that will boot from a mirrored ZFS root drive. I’m also not trying to swim upstream too hard, so I’m fine with a grub based boot system to take advantage of the boot.loader.grub.mirroredBoots support.

While I have the actual hardware, I’ve been focused on building out a virtual environment, which lets me iterate more rapidly. Specifics here are UTM on a Mac M1 Pro, running the arm based version of NixOS.

My install script (based on this article)

# set up my disks by id
DISK1=/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_F1905C9F-0356-43B3-B
DISK2=/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_23AB37ED-C078-4AEF-8

# ran this 'micro-script'
partition() {
    sgdisk --zap-all "$1"
    sgdisk -n 1:0:+1GiB -t 1:EF00 -c 1:boot "$1"
    # Swap is omitted.
    sgdisk -n 2:0:0 -t 2:BF01 -c 2:zfs "$1"
    sgdisk --print "$1"
}

partition $DISK1
partition $DISK2

# formatted the EFI partitions
mkfs.vfat $DISK1-part1
mkfs.vfat $DISK2-part1

# created zfs pool - but skipped encryption

zpool create \
    -o ashift=12 \
    -O mountpoint=none -O atime=off -O acltype=posixacl -O xattr=sa \
    -O compression=lz4 rpool mirror \
    $DISK1-part2 $DISK2-part2

# made root data set and an empty snapshot
zfs create -p -o mountpoint=legacy rpool/local/root
zfs snapshot rpool/local/root@blank

# mounted it 
mount -t zfs rpool/local/root /mnt

# and mounted the EFI partitions
mkdir /mnt/boot
mkdir /mnt/boot-fallback
mount $DISK1-part1 /mnt/boot
mount $DISK2-part1 /mnt/boot-fallback

# At this point I started configuration of nix

Before running this install script, I edit the first few lines to define DISK1 and DISK2 based on the system I’m installing it into.
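Since editing DISK1 and DISK2 by hand is the step most likely to go wrong, here is a minimal sketch of a guard that could run before any partitioning. The function name check_disks is hypothetical and not part of the original script; it only checks that two distinct, existing paths were given.

```shell
# Hypothetical guard (not in the original script): refuse to continue
# unless two different, existing disk paths are supplied.
check_disks() {
    if [ "$#" -ne 2 ]; then
        echo "usage: check_disks DISK1 DISK2" >&2
        return 2
    fi
    if [ "$1" = "$2" ]; then
        echo "error: DISK1 and DISK2 are the same path" >&2
        return 1
    fi
    for d in "$1" "$2"; do
        if [ ! -e "$d" ]; then
            echo "error: $d does not exist" >&2
            return 1
        fi
    done
    return 0
}
```

Calling check_disks "$DISK1" "$DISK2" || exit 1 near the top of the script would stop it before sgdisk --zap-all touches anything. Note that this compares the paths as strings only; two different by-id links pointing at the same device would not be caught.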

After running the script I will run nixos-generate-config --root /mnt to create the hardware configuration for this machine.

Then I copy in my configuration.nix file which is below

# Edit this configuration file to define what should be installed on
# your system. Help is available in the configuration.nix(5) man page, on
# https://search.nixos.org/options and in the NixOS manual (`nixos-help`).

{ config, lib, pkgs, ... }:

{
  imports =
    [ # Include the results of the hardware scan.
      ./hardware-configuration.nix
    ];

  # Use the systemd-boot EFI boot loader.
  #boot.loader.systemd-boot.enable = true;
  #boot.loader.efi.canTouchEfiVariables = true;

  # Whether installer can modify the EFI variables.
  # If you encounter errors, set this to `false`.
  boot.loader.efi.canTouchEfiVariables = true;

  boot.loader.grub.enable = true;
  boot.loader.grub.efiSupport = true;
  boot.loader.grub.device = "nodev";

  # This should be done automatically, but explicitly declare it just in case.
  boot.loader.grub.copyKernels = true;
  # Make sure that you've listed all of the boot partitions here.
  boot.loader.grub.mirroredBoots = [
    { path = "/boot"; devices = ["/dev/disk/by-uuid/1F23-447B"]; }
    { path = "/boot-fallback"; devices = ["/dev/disk/by-uuid/460E-0D39"]; }
  ];

  fileSystems."/boot".options = [ "nofail" ];
  fileSystems."/boot-fallback".options = [ "nofail" ];

  boot.supportedFilesystems = [ "zfs" ];
  networking.hostId = "4532eafd";

  networking.hostName = "myhost"; # Define your hostname.
  # Pick only one of the below networking options.
  # networking.wireless.enable = true;  # Enables wireless support via wpa_supplicant.
  networking.networkmanager.enable = true;  # Easiest to use and most distros use this by default.

  # Set your time zone.
  time.timeZone = "America/Toronto";

  # Configure network proxy if necessary
  # networking.proxy.default = "http://user:password@proxy:port/";
  # networking.proxy.noProxy = "127.0.0.1,localhost,internal.domain";

  # Select internationalisation properties.
  # i18n.defaultLocale = "en_US.UTF-8";
  # console = {
  #   font = "Lat2-Terminus16";
  #   keyMap = "us";
  #   useXkbConfig = true; # use xkb.options in tty.
  # };

  # Enable the X11 windowing system.
  #services.xserver.enable = true;


  # Enable the GNOME Desktop Environment.
  #services.xserver.displayManager.gdm.enable = true;
  #services.xserver.desktopManager.gnome.enable = true;
  

  # Configure keymap in X11
  # services.xserver.xkb.layout = "us";
  # services.xserver.xkb.options = "eurosign:e,caps:escape";

  # Enable CUPS to print documents.
  # services.printing.enable = true;

  # Enable sound.
  # services.pulseaudio.enable = true;
  # OR
  # services.pipewire = {
  #   enable = true;
  #   pulse.enable = true;
  # };

  # Enable touchpad support (enabled default in most desktopManager).
  # services.libinput.enable = true;

  # Define a user account. Don't forget to set a password with ‘passwd’.
  users.users.myuser = {
     isNormalUser = true;
     extraGroups = [ "wheel" ]; # Enable ‘sudo’ for the user.
     packages = with pkgs; [
       vim
     ];
   };

  # programs.firefox.enable = true;

  # List packages installed in system profile.
  # You can use https://search.nixos.org/ to find more packages (and options).
  # environment.systemPackages = with pkgs; [
  #   vim # Do not forget to add an editor to edit configuration.nix! The Nano editor is also installed by default.
  #   wget
  # ];

  # Some programs need SUID wrappers, can be configured further or are
  # started in user sessions.
  # programs.mtr.enable = true;
  # programs.gnupg.agent = {
  #   enable = true;
  #   enableSSHSupport = true;
  # };

  # List services that you want to enable:

  # Enable the OpenSSH daemon.
  services.openssh.enable = true;

  # Open ports in the firewall.
  # networking.firewall.allowedTCPPorts = [ ... ];
  # networking.firewall.allowedUDPPorts = [ ... ];
  # Or disable the firewall altogether.
  # networking.firewall.enable = false;

  # Copy the NixOS configuration file and link it from the resulting system
  # (/run/current-system/configuration.nix). This is useful in case you
  # accidentally delete configuration.nix.
  # system.copySystemConfiguration = true;
  # This option defines the first version of NixOS you have installed on this particular machine,
  # and is used to maintain compatibility with application data (e.g. databases) created on older NixOS versions.
  #
  # Most users should NEVER change this value after the initial install, for any reason,
  # even if you've upgraded your system to a new NixOS release.
  #
  # This value does NOT affect the Nixpkgs version your packages and OS are pulled from,
  # so changing it will NOT upgrade your system - see https://nixos.org/manual/nixos/stable/#sec-upgrading for how
  # to actually do that.
  #
  # This value being lower than the current NixOS release does NOT mean your system is
  # out of date, out of support, or vulnerable.
  #
  # Do NOT change this value unless you have manually inspected all the changes it would make to your configuration,
  # and migrated your data accordingly.
  #
  # For more information, see `man configuration.nix` or https://nixos.org/manual/nixos/stable/options#opt-system.stateVersion .
  system.stateVersion = "25.05"; # Did you read the comment?

}

Then I just need to

nixos-install
reboot

and my system will come up nicely into a grub managed boot with a mirrored ZFS system for my root filesystem.

Wiping one of the two drives still allows me to boot. If I wipe the ‘primary’ device, UTM dumps me into the EFI shell, but from there it’s easy to select the secondary device, and up comes my ZFS root system on its degraded mirror. Sweet - this is working nicely.

Running a subset of the ‘install’ commands will re-build the missing drive - again, you need to modify the script to specify the right disk / partitions.

DISK1=/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_2706D006-65DD-4544-8

partition() {
    sgdisk --zap-all "$1"
    sgdisk -n 1:0:+1GiB -t 1:EF00 -c 1:boot "$1"
    # Swap is omitted.
    sgdisk -n 2:0:0 -t 2:BF01 -c 2:zfs "$1"
    sgdisk --print "$1"
}

partition $DISK1

mkfs.vfat $DISK1-part1

mount $DISK1-part1 /boot-fallback

Again, this works. All I need to do is run a zpool replace to swap in the new mirror.

$ sudo zpool replace rpool 13814653030036822195 /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_2706D006-65DD-4544-8-part2
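The long number in that command is the GUID ZFS shows in place of the missing device. As a rough sketch, the failed member can be picked out of zpool status output with a small filter; the function name failed_vdevs is made up for illustration, and it assumes the usual NAME/STATE table layout of zpool status.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the
# name (or GUID) of every device that is unavailable, faulted, removed
# or offline. Pool and mirror rows show DEGRADED, so they are skipped.
failed_vdevs() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }  # header row of the device table
        in_table && NF == 0           { in_table = 0 }        # a blank line ends the table
        in_table && $2 ~ /^(UNAVAIL|FAULTED|REMOVED|OFFLINE)$/ { print $1 }
    '
}
```

Used as zpool status rpool | failed_vdevs, this should print the identifier to pass as the old device to zpool replace. Running zpool status again afterwards shows the resilver progress.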

Boom… my ZFS mirror syncs and I’m good… and my previous mounting of the missing /boot-fallback allows me to do a nixos-rebuild switch and update the grub on both drives.

However, upon reboot - I’m still missing /boot-fallback - there must be something simple I’m missing.

I’ve also tried nixos-rebuild boot --install-bootloader followed by a reboot – still the same problem.

Oh, and yes - I’m updating the /etc/nixos/configuration.nix to reflect the new disk/by-uuid of the new mirror volume – that’s not seeming to help fix this issue.

It appears that nixos-install is doing something that I’m not doing when I’m trying to replace the mirror that has failed.

What happens if you re-run nixos-install?

(nixos-install is meant to be idempotent i.e. if it works once then you can run it as many times as you like.)

Also, just checking but this seems to have nothing to do with ZFS, right? Your problem is only with a partition that isn’t and never was using ZFS.

Oh cool, I didn’t know I could non-destructively run the install (but thinking about the Nix way… of course it works like that)

Also - yes, you are correct

Also, just checking but this seems to have nothing to do with ZFS, right? Your problem is only with a partition that isn’t and never was using ZFS.

This is about making my system boot normally again AFTER I’ve managed to get ZFS mirrored up. I mention ZFS because if the answer is - stop using grub, boot with systemd – sure, I can do that too - but it seems grub handles mirrored boots and there is a pile of hacks for systemd.

Hmm… well - naively running nixos-install fails (unsurprisingly)

$ nixos-install 
mount point /mnt doesn't exist

Forcing the root gives me

$ sudo nixos-install --root /
building the configuration in //etc/nixos/configuration.nix...
/nix/store/7v3n1jf76ks2w60libawpclnppqvbf7q-nixos-system-gold-25.05.810859.20c4598c84a6
installing the boot loader...
setting up /etc...
updating GRUB 2 menu...
installing the GRUB 2 boot loader into /boot...
Installing for arm64-efi platform.
Installation finished. No error reported.
updating GRUB 2 menu...
installing the GRUB 2 boot loader into /boot-fallback...
Installing for arm64-efi platform.
Installation finished. No error reported.
updating GRUB 2 menu...
installing the GRUB 2 boot loader into /boot...
Installing for arm64-efi platform.
Installation finished. No error reported.
umount: /nix/store: target is busy.

I find it weird that grub is installed 3 times… but that seems to happen even on the initial nixos-install when I’m building the system from scratch. The unmount warning may be ignorable…

Unfortunately after a reboot – I still have the missing /boot-fallback

$ mount | grep boot
/dev/nvme0n1p1 on /boot type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)

Ok - digging a bit more - it seems that /etc/fstab does contain the filesystem mounts (again, expected if I had thought about this a bit more)

$ cat /etc/fstab 
# This is a generated file.  Do not edit!
#
# To make changes, edit the fileSystems and swapDevices NixOS options
# in your /etc/nixos/configuration.nix file.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>

# Filesystems.
rpool/local/root / zfs x-initrd.mount 0 0
/dev/disk/by-uuid/FF83-74DA /boot vfat nofail,fmask=0022,dmask=0022 0 2
/dev/disk/by-uuid/FF84-DBFD /boot-fallback vfat nofail,fmask=0022,dmask=0022 0 2

but… this /etc/fstab doesn’t match the UUIDs I’ve put into my /etc/nixos/configuration.nix

$ grep device /etc/nixos/configuration.nix 
  boot.loader.grub.device = "nodev";
    { path = "/boot"; devices = ["/dev/disk/by-uuid/FF83-74DA"]; }
    { path = "/boot-fallback"; devices = ["/dev/disk/by-uuid/D73D-0AFD"]; }

So… something is wrong here.

For reference …

$ ls -l /dev/disk/by-uuid/
total 0
lrwxrwxrwx 1 root root 15 Oct 14 14:19 11724166571828384807 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Oct 14 14:19 D73D-0AFD -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Oct 14 14:19 FF83-74DA -> ../../nvme0n1p1
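Comparing these listings by eye is error-prone, so here is a small sketch that does it mechanically. The function name check_uuids is hypothetical; it pulls every /dev/disk/by-uuid/... path out of a file (fstab or a .nix file) and reports whether that UUID currently exists. The second argument exists only so the directory can be swapped out for testing.

```shell
# Hypothetical helper: list each by-uuid path mentioned in a file and say
# whether a matching entry exists right now.
#   $1  file to scan (e.g. /etc/fstab)
#   $2  directory holding the UUID links (default /dev/disk/by-uuid)
check_uuids() {
    file="$1"
    uuid_dir="${2:-/dev/disk/by-uuid}"
    grep -o '/dev/disk/by-uuid/[A-Za-z0-9-]*' "$file" | sort -u |
    while read -r dev; do
        uuid="${dev##*/}"            # strip the directory part
        if [ -e "$uuid_dir/$uuid" ]; then
            echo "ok      $uuid"
        else
            echo "MISSING $uuid"
        fi
    done
}
```

Against the /etc/fstab and by-uuid listing shown above, check_uuids /etc/fstab would flag FF84-DBFD as missing, which is the stale /boot-fallback entry. The same check can be pointed at any file under /etc/nixos to find where a stale UUID is coming from.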

Why doesn’t nixos-rebuild fix this file (/etc/fstab) for me?

Oh wait… look at that…

$ cat /etc/nixos/hardware-configuration.nix 
# Do not modify this file!  It was generated by ‘nixos-generate-config’
# and may be overwritten by future invocations.  Please make changes
# to /etc/nixos/configuration.nix instead.
{ config, lib, pkgs, modulesPath, ... }:

{
  imports =
    [ (modulesPath + "/profiles/qemu-guest.nix")
    ];

  boot.initrd.availableKernelModules = [ "xhci_pci" "nvme" "usbhid" "usb_storage" "sr_mod" ];
  boot.initrd.kernelModules = [ ];
  boot.kernelModules = [ ];
  boot.extraModulePackages = [ ];

  fileSystems."/" =
    { device = "rpool/local/root";
      fsType = "zfs";
    };

  fileSystems."/boot" =
    { device = "/dev/disk/by-uuid/FF83-74DA";
      fsType = "vfat";
      options = [ "fmask=0022" "dmask=0022" ];
    };

  fileSystems."/boot-fallback" =
    { device = "/dev/disk/by-uuid/FF84-DBFD";
      fsType = "vfat";
      options = [ "fmask=0022" "dmask=0022" ];
    };

  swapDevices = [ ];

  # Enables DHCP on each ethernet and wireless interface. In case of scripted networking
  # (the default) this is the recommended approach. When using systemd-networkd it's
  # still possible to use this option, but it's recommended to use it in conjunction
  # with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
  networking.useDHCP = lib.mkDefault true;
  # networking.interfaces.enp0s1.useDHCP = lib.mkDefault true;

  nixpkgs.hostPlatform = lib.mkDefault "aarch64-linux";
}

It seems the boot filesystem is defined in the hardware configuration… should I re-generate that? (in a non-destructive way…)

I can confirm that hacking the /etc/nixos/hardware-configuration.nix file to reflect the correct /boot-fallback device… fixes the problem

So what is the right way to regenerate just my hardware configuration file?

Ahh… and man nixos-generate-config answers the question

DESCRIPTION
This command writes two NixOS configuration modules:

   /etc/nixos/hardware-configuration.nix
           This  module  sets  NixOS configuration options based on your current hardware configuration. In particular, it sets the fileSystem option to reflect all
           currently mounted file systems, the swapDevices option to reflect active swap devices, and the boot.initrd.* options to ensure that the  initial  ramdisk
           contains any kernel modules necessary for mounting the root file system.

           If   this   file   already   exists,  it  is  overwritten.  Thus,  you  should  not  modify  it  manually.  Rather,  you  should  include  it  from  your
           /etc/nixos/configuration.nix, and re-run nixos-generate-config to update it whenever your hardware configuration changes.

   /etc/nixos/configuration.nix
           This is the main NixOS system configuration module. If it already exists, it’s left unchanged. Otherwise, nixos-generate-config will write a template for
           you to customise.

To summarize things

I was having trouble experimenting with a NixOS system with a mirrored boot drive. While I could recover the (ZFS) mirror, I wasn’t able to get the system back into a state where both of the EFI partitions (/boot and /boot-fallback) were mounted on reboot.

It turns out that while the /etc/nixos/configuration.nix file defines the mirrored boot setup for grub, such that all grub changes are written to two places - the /etc/nixos/hardware-configuration.nix file defines the partitions that are mounted on boot.

I was able to solve the problem by manually changing this file and doing a nixos-rebuild – however, you can simply re-run nixos-generate-config with the new EFI partition mounted (it records the currently mounted file systems), which will update the hardware definition file, but NOT change the /etc/nixos/configuration.nix file because one exists.
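Putting the pieces of this thread together, the recovery can be sketched as two steps with a manual edit in between. The function names replace_drive and refresh_config and the DRY_RUN switch are my own additions for illustration; the commands themselves are the ones used earlier in the thread. With DRY_RUN=1 (the default here) nothing is executed, the commands are only printed.

```shell
# Sketch only. DRY_RUN=1 prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# replace_drive NEW_DISK OLD_GUID [ESP_MOUNTPOINT]
#   NEW_DISK        /dev/disk/by-id/... path of the replacement drive
#   OLD_GUID        identifier of the failed vdev from zpool status
#   ESP_MOUNTPOINT  where the new EFI partition goes (default /boot-fallback)
replace_drive() {
    new_disk="$1"
    old_guid="$2"
    esp="${3:-/boot-fallback}"

    # Partition the new drive the same way as in the install script.
    run sgdisk --zap-all "$new_disk"
    run sgdisk -n 1:0:+1GiB -t 1:EF00 -c 1:boot "$new_disk"
    run sgdisk -n 2:0:0 -t 2:BF01 -c 2:zfs "$new_disk"

    # Recreate and mount the EFI partition.
    run mkfs.vfat "$new_disk-part1"
    run mount "$new_disk-part1" "$esp"

    # Resilver the mirror onto the new ZFS partition.
    run zpool replace rpool "$old_guid" "$new_disk-part2"
}

# Run this only after updating the by-uuid entry under
# boot.loader.grub.mirroredBoots in configuration.nix.
refresh_config() {
    # Rewrites hardware-configuration.nix from the currently mounted
    # file systems, so the new EFI partition must still be mounted.
    run nixos-generate-config
    run nixos-rebuild switch --install-bootloader
}
```

The split into two functions is deliberate: the mirroredBoots UUID in configuration.nix still has to be edited by hand between them. Setting DRY_RUN=0 makes the same calls run for real, as root, and sgdisk --zap-all is destructive, so it is worth reading the dry-run output first.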

This thread might be useful for someone who is trying to set up an unencrypted ZFS mirror for their root drive AND wants to know what steps are required to recover fully if a drive fails.


Thanks for documenting all that!

I wonder whether there should be a command called nixos-idempotently-reinstall that’s just a link to nixos-install (joking not joking) 🙂

Np - that’s sort of how I roll – I want to make my posts useful for others who might be stumbling down the same path.
