What is working:
- I can run an install script and create a mirrored ZFS boot volume in a virtual environment
- I can then remove a volume, and re-create the ZFS mirrored boot volume
What I’m stuck on:
- I don’t seem to be able to get both /boot and /boot-fallback to be persistent after recovery
The long story
I’m in the process of building out a new server, and while I’ve only lost one boot device in the last 20+ years, hardware is so cheap I thought why not go for a mirrored boot volume?
My choice of ZFS is more about sticking with as few filesystem types as I can, and I do plan for my main drive array to be ZFS based (RAIDZ).
Thus, I’m trying to build out a NixOS installation that will boot from a mirrored ZFS root drive. I’m also not trying to swim upstream too hard, so I’m fine with a grub-based boot system to take advantage of the boot.loader.grub.mirroredBoots support.
While I have the actual hardware, I’ve been focused on building out a virtual environment, which lets me iterate more rapidly. Specifics here are UTM on a Mac M1 Pro, running the ARM-based version of NixOS.
My install script (based on this article)
# set up my disks by id
DISK1=/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_F1905C9F-0356-43B3-B
DISK2=/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_23AB37ED-C078-4AEF-8
# ran this 'micro-script'
partition() {
  sgdisk --zap-all "$1"
  sgdisk -n 1:0:+1GiB -t 1:EF00 -c 1:boot "$1"
  # Swap is omitted.
  sgdisk -n 2:0:0 -t 2:BF01 -c 2:zfs "$1"
  sgdisk --print "$1"
}
partition $DISK1
partition $DISK2
# formatted the EFI partitions
mkfs.vfat $DISK1-part1
mkfs.vfat $DISK2-part1
# created zfs pool - but skipped encryption
zpool create \
  -o ashift=12 \
  -O mountpoint=none -O atime=off -O acltype=posixacl -O xattr=sa \
  -O compression=lz4 rpool mirror \
  $DISK1-part2 $DISK2-part2
# made root data set and an empty snapshot
zfs create -p -o mountpoint=legacy rpool/local/root
zfs snapshot rpool/local/root@blank
# mounted it
mount -t zfs rpool/local/root /mnt
# and mounted the EFI partitions
mkdir /mnt/boot
mkdir /mnt/boot-fallback
mount $DISK1-part1 /mnt/boot
mount $DISK2-part1 /mnt/boot-fallback
# At this point I started configuration of nix
Before running this install script, I edit the first few lines to define DISK1 and DISK2 based on the system I’m installing it into.
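Editing the script by hand each time is easy to get wrong, so a minimal sketch of taking the two disks as arguments instead might look like this. The helper names part_path and check_disks are hypothetical (they are not part of any tool), and the destructive partitioning steps are deliberately left out.

```shell
#!/bin/sh
# Sketch only: validate the two disk arguments before doing anything destructive.
# part_path and check_disks are hypothetical helpers, not part of any tool.

# Print the by-id path of partition number $2 on disk $1, e.g. <disk>-part1.
part_path() {
  printf '%s-part%s\n' "$1" "$2"
}

# Succeed only if exactly two different disks were given.
check_disks() {
  if [ "$#" -ne 2 ]; then
    echo "usage: install.sh DISK1 DISK2" >&2
    return 1
  fi
  if [ "$1" = "$2" ]; then
    echo "DISK1 and DISK2 must be different" >&2
    return 1
  fi
}

# Example: show which partitions the script would use for one of my disks.
DISK1=/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_F1905C9F-0356-43B3-B
echo "EFI: $(part_path "$DISK1" 1)"
echo "ZFS: $(part_path "$DISK1" 2)"
```

With something like this in place, the rest of the script could call check_disks "$@" first and then use the part_path output wherever it currently writes $DISK1-part1 or $DISK2-part2.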
After running the script I will run nixos-generate-config --root /mnt to create the hardware configuration for this machine.
Then I copy in my configuration.nix file which is below
# Edit this configuration file to define what should be installed on
# your system. Help is available in the configuration.nix(5) man page, on
# https://search.nixos.org/options and in the NixOS manual (`nixos-help`).
{ config, lib, pkgs, ... }:
{
imports =
[ # Include the results of the hardware scan.
./hardware-configuration.nix
];
# Use the systemd-boot EFI boot loader.
#boot.loader.systemd-boot.enable = true;
#boot.loader.efi.canTouchEfiVariables = true;
# Whether installer can modify the EFI variables.
# If you encounter errors, set this to `false`.
boot.loader.efi.canTouchEfiVariables = true;
boot.loader.grub.enable = true;
boot.loader.grub.efiSupport = true;
boot.loader.grub.device = "nodev";
# This should be done automatically, but explicitly declare it just in case.
boot.loader.grub.copyKernels = true;
# Make sure that you've listed all of the boot partitions here.
boot.loader.grub.mirroredBoots = [
{ path = "/boot"; devices = ["/dev/disk/by-uuid/1F23-447B"]; }
{ path = "/boot-fallback"; devices = ["/dev/disk/by-uuid/460E-0D39"]; }
];
fileSystems."/boot".options = [ "nofail" ];
fileSystems."/boot-fallback".options = [ "nofail" ];
boot.supportedFilesystems = [ "zfs" ];
networking.hostId = "4532eafd";
networking.hostName = "myhost"; # Define your hostname.
# Pick only one of the below networking options.
# networking.wireless.enable = true; # Enables wireless support via wpa_supplicant.
networking.networkmanager.enable = true; # Easiest to use and most distros use this by default.
# Set your time zone.
time.timeZone = "America/Toronto";
# Configure network proxy if necessary
# networking.proxy.default = "http://user:password@proxy:port/";
# networking.proxy.noProxy = "127.0.0.1,localhost,internal.domain";
# Select internationalisation properties.
# i18n.defaultLocale = "en_US.UTF-8";
# console = {
# font = "Lat2-Terminus16";
# keyMap = "us";
# useXkbConfig = true; # use xkb.options in tty.
# };
# Enable the X11 windowing system.
#services.xserver.enable = true;
# Enable the GNOME Desktop Environment.
#services.xserver.displayManager.gdm.enable = true;
#services.xserver.desktopManager.gnome.enable = true;
# Configure keymap in X11
# services.xserver.xkb.layout = "us";
# services.xserver.xkb.options = "eurosign:e,caps:escape";
# Enable CUPS to print documents.
# services.printing.enable = true;
# Enable sound.
# services.pulseaudio.enable = true;
# OR
# services.pipewire = {
# enable = true;
# pulse.enable = true;
# };
# Enable touchpad support (enabled default in most desktopManager).
# services.libinput.enable = true;
# Define a user account. Don't forget to set a password with ‘passwd’.
users.users.myuser = {
isNormalUser = true;
extraGroups = [ "wheel" ]; # Enable ‘sudo’ for the user.
packages = with pkgs; [
vim
];
};
# programs.firefox.enable = true;
# List packages installed in system profile.
# You can use https://search.nixos.org/ to find more packages (and options).
# environment.systemPackages = with pkgs; [
# vim # Do not forget to add an editor to edit configuration.nix! The Nano editor is also installed by default.
# wget
# ];
# Some programs need SUID wrappers, can be configured further or are
# started in user sessions.
# programs.mtr.enable = true;
# programs.gnupg.agent = {
# enable = true;
# enableSSHSupport = true;
# };
# List services that you want to enable:
# Enable the OpenSSH daemon.
services.openssh.enable = true;
# Open ports in the firewall.
# networking.firewall.allowedTCPPorts = [ ... ];
# networking.firewall.allowedUDPPorts = [ ... ];
# Or disable the firewall altogether.
# networking.firewall.enable = false;
# Copy the NixOS configuration file and link it from the resulting system
# (/run/current-system/configuration.nix). This is useful in case you
# accidentally delete configuration.nix.
# system.copySystemConfiguration = true;
# This option defines the first version of NixOS you have installed on this particular machine,
# and is used to maintain compatibility with application data (e.g. databases) created on older NixOS versions.
#
# Most users should NEVER change this value after the initial install, for any reason,
# even if you've upgraded your system to a new NixOS release.
#
# This value does NOT affect the Nixpkgs version your packages and OS are pulled from,
# so changing it will NOT upgrade your system - see https://nixos.org/manual/nixos/stable/#sec-upgrading for how
# to actually do that.
#
# This value being lower than the current NixOS release does NOT mean your system is
# out of date, out of support, or vulnerable.
#
# Do NOT change this value unless you have manually inspected all the changes it would make to your configuration,
# and migrated your data accordingly.
#
# For more information, see `man configuration.nix` or https://nixos.org/manual/nixos/stable/options#opt-system.stateVersion .
system.stateVersion = "25.05"; # Did you read the comment?
}
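The UUIDs in the mirroredBoots entries are the filesystem UUIDs of the two EFI partitions, which blkid can print. As a rough sketch of how those lines could be produced rather than typed by hand (mirrored_boot_entry is a hypothetical helper, and the output format simply copies the entries in my configuration):

```shell
#!/bin/sh
# Hypothetical helper: print one mirroredBoots entry for a mount path ($1) and
# a filesystem UUID ($2), in the same form used in my configuration.nix.
mirrored_boot_entry() {
  printf '{ path = "%s"; devices = ["/dev/disk/by-uuid/%s"]; }\n' "$1" "$2"
}

# Real usage on the target machine (needs root and the actual partitions):
#   mirrored_boot_entry /boot "$(blkid -s UUID -o value "$DISK1-part1")"
#   mirrored_boot_entry /boot-fallback "$(blkid -s UUID -o value "$DISK2-part1")"

# Example with the UUIDs from my configuration:
mirrored_boot_entry /boot 1F23-447B
mirrored_boot_entry /boot-fallback 460E-0D39
```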
Then I just need to
nixos-install
reboot
and my system will come up nicely into a grub managed boot with a mirrored ZFS system for my root filesystem.
Wiping one of the two drives still allows me to boot. If I wipe the ‘primary’ device, UTM dumps me into the EFI shell, but from there it is easy to select the secondary device, and up comes my root filesystem on the now-degraded ZFS mirror. Sweet - this is working nicely.
Running a subset of the ‘install’ commands will re-build the missing drive - again, you need to modify the script to specify the right disk / partitions.
DISK1=/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_2706D006-65DD-4544-8
partition() {
  sgdisk --zap-all "$1"
  sgdisk -n 1:0:+1GiB -t 1:EF00 -c 1:boot "$1"
  # Swap is omitted.
  sgdisk -n 2:0:0 -t 2:BF01 -c 2:zfs "$1"
  sgdisk --print "$1"
}
partition $DISK1
mkfs.vfat $DISK1-part1
mount $DISK1-part1 /boot-fallback
Again, this works. All I need to do is run a zpool replace to swap in the new mirror.
$ sudo zpool replace rpool 13814653030036822195 /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_2706D006-65DD-4544-8-part2
Boom… my ZFS mirror syncs and I’m good… and my previous mounting of the missing /boot-fallback allows me to do a nixos-rebuild switch and update the grub on both drives.
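Since the rebuild writes to whatever happens to be at /boot and /boot-fallback, a quick check that both are really mounted before running nixos-rebuild seems worthwhile. A minimal sketch, assuming a Linux-style mount table such as /proc/mounts; is_mounted and check_boot_mounts are hypothetical helpers:

```shell
#!/bin/sh
# Hypothetical helper: succeed if path $1 is listed as a mount point in the
# mount table $2 (default /proc/mounts).
is_mounted() {
  awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' \
    "${2:-/proc/mounts}"
}

# Report the state of both boot mount points.
# Returns non-zero if either one is missing.
check_boot_mounts() {
  status=0
  for path in /boot /boot-fallback; do
    if is_mounted "$path" "${1:-/proc/mounts}"; then
      echo "$path: mounted"
    else
      echo "$path: NOT mounted"
      status=1
    fi
  done
  return "$status"
}
```

Running check_boot_mounts right after a reboot would also show directly whether /boot-fallback came back or not.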
However, upon reboot - I’m still missing /boot-fallback - there must be something simple I’m missing.
I’ve also tried nixos-rebuild boot --install-bootloader followed by a reboot – still the same problem.
Oh, and yes - I’m updating /etc/nixos/configuration.nix to reflect the new disk/by-uuid of the new mirror volume, but that doesn’t seem to fix the issue.
It appears that nixos-install is doing something that I’m not doing when I’m trying to replace the mirror that has failed.
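One thing I can still check is whether every UUID referenced in my Nix files matches the freshly formatted partition. A minimal sketch, assuming the generated hardware-configuration.nix also refers to the EFI partitions by /dev/disk/by-uuid (an assumption on my part, I have not pasted that file here); configured_uuids is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: list every /dev/disk/by-uuid/<UUID> referenced in the
# given Nix files, one UUID per line, sorted and de-duplicated.
configured_uuids() {
  grep -ho '/dev/disk/by-uuid/[0-9A-Za-z-]*' "$@" | sed 's|.*/||' | sort -u
}

# Real usage on the rebuilt system (blkid needs root):
#   configured_uuids /etc/nixos/configuration.nix /etc/nixos/hardware-configuration.nix
#   blkid -s UUID -o value "$DISK1-part1"
# If the UUID printed by blkid is missing from the first list, some file still
# refers to the old, wiped partition.
```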