21.05 ZFS root install can't import pool on boot

Has anyone been able to install 21.05 on a ZFS root? My attempts have all failed. The message importing root ZFS pool "rpool"... appears, followed by a long string of dots. Then it retries about 10 more times with mounting rpool/root/nixos on /mnt-root failed: no such file or directory and drops me into a menu offering to reboot immediately or to ignore the error and continue. Neither rebooting nor continuing works.

I added boot.shell_on_fail to boot.kernelParams, which adds "launch an interactive shell" and "start an interactive shell having pid 1" options to the failure menu. I chose the first option to enter a shell. I am able to import the pool with:

# zpool import -f rpool
# mount -t zfs rpool/root/nixos /mnt-root
# exit

and it finishes booting successfully. But the same problem occurs again when I reboot. Clearly something is preventing the root pool from being imported automatically.

Any suggestions?

NB: I have been running ZFS root since at least 19.09, so I’m not a stranger to the process, but this is the first time I’ve had any problems. It has just worked in the past.

Details

I created the VM using the command:

$ virt-install --connect qemu:///system --virt-type kvm \
    --name minimal-zfs --description "NixOS 21.05 minimal ZFS install" \
    --os-type=linux --os-variant=none --machine q35 --sound ich9 \
    --vcpus=1 --memory=2048 --rng /dev/random --boot uefi \
    --graphics spice --video virtio --channel spicevmc \
    --network network=default,model=virtio,mac=RANDOM \
    --controller type=scsi,model=virtio-scsi,driver.iommu=on \
    --boot hd,cdrom \
    --disk device=disk,bus=virtio,format=qcow2,driver.discard=unmap,boot.order=1,path=/virt/images/nixos/minimal_zfs.qcow2,size=20 \
    --disk device=cdrom,bus=scsi,readonly=on,boot.order=2,path=/virt/images/nixos/nixos-minimal-21.05.740.aa576357673-x86_64-linux.iso

I partitioned the virtual disk pretty much as shown in the NixOS manual and https://nixos.wiki/wiki/NixOS_on_ZFS:

# parted -a optimal /dev/vda -- mklabel gpt
# parted -a optimal /dev/vda -- mkpart primary 512MiB -2GiB
# parted -a optimal /dev/vda -- mkpart primary linux-swap -2GiB 100%
# parted -a optimal /dev/vda -- mkpart ESP fat32 1MiB 512MiB
# parted -a optimal /dev/vda -- set 3 esp on

I created the pool with:

# zpool create \
    -m none -o altroot=/mnt \
    -o ashift=12 -o autotrim=on \
    -O atime=off -O relatime=on \
    -O compression=lz4 \
    -O acltype=posixacl -O xattr=sa \
    -O normalization=formD \
    rpool /dev/vda1

then created the datasets and mounted everything with:

# zfs create -p -o mountpoint=legacy rpool/root/nixos
# zpool set bootfs=rpool/root/nixos rpool
# mount -t zfs rpool/root/nixos /mnt

# mkfs.vfat -F 32 -n EFI /dev/vda3
# mkdir /mnt/boot
# mount /dev/vda3 /mnt/boot
# mkswap -L swap /dev/vda2
# swapon /dev/vda2

# zfs create -p -o mountpoint=legacy rpool/home
# mkdir /mnt/home
# mount -t zfs rpool/home /mnt/home

Next I generated the initial configuration with:

# nixos-generate-config --root /mnt

and modified configuration.nix to be:

{ config, pkgs, ... }:
{
  imports = [
    ./hardware-configuration.nix
  ];

  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;
  boot.initrd.supportedFilesystems = ["zfs"];
  boot.supportedFilesystems = [ "zfs" ];
  boot.kernelParams = [
    "zfs.zfs_arc_max=1610612736"  # 1.5GB
    "boot.shell_on_fail"  # insecure: no password required; use only for debugging
  ];

  networking.hostId = "be798c19";
  networking.useDHCP = false;  # deprecated
  networking.interfaces.enp1s0.useDHCP = true;

  system.stateVersion = "21.05"; # man configuration.nix or https://nixos.org/nixos/options.html
}

then installed and rebooted:

# nixos-install --root /mnt
# shutdown -r now

I usually also add an entry like:

  fileSystems."/" =
    { device = "rpool/root/nixos";
      fsType = "zfs";
      neededForBoot = true;
    };

so that Nix is aware that this filesystem exists.

I would also check in hardware-configuration.nix to make sure there’s nothing conflicting.
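
For reference, the auto-generated root entry in hardware-configuration.nix usually looks something like this (rough sketch; the device and fsType are just the values from your layout above):

  # example values matching the pool/dataset created earlier in this thread
  fileSystems."/" =
    { device = "rpool/root/nixos";
      fsType = "zfs";
    };

If the device or fsType there disagrees with what you add in configuration.nix, evaluation will usually complain about conflicting definitions, so a mismatch is easy to spot.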

Thanks Jon. I didn’t know I needed to do that (at least I haven’t had to do it previously).

I added that and checked that it didn’t conflict with hardware-configuration.nix. (It doesn’t. It is exactly the same except it adds neededForBoot = true.)

I reran nixos-rebuild switch but it isn’t creating a new generation and the same problem is occurring upon boot.

Additional thoughts?

Most of your configuration looks good. It’s probably one very small detail which is misaligned. Sorry :frowning:

Sounds like it imported the pool just fine and /mnt-root doesn’t exist for some reason. If the pool hadn’t been imported, you would see filesystem 'rpool/root/nixos' cannot be mounted, unable to open the dataset instead. This is quite odd.
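
From the emergency shell, a quick check along these lines should tell you which case you're in (rough sketch; adjust the pool and dataset names to yours):

# zpool list                    # an imported pool would show up here
# zfs list rpool/root/nixos     # errors if the dataset can't be opened
# ls -ld /mnt-root              # does the mount target exist yet?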

That is a great hint @ElvishJerricco. Thanks.

I peeked inside the init script in the initrd. /mnt-root is created on line 558 while the pool is supposed to be imported around line 293. I compared that to the initrd from a working 20.09 system (lines 554 and 279, respectively).

The differences between the two init files are:

  • Different hashes in the paths to the nix store, as expected
  • Messages use info rather than echo to display; should be safe
  • hostids are different, of course
  • Symlinks for /dev/{stdin,stdout,stderr} are now created where they weren’t before; probably doesn’t cause a problem
  • System time is now set to work around a bug in qemu-kvm, probably doesn’t cause a problem
  • There is a resume partition UUID in 21.05 whereas there was none in 20.09; this seems unlikely to break things
  • Revamped code for starting stage 2; it looks reasonable but…

Since the order in which the pool is imported and /mnt-root is created is the same, as is the code for importing the pools, the problem must reside either in the changed stage 2 logic or in one of the files that init references.

This is the 20.09 stage 2 logic:

# Start stage 2.  `switch_root' deletes all files in the ramfs on the
# current root.  Note that $stage2Init might be an absolute symlink,
# in which case "-e" won't work because we're not in the chroot yet.
if [ ! -e "$targetRoot/$stage2Init" ] && [ ! -L "$targetRoot/$stage2Init" ] ; then
    echo "stage 2 init script ($targetRoot/$stage2Init) not found"
    fail
fi

and this is the 21.05 stage 2 logic:

# Start stage 2.  `switch_root' deletes all files in the ramfs on the
# current root.  The path has to be valid in the chroot not outside.
if [ ! -e "$targetRoot/$stage2Init" ]; then
    stage2Check=${stage2Init}
    while [ "$stage2Check" != "${stage2Check%/*}" ] && [ ! -L "$targetRoot/$stage2Check" ]; do
        stage2Check=${stage2Check%/*}
    done
    if [ ! -L "$targetRoot/$stage2Check" ]; then
        echo "stage 2 init script ($targetRoot/$stage2Init) not found"
        fail
    fi
fi

I don’t see anything obviously wrong.

Note: I verified that the expected values for mountPoint (/), device (rpool/root/nixos), fsType (zfs), and options (defaults), read from initrd-fsinfo, are the same for both working and non-working configurations. So finding and mounting the root device should work the same.

I checked whether /mnt-root already exists after the root pool fails to import and the emergency shell is brought up: it does exist by the time the shell starts. The commands zpool import rpool, mount -t zfs rpool/root/nixos /mnt-root, and exit are sufficient for stage 2 to load successfully.

Perhaps there is a race between the ZFS device nodes appearing and the init code, one that isn’t being worked around by the “importing root ZFS pool” 60s loop or the subsequent “retry” loop in mountFS. I can’t see why it would break with 21.05 when it worked with 20.09.

I am at a loss what to check next.

Try setting boot.zfs.devNodes to the type of device path you used to create your zpool (you can see this in zpool status).

For example, I use:

boot.zfs.devNodes = "/dev/disk/by-partuuid";

Maybe the difference between this and your previous working installs is how you created the zpool.
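
If you want to double-check, something like this shows what the pool members currently resolve to (just a sketch; -P prints full vdev paths if your zpool version supports it):

# zpool status -P rpool
# ls -l /dev/disk/by-partuuid /dev/disk/by-id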

Thanks @dalto. That fixed the problem.

Details: normally I create a pool using /dev/disk/by-id/, but I used /dev/vdaX for this quick experiment under KVM. Further, the /dev/disk/by-id/ entries for vda and its partitions don’t exist, so my previous practice would likely have broken as well. In the future, I will make sure to set boot.zfs.devNodes.

Post mortem:

Comparing the init scripts built from the non-working and the working configuration.nix:

$ diff -u kcslbdipnvscycj5kmz7c7wpmypqb947-initrd-linux-5.10.37-initrd/nix/store/75hpmxbddhybn9xzlggfqpir0bspxczx-stage-1-init.sh nnavwyfdi3ys42c1lnxwdxvwa68wl2p7-initrd-linux-5.10.37-initrd/nix/store/67sa12vrayf2r9mgp6phj8c37kgc6vcz-stage-1-init.sh
--- kcslbdipnvscycj5kmz7c7wpmypqb947-initrd-linux-5.10.37-initrd/nix/store/75hpmxbddhybn9xzlggfqpir0bspxczx-stage-1-init.sh     2021-06-17 17:58:16.854130924 -0400
+++ nnavwyfdi3ys42c1lnxwdxvwa68wl2p7-initrd-linux-5.10.37-initrd/nix/store/67sa12vrayf2r9mgp6phj8c37kgc6vcz-stage-1-init.sh     2021-06-17 17:58:38.957757581 -0400
@@ -287,7 +287,7 @@
 }
 poolImport() {
   pool="$1"
-  "zpool" import -d "/dev/disk/by-id" -N $ZFS_FORCE "$pool"
+  "zpool" import -d "/dev/disk/by-partuuid" -N $ZFS_FORCE "$pool"
 }

 echo -n "importing root ZFS pool \"rpool\"..."

It is obvious that using /dev/disk/by-id, which worked in 20.09, no longer works with 21.05.

That is extremely strange and probably worth a bug report on github.

Not sure where the problem originates but the disks are visible by partition UUID:

# ls -l /dev/disk/by-partuuid
total 0
0 lrwxrwxrwx 1 root root 10 Jun 17 20:53 0f27783e-f2df-4c19-8ec6-467fffd36a85 -> ../../vda3
0 lrwxrwxrwx 1 root root 10 Jun 17 20:53 39f712b8-ab81-4d61-9a73-7cf280ff6252 -> ../../vda1
0 lrwxrwxrwx 1 root root 10 Jun 17 20:53 56b14314-0dd4-4579-8758-836fcc80ce3e -> ../../vda2
0 lrwxrwxrwx 1 root root 10 Jun 17 20:53 d21dbb3d-4918-4b22-9740-dbfff7810a03 -> ../../vdb3
0 lrwxrwxrwx 1 root root 10 Jun 17 20:53 f1807f0f-aa63-415e-bef2-5b9772b1f9ef -> ../../vdb2
0 lrwxrwxrwx 1 root root 10 Jun 17 20:53 f8f539ae-00b6-4f5d-a221-587f9a380052 -> ../../vdb1

but they are not visible at all when viewed by ID, and only partially visible when viewed by UUID:

# ls -l /dev/disk/by-id
total 0
0 lrwxrwxrwx 1 root root 9 Jun 17 20:53 scsi-0QEMU_QEMU_CD-ROM_drive-scsi0-0-0-0 -> ../../sr0
# ls -l /dev/disk/by-uuid
total 0
0 lrwxrwxrwx 1 root root 10 Jun 19 13:21 11618213968631538825 -> ../../vda2
0 lrwxrwxrwx 1 root root  9 Jun 17 20:53 1980-01-01-00-00-00-00 -> ../../sr0
0 lrwxrwxrwx 1 root root 10 Jun 19 13:21 6452-65B9 -> ../../vda1
0 lrwxrwxrwx 1 root root 10 Jun 17 20:53 6D44-D00D -> ../../vdb1

It is fairly normal for not every partition to get a filesystem UUID; it depends on which filesystems are in play. That is why I always use the PARTUUID for ZFS.
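
For example (just a sketch, using the vda1 partition from your listing above), the PARTUUID comes from the GPT partition table itself, so it exists even when blkid reports no filesystem UUID:

# blkid -s PARTUUID -o value /dev/vda1   # prints the partition's GPT PARTUUID
39f712b8-ab81-4d61-9a73-7cf280ff6252
# ls -l /dev/disk/by-partuuid            # the same value appears as a symlink to vda1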

As for your by-id problem: are you sure that isn’t a virtualization issue?

The fact that those devices don’t have entries there would explain why it doesn’t work when you use -d "/dev/disk/by-id"
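
A quick way to confirm is that zpool import with -d and no pool name simply lists whatever it can discover under that directory (sketch; run it while the pool is exported, e.g. from the installer or the initrd shell):

# zpool import -d /dev/disk/by-id        # likely finds nothing on this VM
# zpool import -d /dev/disk/by-partuuid  # should list rpool as importable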

The root cause of the problem was operator error. I should have specified a serial number for the disk when running virt-install. I assumed that if I didn’t specify one, one would be generated automatically. But according to man virt-install:

serial
    Serial number of the emulated disk device. This is used in linux guests to set /dev/disk/by-id
    symlinks. An example serial number might be: WD-WMAP9A966149

which I now know means that if a serial number is not provided, no /dev/disk/by-id entry will be created for the disk. That is why by-id (the default search path for zpool import in the init script) didn’t work, and @dalto’s suggestion of using by-partuuid fixed my problem. You learn something every day. I now see why @dalto uses by-partuuid.
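
For anyone who hits this later: the fix on the virt-install side is simply to give the disk a serial. A sketch of the changed --disk option (the serial value here is made up; the rest of the command is the same as my original virt-install invocation):

    # same --disk line as before, plus serial= so the guest gets /dev/disk/by-id symlinks
    --disk device=disk,bus=virtio,serial=NIXOSZFS0001,format=qcow2,driver.discard=unmap,boot.order=1,path=/virt/images/nixos/minimal_zfs.qcow2,size=20 \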

I have come across the same issue when installing NixOS with ZFS on my X1. The problem is that the filesystem list links to the disk using /dev/disk/by-uuid instead of /dev/disk/by-partuuid.

Going into hardware-configuration.nix, changing it to by-partuuid, and adding the option below:

  boot.zfs.devNodes="${DISK_PATH}";

Fixed the issue for me. Is there an issue tracking this?

Hi @lucasvo,

Can you provide an example to clarify what you changed? For example, should this line:

https://github.com/GlenHertz/pi4nas.nix/blob/41b04dd95023fe5e813e647f257dca4e5614a8da/pi4nas/hardware-configuration.nix#L33

be changed to use /dev/disk/by-partuuid for the dataset, and then this be set:

https://github.com/GlenHertz/pi4nas.nix/blob/41b04dd95023fe5e813e647f257dca4e5614a8da/configuration.nix#L22

Thanks.

Hi,

This is a long post but hopefully it helps someone else.

I think I got a bit further. After creating a ZFS pool, the drive IDs disappear. Compare the values of blkid before and after:

blkid after fdisk:

blkid | grep sdb
/dev/sdb1: PARTUUID="a82a01eb-a38b-7842-9649-dd2762725ca7"
/dev/sdb2: PARTUUID="7b4a6cb8-2853-fa41-9d20-f4ac9acaeb5e"
/dev/sdb3: PARTUUID="50b7d816-5415-fd4c-acb2-7b96af52b648"

After zpool create:

DISK=/dev/sdb3
zpool create -O mountpoint=none -O atime=off -O compression=lz4 -O xattr=sa -O acltype=posixacl -o ashift=12 -R /mnt zpool $DISK
blkid | grep sdb
# nothing returned
zpool status
  pool: zpool
 state: ONLINE
config:

	NAME        STATE     READ WRITE CKSUM
	zpool       ONLINE       0     0     0
	  sdb3      ONLINE       0     0     0

errors: No known data errors

After creating a dataset:

zfs create -o mountpoint=legacy zpool/tank
blkid | grep sdb
/dev/sdb3: LABEL="zpool" UUID="13714934573686082273" UUID_SUB="6456259992175977062" BLOCK_SIZE="4096" TYPE="zfs_member"

Now it has a LABEL and UUID. Checking with /dev/disk:

ls /dev/disk/by-uuid/
2178-694E  44444444-4444-4444-8888-888888888888  b6bca834-ccda-492a-88fe-48a233295f09

There is no matching UUID. And by label:

ls /dev/disk/by-label
FIRMWARE  NIXOS_SD

There is no matching label. By ID:

ls /dev/disk/by-id/ | grep HGST
ata-HGST_HDN726040ALE614_K7GX5ZBL
ata-HGST_HDN726040ALE614_K7GX5ZBL-part1
ata-HGST_HDN726040ALE614_K7GX5ZBL-part2
ata-HGST_HDN726040ALE614_K7GX5ZBL-part3

ata-HGST_HDN726040ALE614_K7GX5ZBL-part3 should be the right partition.

Now if I mount the dataset:

mount -t zfs zpool/tank /mnt/tank
blkid | grep sdb
/dev/sdb3: LABEL="zpool" UUID="13714934573686082273" UUID_SUB="6456259992175977062" BLOCK_SIZE="4096" TYPE="zfs_member"

The /dev/disk devices haven’t changed after the mount.

So I’ll go back and create the pool by partuuid instead:

umount /mnt/tank
zpool destroy zpool
zpool create -O mountpoint=none -O atime=off -O compression=lz4 -O xattr=sa -O acltype=posixacl -o ashift=12 -R /mnt zpool $DISK
zpool status
  pool: zpool
 state: ONLINE
config:

	NAME                                       STATE     READ WRITE CKSUM
	zpool                                      ONLINE       0     0     0
	  ata-HGST_HDN726040ALE614_K7GX5ZBL-part3  ONLINE       0     0     0

errors: No known data errors
blkid | grep sdb
/dev/sdb3: LABEL="zpool" UUID="13374479555073036372" UUID_SUB="17172025984875349708" BLOCK_SIZE="4096" TYPE="zfs_member"

Now, using the PARTUUID, the device shows up in blkid right after zpool create (where it didn’t before), but the PARTUUID itself isn’t showing up in blkid or /dev/disk/by-partuuid.

ls /dev/disk/by-partuuid/
0d1e08a3-01  2178694e-01  2178694e-02

So it seems impossible for the mounter to find it. Let’s continue anyway:

zfs create -o mountpoint=legacy zpool/tank
blkid | grep sdb
/dev/sdb3: LABEL="zpool" UUID="13374479555073036372" UUID_SUB="17172025984875349708" BLOCK_SIZE="4096" TYPE="zfs_member"

But /dev/disk/by-uuid doesn’t contain that UUID. The label, however, now appears:

ls /dev/disk/by-label
FIRMWARE  NIXOS_SD  zpool

The partuuid is missing:

ls /dev/disk/by-partuuid/
0d1e08a3-01  2178694e-01  2178694e-02
ls /dev/disk/by-uuid
13714934573686082273  2178-694E  44444444-4444-4444-8888-888888888888  b6bca834-ccda-492a-88fe-48a233295f09

But the ID is there:

ls /dev/disk/by-id | grep HGST
ata-HGST_HDN726040ALE614_K7GX5ZBL
ata-HGST_HDN726040ALE614_K7GX5ZBL-part1
ata-HGST_HDN726040ALE614_K7GX5ZBL-part2
ata-HGST_HDN726040ALE614_K7GX5ZBL-part3

So let’s go back and create the pool by ID:

umount /mnt/tank
zpool destroy zpool
DISK=/dev/disk/by-id/ata-HGST_HDN726040ALE614_K7GX5ZBL-part3
zpool create -O mountpoint=none -O atime=off -O compression=lz4 -O xattr=sa -O acltype=posixacl -o ashift=12 -R /mnt zpool $DISK
zpool status
  pool: zpool
 state: ONLINE
config:

	NAME                                       STATE     READ WRITE CKSUM
	zpool                                      ONLINE       0     0     0
	  ata-HGST_HDN726040ALE614_K7GX5ZBL-part3  ONLINE       0     0     0

errors: No known data errors
blkid | grep sdb
/dev/sdb3: LABEL="zpool" UUID="15388353107909713984" UUID_SUB="14747893016884677337" BLOCK_SIZE="4096" TYPE="zfs_member"
ls /dev/disk/by-id |grep part3
ata-HGST_HDN726040ALE614_K7GX5ZBL-part3
zfs create -o mountpoint=legacy zpool/tank
mount -t zfs zpool/tank /mnt/tank

Add to configuration.nix:

boot.zfs.devNodes = "/dev/disk/by-id";

Then re-run nixos-generate-config and it adds:

  fileSystems."/mnt/tank" =
    { device = "zpool/tank";
      fsType = "zfs";
    };

Then nixos-rebuild switch and reboot.

Then on reboot the mount fails. If I go into single-user mode and run blkid, only the drive (/dev/sdb) shows up, with /dev/sdb TYPE="zfs_member". In /dev/disk/by-id only the hard drive itself shows up (ata-HGST_HDN726040ALE614_K7GX5ZBL), not the partitions (ata-HGST_HDN726040ALE614_K7GX5ZBL-part3). So it looks like it can’t be mounted.

I tried zpool import -d /dev/disk/by-id and it imports ata-HGST_HDN726040ALE614_K7GX5ZBL (not the -part3 partition), and then the pool is CORRUPTED.

So maybe ZFS over USB is just a "bad idea"™? Any ideas for what might lead to a solution?

Glen

I’m curious if anyone got further on this topic. I’m running into my own set of boot issues when trying to set this up on a NixOS 22.05 QEMU VPS. Thanks.