How to: ZFS root install on raidz1

Hi,

I’ve read most of the documentation I can find and searched through just about every related Discourse thread. I cannot find anywhere whether anyone has successfully installed NixOS root onto raidz1 and can boot NixOS from raidz1, i.e. a three- or four-disk raidz1 zpool.
What is the required configuration for zroot on raidz? Related questions: what happens with a missing or failed disk in the raidz vdev, and does the system still boot successfully? If anyone is running this successfully and has experienced disk failures, what special operations, if any, did you have to perform after `zpool replace` to make the replacement disk bootable as well?

Docs read:

https://openzfs.github.io/openzfs-docs/Getting%20Started/NixOS/Root%20on%20ZFS.html
https://nixos.wiki/wiki/ZFS

I also read the disko examples, which do not mention raidz1:

https://github.com/nix-community/disko/blob/d64e5cdca35b5fad7c504f615357a7afe6d9c49e/example/zfs-encrypted-root.nix
https://github.com/nix-community/disko/blob/d64e5cdca35b5fad7c504f615357a7afe6d9c49e/example/zfs-over-legacy.nix
https://github.com/nix-community/disko/blob/d64e5cdca35b5fad7c504f615357a7afe6d9c49e/example/zfs-with-vdevs.nix
https://github.com/nix-community/disko/blob/d64e5cdca35b5fad7c504f615357a7afe6d9c49e/example/zfs.nix

I also found an issue from 2023 about a raidz config that was unresolved (or the user never followed up on the problem):

https://github.com/nix-community/disko/issues/354

It’s not really any different from installing on a single-disk pool. The only difference is the `zpool create` command, where you choose which disks to use in what layout. Other than that, you only ever need to refer to the pool name, so there’s no difference.

During boot, NixOS will wait until a pool can be imported healthy, but if a timeout is reached it will attempt to import the pool degraded. So if a disk goes missing or dies, NixOS will still boot.
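For reference, none of this needs raidz-specific configuration on the NixOS side. A minimal sketch of the usual ZFS-on-root options (option names from the standard NixOS modules; the hostId is just an example value):

```nix
{
  # Generic ZFS-on-root options; nothing here changes for raidz vs. a single disk.
  boot.supportedFilesystems = [ "zfs" ];
  networking.hostId = "8425e349";          # required by ZFS; any unique 8 hex digits
  boot.zfs.devNodes = "/dev/disk/by-id";   # where stage-1 looks for pool member devices
  boot.zfs.forceImportRoot = true;         # default; permits a forced import of the root pool
}
```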

The one odd consideration is your /boot file system. Some guides will tell you to use ZFS for this too; don’t. You don’t want to force yourself to use grub, and grub’s ZFS support is bad anyway. If a disk goes missing, grub does not know how to recover from parity, it will just fail to boot. IMO it’s best to simply stick to a regular partition for /boot. And if a disk dies, it’s easy enough to boot the live ISO and rebuild with a different disk’s ESP as your /boot.
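A minimal sketch of what that looks like, assuming systemd-boot and placeholder names (the ESP UUID and the `tank/root` dataset are hypothetical):

```nix
{
  # One plain FAT32 ESP serves as /boot; systemd-boot instead of grub.
  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  fileSystems."/boot" = {
    device = "/dev/disk/by-uuid/XXXX-XXXX";  # placeholder: the UUID of your chosen ESP
    fsType = "vfat";
  };
  fileSystems."/" = {
    device = "tank/root";                    # placeholder: a dataset on the raidz pool
    fsType = "zfs";
  };
}
```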


Hi,


> The one odd consideration is your /boot file system. Some guides will tell you to use ZFS for this too; don’t. You don’t want to force yourself to use grub, and grub’s ZFS support is bad anyway. If a disk goes missing, grub does not know how to recover from parity, it will just fail to boot. IMO it’s best to simply stick to a regular partition for /boot.

Can you explain more specifically what “it’s best to simply stick to a regular partition for /boot” means?

For example, I have four disks onto which I want to install NixOS on raidz:


```
sudo zpool create tank raidz /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
```


How and where do I create the /boot regular partition?

Have you actually configured and booted NixOS from raidz? Can you please share any config details? Any config and info that anyone can share would be great.

Thank you

First of all, I just want to make sure you know that you need to refer to the nvme namespace, not just the controller; i.e. you’d use /dev/nvme0n1, not /dev/nvme0, to refer to the drive.

Anyway. I do have a machine that boots off raidz. It doesn’t require any special configuration at all. Like I said, the only difference is creating the pool with more drives than one. From then on, the configuration just refers to the pool by name like normal, and doesn’t have to take any considerations whatsoever for the multiple devices.

In order to do this, I would personally partition each drive with a 1G ESP (EFI System Partition), and choose one of those to act as your /boot file system. The rest of each drive can be one large partition, and those partitions can be used to make up the pool. Before you ask, no, ZFS does not need direct access to the disks without partitioning. That’s an old myth. All ZFS needs is block devices that behave like block devices, and partitions qualify just fine.

If the drive containing /boot ever dies or disappears, you can boot the live ISO and reconfigure the system to use a different one of those ESPs as /boot. If you use grub, there is boot.loader.grub.mirroredBoots, which you can use to have all those ESPs populated and bootable so that you don’t need to manually switch to another if a drive disappears. There are some posts here on discourse about how to accomplish this part. But I personally prefer to avoid grub more than I want redundant boot partitions. Plus, it’s a really unusual failure mode for a drive to just disappear on boot, allowing the UEFI to use another drive to boot. Much more often, the drive is failing partially, and UEFI will try to boot it whether or not it would fail. So the redundancy of mirrored boots is pretty unlikely to actually help any real world scenario short of just physically pulling a disk.
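For completeness, a hedged sketch of the mirroredBoots approach described above (the mount points are hypothetical; check the `boot.loader.grub.mirroredBoots` option docs before relying on this):

```nix
{
  # grub with one ESP per drive, all kept populated and bootable.
  # /boot0 and /boot1 are hypothetical mount points for two of the ESPs.
  boot.loader.grub = {
    enable = true;
    efiSupport = true;
    mirroredBoots = [
      { devices = [ "nodev" ]; path = "/boot0"; efiSysMountPoint = "/boot0"; }
      { devices = [ "nodev" ]; path = "/boot1"; efiSysMountPoint = "/boot1"; }
    ];
  };
}
```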

TL;DR: The main point I’m trying to get across is: NixOS’s ZFS implementation does not need you to do anything special for the root FS to be on raidz vs a single disk with ZFS. That will all just work automatically based on the pool name. The only trouble is what your /boot file system is going to be, and I think you should just have a partition available in each disk to serve the role and pick any one of them.


As far as I know, you also need a regular swap partition, as problems can still arise with ZFS if a swap file is used on it!

Oh, yes. You can’t create a swap file on ZFS at all, and using zvol block devices as swap is pretty likely to lock your machine up under memory pressure. Personally, I just don’t run with swap on disk, though I do run zramSwap.

I very much appreciate your fast and detailed response. I am reading your message carefully and want to make sure I understand you. The nvme device names I listed previously were just simplifications I conjured up quickly; yes, I recognize the zpool must be built from the device’s namespace or partitions.

I want to test and be prepared for any failure modes, and I do understand and agree that drives more often start malfunctioning than disappear at boot. I will try to follow your instructions and partition all drives with a 1G ESP (EFI System Partition), then for testing just choose one drive’s partition, e.g. /dev/sdb1, to act as /boot.

Later I will try setting up another machine with grub and boot.loader.grub.mirroredBoots; I will try to find posts about it on Discourse.

Curious: are there any other reasons you “personally prefer to avoid grub more than [you] want redundant boot partitions.”?

Have you ever tried using disko to set up your raidz boot pool on your machines? When I have time I want to try to stand up a few identical machines with the same specs and disks to deploy raidz boot.

@DocBrown101

OK, I will review the documentation and disko and try to test how to set up separate swap not on ZFS. I don’t know how this should be accomplished on a raidz vdev group of disks rather than on a mirror. I only saw swap for ZFS-on-root mentioned at https://openzfs.github.io/openzfs-docs/Getting%20Started/NixOS/Root%20on%20ZFS.html
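As a starting point for my testing, a disko sketch of the layout discussed above: each disk gets a 1G ESP, a regular swap partition, and a ZFS partition, with the ZFS partitions forming a raidz pool. This is untested and extrapolated from the linked examples; in particular I am assuming disko accepts `mode = "raidz"` for the zpool, and all device and dataset names are placeholders:

```nix
{
  disko.devices = {
    disk.nvme0 = {
      type = "disk";
      device = "/dev/nvme0n1";                 # placeholder device path
      content = {
        type = "gpt";
        partitions = {
          ESP = {
            size = "1G";
            type = "EF00";
            content = { type = "filesystem"; format = "vfat"; mountpoint = "/boot"; };
          };
          swap = {
            size = "16G";                      # placeholder size; regular swap, not on ZFS
            content = { type = "swap"; };
          };
          zfs = {
            size = "100%";
            content = { type = "zfs"; pool = "tank"; };
          };
        };
      };
    };
    # ...repeat for the other three disks, but without the /boot mountpoint on their ESPs

    zpool.tank = {
      type = "zpool";
      mode = "raidz";                          # assumption: passed through to `zpool create`
      datasets.root = { type = "zfs_fs"; mountpoint = "/"; };
    };
  };
}
```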

@ElvishJerricco

I will consider zramSwap as an alternative if I decide to enable swap. The systems will have hundreds of gigs of RAM, so I may run with no swap, or just with zramSwap. I may not want to relearn how to fine-tune zramSwap; from recollection of past usage on small machines, I believe zramSwap is highly configurable and uses CPU and RAM bandwidth to perform its compression and rebalancing work if/when RAM contention arises.
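If I do go the zramSwap route, the knobs I remember are along these lines (a sketch only; defaults may differ in current NixOS, so I will verify against the option docs):

```nix
{
  zramSwap = {
    enable = true;
    algorithm = "zstd";     # compression algorithm for the zram device
    memoryPercent = 50;     # zram swap size as a percentage of total RAM
  };
}
```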

Thank you

Grub is known to be a very complex boot loader with a large surface area for bugs, and it is relatively under-maintained upstream as well. NixOS currently includes 73 patches for grub for security issues alone (albeit mostly ones that are very unlikely to be exploitable). In my experience as the ISO maintainer, grub has been one of the two biggest causes of boot failure (the other being people not realizing that Ventoy is unreliable). I’m also just, generally speaking, a fan of UEFI as a pretty modular and well-standardized boot interface, so boot loaders like systemd-boot that take advantage of and embrace it are my preference.

I have not used disko on any of my machines at all. :stuck_out_tongue: I think disko can make sense if you’re deploying a fleet of systems or deploying new systems automatically with regularity. But my systems are just individual systems that each have their own designs and purpose so I haven’t seen a reason to bother.


Great information, and thank you for pointing me in a good direction. When I have time I will work on testing these scenarios and hopefully stand up a couple of systems.

Super project and amazing work.

Thank you