What are my options for configuring swap partitions on a mirrored ZFS install?

learningnix · November 25, 2023, 4:48am

I found the mirroring options for /boot so I’m covered there. I haven’t found anything about swap partitions though.

uep · November 25, 2023, 6:02am

Options:

* 2 matching partitions, not mirrored. You will not have redundancy for swap data and may crash if a disk fails. This may or may not be a problem for you.
* 2 partitions, made into a mirror via mdadm (or LVM). These are set up imperatively (e.g., during install) and then just referred to like any other block device in your NixOS config.
* A swap zvol device on the mirrored pool. There are various issues and warnings about doing this on Linux. Results are mixed; it can be fine for some circumstances and is a lot more flexible with respect to sizing.

learningnix · November 27, 2023, 1:20am

Hi @uep, thanks. I’ll check out mdadm. I know about LVM but hadn’t considered it as an option. I’m going to skip the swap on zvol.

ElvishJerricco · November 27, 2023, 1:41am

Something to note about swap is that NixOS disables hibernation when ZFS is enabled. This is because of a known, though rare, bug: if any ZFS pools are imported when the system hibernates, those pools can be corrupted beyond repair. Note that I’m not talking about using swap on ZFS; that has its own bugs. No matter what you store the swap on, this is a risk with hibernation and ZFS. So NixOS disables this possibility (though it can be enabled if you want, because like I said, it’s rare).
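
If you want to check this on a running system: as far as I know, the way NixOS blocks hibernation here is by adding the `nohibernate` kernel parameter. Below is a minimal sketch that looks for it. The function name `hibernation_blocked` and its optional file argument are my own, added only so the check can be tried against a sample file.

```shell
#!/bin/sh
# Succeeds (exit 0) if the kernel command line contains "nohibernate".
# Assumption: NixOS disables hibernation for ZFS via this kernel parameter.
# The optional argument is illustrative; it defaults to /proc/cmdline.
hibernation_blocked() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put each parameter on its own line, then require an exact match.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'nohibernate'
}

if hibernation_blocked; then
    echo "hibernation is disabled on the kernel command line"
else
    echo "no nohibernate parameter found"
fi
```

This only tells you what the booted kernel was given. It says nothing about whether hibernating with imported pools would be safe.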

learningnix · November 27, 2023, 3:59am

If that’s the case, should I simplify and just create the swap on ZFS? Sleep should be fine, right?

This is what I plan to configure:
1MiB to 1GiB /boot/efi
1GiB to 100% LUKS container<ZFS<datasets, zvols

Since zvols are block-level storage, would creating VMs on them be more performant?

ElvishJerricco · November 27, 2023, 4:20am

I would not recommend swap on a ZFS zvol. ZFS does have documentation on doing it, but it warns:

CAUTION: for now swap on zvol may lead to deadlock, in this case please send your logs here.

In the bit of testing I did quite a few years ago, I did not find it difficult to trigger this deadlock.

learningnix · November 27, 2023, 4:30am

What about swap on a dataset, or should I just stick with swap partitions? I was going to configure swap partitions<LUKS<mdadm.

ElvishJerricco · November 27, 2023, 4:34am

I believe swapon literally won’t let you use swap files on ZFS datasets. Even if you could, I can virtually guarantee you it would have all the same problems as zvols.

It is recommended to keep swap completely off of ZFS.

learningnix · November 27, 2023, 4:59am

Thanks for your time. This definitely makes me pause and reconsider my choice for a filesystem on my planned mirror setup. I’ll look up BcacheFS and BTRFS (again).

firecat53 · November 27, 2023, 11:03am

I’ve been running a mirrored ZFS on root install with separate swap partitions. They are set with different priorities so shouldn’t have any trouble if one disk dies. This is the setup per the openzfs NixOS page.

It works well and is much simpler than trying to set up a mirrored partition with mdadm.

uep · November 27, 2023, 11:49am

Sorry, but no. When a swap volume goes away, the least you can expect is that any process with pages swapped out to it will die or hang. Priorities don’t matter, at least unless the swap volume that dies is unused because of being the lower priority.
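
To see which swap devices actually hold pages right now (and would therefore hurt if their disk vanished), you can read `/proc/swaps`. This is a rough sketch. The function name `swaps_in_use` and the optional file argument are mine, and it assumes the usual five-column layout (Filename, Type, Size, Used, Priority) with sizes in KiB.

```shell
#!/bin/sh
# Lists swap devices that currently have pages swapped out to them.
# Assumes the standard /proc/swaps columns: Filename Type Size Used Priority.
# The optional argument is illustrative; it defaults to /proc/swaps.
swaps_in_use() {
    swaps_file="${1:-/proc/swaps}"
    # Skip the header row; keep only devices whose Used column is non-zero.
    awk 'NR > 1 && $4 > 0 { printf "%s used=%sKiB priority=%s\n", $1, $4, $5 }' "$swaps_file"
}

if [ -r /proc/swaps ]; then
    swaps_in_use
fi
```

A device missing from the output is simply unused at that moment. With unequal priorities that is normally the lower-priority one, until the higher-priority one fills up.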

If you want to have the system keep running without interruption with a failed disk, you need the swap to be mirrored.

However, you also need the disks to fail in certain predictable ways, and not (for example) start returning garbage data or hanging on access or causing other hardware errors. It’s hard to predict or control that, and so if instead your goal is simply that you don’t lose persistent data, and are ok with a hardware failure potentially killing processes or causing a reboot, then you can probably be comfortable running with independent swap volumes.

If you don’t want a process to page in potentially corrupted data from a marginal/failing disk, then you want the swap to be on a zvol, with checksum validation. The other mirror types only detect errors (and re-read the other side of the mirror) when the disk reports an IO error. You also want ECC memory and some other system design choices. As noted, swap on zvol has some other concerns, but it’s generally fine if your use of swap is mostly for paging out idle processes to let the memory be put to better use, and you avoid thrashing under particularly intense memory pressure.

firecat53 · November 27, 2023, 5:06pm

Ah, thank you for the education! I’m willing to tolerate the possibility of an unscheduled reboot or application crash as this is a home server.

The Ubuntu OpenZFS on root page has a snippet for how to create the mirrored swap partitions (will need to run nixos-generate-config instead of adding values to /etc/fstab):

* For an unencrypted mirror or raidz topology:

# Adjust the level (ZFS raidz = MD raid5, raidz2 = raid6) and
# raid-devices if necessary and specify the actual devices.
mdadm --create /dev/md0 --metadata=1.2 --level=mirror \
    --raid-devices=2 ${DISK1}-part2 ${DISK2}-part2
mkswap -f /dev/md0
echo /dev/disk/by-uuid/$(blkid -s UUID -o value /dev/md0) \
    none swap discard 0 0 >> /etc/fstab
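
On NixOS, the last step could look roughly like the sketch below instead of the `/etc/fstab` line. It assumes the array already exists at `/dev/md0`, and that `nixos-generate-config` records swap devices that are active when it runs, which is my understanding of how it detects them. The helper names `run` and `nixos_mirrored_swap` and the `SWAP_SETUP_RUN` variable are made up for this example. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints each command unless SWAP_SETUP_RUN=1,
# in which case the command is executed (this needs root).
run() {
    if [ "${SWAP_SETUP_RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Format the (already created) md mirror as swap, activate it, and let
# nixos-generate-config regenerate the hardware configuration.
# Assumption: the generator picks up whatever swap is active at that moment.
nixos_mirrored_swap() {
    md_dev="${1:-/dev/md0}"
    run mkswap -f "$md_dev"
    run swapon "$md_dev"
    run nixos-generate-config
}

nixos_mirrored_swap /dev/md0
```

During an install from the live environment you would presumably point `nixos-generate-config` at the target root. Getting the array assembled at boot is a separate piece of NixOS configuration that this sketch does not cover.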