Mdadm + LUKS at boot is not working

I’m trying to boot an mdadm-managed RAID5 with LUKS encryption on top for my /home partition.

The RAID itself seems fine: it works whenever I boot into the NixOS installation image. (This is a new system installation.) On a regular boot, it says that the disk (the mdadm disk) is not showing up. I think systemd-udev-settle gives it 10 seconds and then aborts the boot because the disk doesn’t show up.

I have boot.swraid.enable = true; (I tried adding the manual .mdadmConf as well but without effect.)

Then I have boot.initrd.luks.devices."homeraid".device set to the /dev/disk/by-uuid path of the mdadm RAID5 disk. The UUID is correct: it shows up in the installed NixOS on a normal boot when I comment out these bits.

Then I have the regular fileSystems."/home".device (and fsType) set, with the device pointing to the by-uuid path for the LUKS mapper homeraid. Most of this was generated by nixos-generate-config, except for swraid.enable = true.
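For reference, the setup described above would look roughly like this. It is only a sketch: the UUIDs are left as placeholders, and the fsType value "ext4" is an assumption for illustration, not something taken from my actual config.

```nix
{
  # Assemble mdadm arrays at boot.
  boot.swraid.enable = true;

  # Open the LUKS container that sits on top of the RAID5 array (stage 1).
  # <md-uuid> is a placeholder for the UUID of the mdadm device.
  boot.initrd.luks.devices."homeraid".device = "/dev/disk/by-uuid/<md-uuid>";

  # Mount the decrypted mapper device as /home.
  # <fs-uuid> is a placeholder; "ext4" is an assumed file system type.
  fileSystems."/home" = {
    device = "/dev/disk/by-uuid/<fs-uuid>";
    fsType = "ext4";
  };
}
```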

ChatGPT suggested a whole bunch of kernel modules to add, but that didn’t help either.

It then suggested adding extra time for the disks to show up (that didn’t work, though; the timeout is still only 10 seconds), and it added ordering dependencies between the systemd services using .after (between cryptsetup@homeraid and mdadm-grow-continue).

It seems that no one on the internet has ever documented the situation of adding LUKS on top of mdadm RAID and trying to use that at boot time. When I get this to work, I might publish it somewhere.

One of the last things I am considering trying now is to switch to GRUB instead of systemd-boot. It might just have better support for RAID and LUKS.

OK, another hour of trying everything I could think of made it work.

So it seems that between the time mdadm assembles the RAID and the time LUKS tries to open the volume, the /dev/disk/by-uuid link has not been put in place yet. The UUID I used is correct, and the /dev/disk/by-uuid link does exist on a booted system. But at the moment LUKS tries to access mdadm’s volume, the link is not there.

Setting boot.initrd.luks.devices."homeraid".device to just /dev/md/homeraid instead of /dev/disk/by-uuid/<uuid> worked.
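In configuration terms, the change amounts to something like the following. This is a minimal sketch that assumes the array is exposed as /dev/md/homeraid, as it is on my system; the rest of the config stays as it was.

```nix
{
  boot.swraid.enable = true;

  # Point LUKS at the md device node directly instead of the by-uuid symlink,
  # which was apparently not in place yet when stage 1 tried to open the volume.
  boot.initrd.luks.devices."homeraid".device = "/dev/md/homeraid";
}
```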

Hell.

I’m curious what would have happened if you had moved the LUKS config to stage 2. The suggestion to add dependencies between the systemd units with .after wouldn’t do anything, because the LUKS configuration generated by nixos-generate-config is set up in stage 1, which isn’t necessary most of the time (EDIT: for non-root file systems, that is). You could have done this instead:

environment.etc.crypttab.text = ''
  homeraid UUID=...
'';

Then it would have been decrypted by systemd in stage 2 instead. Alternatively, you can set boot.initrd.systemd.enable = true, and then the regular options also use systemd, but in stage 1.
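A rough sketch of that second variant, keeping the existing LUKS option and only switching the initrd to systemd. The <uuid> placeholder stands for the UUID of the mdadm device, and whether the by-uuid path then resolves in time is an assumption I haven’t tested on this setup.

```nix
{
  # Use systemd in stage 1, so the LUKS device below is opened by a
  # systemd unit in the initrd rather than by the scripted stage 1.
  boot.initrd.systemd.enable = true;

  boot.swraid.enable = true;

  # Same option as before; <uuid> is a placeholder.
  boot.initrd.luks.devices."homeraid".device = "/dev/disk/by-uuid/<uuid>";
}
```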