Stage-1 Not finding/building mdadm raid for root partition

Heyo! This may be a long one :stuck_out_tongue:

The issue is that upon boot, the block device for the root partition cannot be found – and after 10 seconds, it times out, states an error, and asks me to reboot.

I’m somewhat new to NixOS. I’ve been using it for about a month on my laptop, and would like to install it on a desktop PC as well. I had some spare drives lying around, so for this machine I want to run an mdadm raid5 across four drives for the primary partition. I’ve tried to give as much info as I can that may be relevant in figuring out what’s going on. At this point I’m running out of ideas.

Error during boot

When booting, it prints Waiting 10 seconds for device /dev/disk/by-uuid/xxxxxxxx-... and then times out. I believe this is boot.initrd.luks.devices."cryptroot".device = "/dev/disk/by-uuid/c6f71789-640b-4035-8093-bd28044c7207"; failing to find the raid device.

Comparing against my laptop I think there’s a chance that line should be changed to reference the uuid of the raid, once assembled, as opposed to the uuid of the encrypted partition once luks opened. – But changing this to match gave the same results. There’s also a chance I’m misinterpreting the lsblk.

System setup

I’m using four drives, each with a 1 GB partition at the front, and the remainder being part of the raid5 root partition. One drive has its 1 GB leading partition set to be the system boot partition, with the 1 GB partitions of the other three drives in a raid1. [I’m ignoring that for now] Both raids are then luks encrypted, with btrfs on top of that. There is some subvol stuff after that, but it’s not getting that far.

I’m aware that which nvme gets mapped to sda, sdb, sdc, sdd is unpredictable, so I’m trying to reference the drives using either their uuid or /dev/disk/by-id/ path.

What I’ve tried / Related resources

My main guide for this has been this discourse post, from someone with almost exactly the same issue. I’ve tried to mirror their config, and have compared /etc/mdadm.conf 's with other non-nix computers.
Nixos discourse topic: I want to create a raid0 for /var, but I’m unable to figure how to load mdamd on boot

From what I can gather, boot.initrd.services.swraid.enable needs to be set to true, and boot.initrd.services.swraid.mdadmConf should be set with the results of mdadm --detail --scan. I’m also writing that to /etc/mdadm.conf, but that may not be necessary.

I can’t really access a terminal after the boot fails, which is the biggest thing making this difficult to debug. I suspect it’s a drive uuid mismatch, or that mdadm.conf is being set to the wrong contents by my config.

My config / System info

Relevant part of the hardware config file:

let
  mdadmconfigfile = ''
  ARRAY /dev/md126 metadata=1.2 UUID=e4329e79:b9f4456f:ba33392e:48db946f # raid5
  ARRAY /dev/md127 metadata=1.2 UUID=0f7f0286:d3228196:553e1da4:9f310130 # raid1
  '';
in
{

  [...]

  # Establish raids
  #boot.initrd.services.swraid.enable = {
  boot.swraid = {
    enable = true;
    mdadmConf = mdadmconfigfile;
  };
  environment.etc = { "mdadm.conf".text = mdadmconfigfile; };


  # Luks devices
  boot.initrd.luks.devices."cryptroot".device = "/dev/disk/by-uuid/c6f71789-640b-4035-8093-bd28044c7207";
  #boot.initrd.luks.devices."cryptvault".device = "/dev/disk/by-uuid/1b13dd53-9f3a-4580-b960-09a2b472f860";

  # Filesystems
  fileSystems."/" = {
    device = "/dev/disk/by-uuid/a55a1d94-03ab-4080-be60-c2bae6fafa81";
    fsType = "btrfs";
    options = [
      "subvol=@"
      "compress=zstd"
      "noatime"
    ];
  };

My lsblk -f as shown to the installer:

NAME             FSTYPE            FSVER            LABEL                      UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
loop0            squashfs          4.0                                                                                    0   100% /nix/.ro-store
sda                                                                                                                                
├─sda1           linux_raid_member 1.2              nixos:1                    0f7f0286-d322-8196-553e-1da49f310130                
│ └─md127        crypto_LUKS       2                                           1b13dd53-9f3a-4580-b960-09a2b472f860                
│   └─cryptvault btrfs                                                         8fd7aafd-419d-4216-9e9e-4dad90e32724                
└─sda2           linux_raid_member 1.2              nixos:0                    e4329e79-b9f4-456f-ba33-392e48db946f                
  └─md126        crypto_LUKS       2                                           c6f71789-640b-4035-8093-bd28044c7207                
    └─cryptroot  btrfs                                                         a55a1d94-03ab-4080-be60-c2bae6fafa81    1.2T     5% /mnt/persist
                                                                                                                                   /mnt/home
                                                                                                                                   /mnt/swap
                                                                                                                                   /mnt/nix
                                                                                                                                   /mnt
sdb                                                                                                                                
├─sdb1           linux_raid_member 1.2              nixos:1                    0f7f0286-d322-8196-553e-1da49f310130                
│ └─md127        crypto_LUKS       2                                           1b13dd53-9f3a-4580-b960-09a2b472f860                
│   └─cryptvault btrfs                                                         8fd7aafd-419d-4216-9e9e-4dad90e32724                
└─sdb2           linux_raid_member 1.2              nixos:0                    e4329e79-b9f4-456f-ba33-392e48db946f                
  └─md126        crypto_LUKS       2                                           c6f71789-640b-4035-8093-bd28044c7207                
    └─cryptroot  btrfs                                                         a55a1d94-03ab-4080-be60-c2bae6fafa81    1.2T     5% /mnt/persist
                                                                                                                                   /mnt/home
                                                                                                                                   /mnt/swap
                                                                                                                                   /mnt/nix
                                                                                                                                   /mnt
sdc                                                                                                                                
├─sdc1           linux_raid_member 1.2              nixos:1                    0f7f0286-d322-8196-553e-1da49f310130                
│ └─md127        crypto_LUKS       2                                           1b13dd53-9f3a-4580-b960-09a2b472f860                
│   └─cryptvault btrfs                                                         8fd7aafd-419d-4216-9e9e-4dad90e32724                
└─sdc2           linux_raid_member 1.2              nixos:0                    e4329e79-b9f4-456f-ba33-392e48db946f                
  └─md126        crypto_LUKS       2                                           c6f71789-640b-4035-8093-bd28044c7207                
    └─cryptroot  btrfs                                                         a55a1d94-03ab-4080-be60-c2bae6fafa81    1.2T     5% /mnt/persist
                                                                                                                                   /mnt/home
                                                                                                                                   /mnt/swap
                                                                                                                                   /mnt/nix
                                                                                                                                   /mnt
sdd                                                                                                                                
├─sdd1           vfat              FAT32                                       DF95-8003                                           
└─sdd2           linux_raid_member 1.2              nixos:0                    e4329e79-b9f4-456f-ba33-392e48db946f                
  └─md126        crypto_LUKS       2                                           c6f71789-640b-4035-8093-bd28044c7207                
    └─cryptroot  btrfs                                                         a55a1d94-03ab-4080-be60-c2bae6fafa81    1.2T     5% /mnt/persist
                                                                                                                                   /mnt/home
                                                                                                                                   /mnt/swap
                                                                                                                                   /mnt/nix
                                                                                                                                   /mnt
sde                                                                                                                                
├─sde1           exfat             1.0              Ventoy                     4E21-0000                                           
│ └─ventoy       iso9660           Joliet Extension nixos-minimal-24.05-x86_64 1980-01-01-00-00-00-00                     0   100% /iso
├─sde2           vfat              FAT16            VTOYEFI                    3F32-27F5                                           
└─sde3           ext4              1.0                                         f537b3bf-fbf2-4b95-863a-5bbfc740b0f0   40.8G    29% /fd
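
For anyone checking their own reading of the lsblk output above, here is a rough sketch of how each layer's UUID would map onto the options. The values are copied from the output and config in this post. The layer mapping in the comments is my understanding of how these options are meant to be used, not something this thread confirms fixes the boot.

```nix
{ ... }:
{
  # Layer 1: the linux_raid_member UUID (sdX2, label "nixos:0"), written in
  # mdadm's colon-separated form. It has the same hex digits as the dashed
  # form lsblk prints (e4329e79-b9f4-456f-ba33-392e48db946f).
  boot.swraid = {
    enable = true;
    mdadmConf = ''
      ARRAY /dev/md126 metadata=1.2 UUID=e4329e79:b9f4456f:ba33392e:48db946f
    '';
  };

  # Layer 2: the crypto_LUKS UUID that appears on md126 once the array is
  # assembled. This is what the LUKS option should point at.
  boot.initrd.luks.devices."cryptroot".device =
    "/dev/disk/by-uuid/c6f71789-640b-4035-8093-bd28044c7207";

  # Layer 3: the btrfs UUID that appears on cryptroot once the LUKS
  # container has been opened.
  fileSystems."/" = {
    device = "/dev/disk/by-uuid/a55a1d94-03ab-4080-be60-c2bae6fafa81";
    fsType = "btrfs";
    options = [ "subvol=@" ];
  };
}
```

Read this way, the UUIDs in the config above look consistent with the lsblk output, which would point away from a simple UUID mix-up.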

My full config files if they may help:
Github permalink to current config

Version wise, I’m on a fresh install of nixos 24.05

Thank you!

I have no problem with my two raid drives:
The MAILADDR is mandatory as it won’t work without it.
The name may be mandatory. I’m not sure.
Make sure your UUID is for the md array and not an individual drive.

boot.swraid.enable = true;
boot.swraid.mdadmConf = "
MAILADDR smitty
ARRAY /dev/md126 metadata=1.2 name=backup:128 UUID=b6fb8e13:f75467cf:0af913da:56af7f22
ARRAY /dev/md127 metadata=1.2 name=stuff:0 UUID=26b6803c:47fc3b11:0f8f83e2:10a71509
";

I gave it a couple of goes today with your advice of setting a MAILADDR and names, in varying combinations. Sadly still no luck. Thanks for getting back to me! I just wish I’d timed when I posted the question a bit better, as I haven’t had much time at that machine.

Is there any way to make it fail to shell during an efi boot? I’d just want to recheck lsblk and logs. – This is such a weird bug for me because I can’t really get any info to diagnose it.

There are some cmdline params that can help with debugging scripted stage 1: NixOS Manual
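
As a rough sketch of what that looks like in a config: the parameter names below are the ones I know from the NixOS manual's boot problems section, and which of them to enable is a judgment call. They apply to the scripted stage 1 only.

```nix
{ ... }:
{
  boot.kernelParams = [
    # Offer a root shell when stage 1 fails. Note that this shell has no
    # authentication, so it is best removed once debugging is done.
    "boot.shell_on_fail"

    # Optional extras, commented out by default:
    # "boot.debug1devices" # stop in a shell after modules and device nodes are set up
    # "boot.trace"         # print every command the stage 1 and 2 scripts run
  ];
}
```

The same parameters can also be typed into the kernel command line from the boot loader's edit screen for a one-off boot, assuming the boot loader's editor is enabled.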

You could also try enabling systemd stage 1 with boot.initrd.systemd.enable = true; I think it has mdraid working. The mechanisms for debugging it are different from the ones in that link, though (I really need to document that stuff).

Those arguments should be pretty perfect for debugging. As for the systemd boot, I’m already using a systemd efi-stub boot as opposed to grub, and now I’m wondering if that is part of the issue. I’ll give grub a go, as that itself may be the problem, and poke around with the stage-1 root shell.

Thank you!

I wasn’t talking about systemd-boot; the boot loader shouldn’t really affect this problem at all. The systemd based initrd is a different thing. It’s a reimplementation of stage 1 using systemd for PID 1. It’s enabled with boot.initrd.systemd.enable.
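
A minimal sketch of switching to the systemd based initrd, for reference. The emergencyAccess line is my own addition and an assumption about what is useful here: as far as I know it is the option that lets the systemd initrd drop to an emergency shell without a password, which is roughly the counterpart of boot.shell_on_fail.

```nix
{ ... }:
{
  # Use systemd as PID 1 in stage 1. This is independent of the boot loader
  # (systemd-boot vs GRUB).
  boot.initrd.systemd.enable = true;

  # Assumption: allow unauthenticated access to the emergency shell in the
  # initrd while debugging. Remove again afterwards, since anyone at the
  # console gets a root shell.
  boot.initrd.systemd.emergencyAccess = true;
}
```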

Somehow I’d managed to tell boot.loader.systemd-boot.enable was distinct from boot.initrd.systemd.enable and still misinterpret your answer. – my fault : / I think the manual’s “You can add these parameters in the GRUB boot menu by pressing ‘e’” for some reason made me think things were grub exclusive…

I’ll give the systemd initrd a shot. Thanks for the advice and clarification, I’m not as familiar with the different boot systems as I probably should be.

Thanks for your post. I encountered the exact same issue and struggled for a while. The first thing that helped me was the ability to break into a shell when it failed, by adding this to my hardware-configuration.nix:

 boot.kernelParams = [ "boot.shell_on_fail" ];

After doing that, what I noticed is that the luks device specified in my boot.initrd.luks.devices did not exist. It had this: boot.initrd.luks.devices."luks-fe8d1cd8-4257-48e7-a099-ed3689930601".device = "/dev/disk/by-uuid/fe8d1cd8-4257-48e7-a099-ed3689930601";

There was no /dev/disk/by-uuid/fe8d1cd8.... at this point. I did have /dev/md0 though so I did cryptsetup open /dev/md0 root --type luks and put in the password and it worked fine.

So I modified my hardware-configuration.nix to just point to /dev/md0 and it worked

boot.initrd.luks.devices."luks-fe8d1cd8-4257-48e7-a099-ed3689930601".device = "/dev/md0";

After that I did a nixos-install --no-root-password and rebooted and it worked.

Hopefully you can try and see if this helps you.

That md0 device should have a UUID or a label or something for you to reference it with /dev/disk/by-{uuid,label}, which you should probably use.
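
One more stable path that might be worth checking from the stage 1 shell, sketched below. As far as I know, mdadm's udev rules create /dev/disk/by-id/md-uuid-… symlinks for assembled arrays, using the array UUID in mdadm's colon-separated form. Whether that symlink actually exists in the scripted initrd on this setup is an assumption, and ARRAY-UUID is a placeholder, not a value from this thread.

```nix
{ ... }:
{
  # Placeholder: replace ARRAY-UUID with the UUID that
  # `mdadm --detail --scan` reports for the array.
  # Assumption: the md-uuid-* symlink is present in stage 1; check with
  # `ls /dev/disk/by-id/` from the rescue shell before relying on it.
  boot.initrd.luks.devices."luks-fe8d1cd8-4257-48e7-a099-ed3689930601".device =
    "/dev/disk/by-id/md-uuid-ARRAY-UUID";
}
```

Unlike /dev/md0, a path of this kind does not depend on the order in which arrays get assembled.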

Yes, it should, but unfortunately there was nothing in by-uuid or by-label, even though the drive is labeled ‘root’ too.

I wonder if this is a bug in our scripted stage 1. I also wonder if my suggestion to use boot.initrd.systemd.enable = true; would have fixed it.

I’m glad to break it and give it a shot, that’s what we have generations for :slight_smile: Give me a few.

I moved it back to /dev/disk/by-uuid/fe8d1cd8-4257-48e7-a099-ed3689930601 and enabled boot.initrd.systemd.enable = true, and it indeed booted fine. When asking for the password it showed the path above instead of /dev/md0, so I am confident it read it fine.
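
Put together, the combination that booted would look roughly like this. It is a sketch: the LUKS line is the one quoted earlier, and the boot.swraid.enable line is an assumption on my part, since that part of the config was not posted.

```nix
{ ... }:
{
  # Assumption: software raid support is enabled as in the original question.
  boot.swraid.enable = true;

  # systemd based stage 1 instead of the scripted one.
  boot.initrd.systemd.enable = true;

  # LUKS container on top of the md array, referenced by its own UUID
  # rather than by /dev/md0.
  boot.initrd.luks.devices."luks-fe8d1cd8-4257-48e7-a099-ed3689930601".device =
    "/dev/disk/by-uuid/fe8d1cd8-4257-48e7-a099-ed3689930601";
}
```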

OK, so it sounds like a bug in the scripted variant of stage 1. Interesting.

Not sure what that means but if there is anything I can do to help let me know.