NixOS 24.05 upgrade: no boot device

I just tried to update to NixOS 24.05. My config is on GitHub (lukovdm/nix-config); this concerns the barium configuration. I updated my flake inputs and the lockfile, then ran sudo nixos-rebuild switch --flake .# to apply the change.

When I rebooted, I got the message "no boot device found". Luckily I had a NixOS install USB lying around and booted into that. I tried to chroot into the installation and run nixos-rebuild --install-bootloader boot --flake .#barium, but nothing happened.
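
For readers following along, a chroot-based bootloader reinstall from a live USB usually goes roughly like the sketch below. It is not a confirmed fix for this problem. It assumes the LUKS, root and /boot UUIDs from the configuration in this post, assumes LVM sits inside the LUKS container (suggested by preLVM and dm-snapshot), and uses a hypothetical flake checkout path (FLAKE). The helper names run and recover_barium are made up for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: nothing runs unless recover_barium is called, and even then
# the commands are just printed unless DRY_RUN=0 is set.

LUKS_DEV=/dev/disk/by-uuid/9d455e69-da5a-41e2-81f7-0db1c4b741b2  # boot.initrd.luks.devices
ROOT_DEV=/dev/disk/by-uuid/bbd0cdc3-5483-4fc4-9bd2-8fb0160ae951  # fileSystems."/"
BOOT_DEV=/dev/disk/by-uuid/C7BE-45C3                             # fileSystems."/boot"
FLAKE=/path/to/nix-config  # hypothetical: where the flake lives inside the chroot

# Print the command in dry-run mode (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_barium() {
    run cryptsetup open "$LUKS_DEV" luksroot   # unlock the LUKS container
    run vgchange -ay                           # assumption: LVM inside LUKS
    run mount "$ROOT_DEV" /mnt                 # root filesystem
    run mount "$BOOT_DEV" /mnt/boot            # vfat boot partition
    run nixos-enter --root /mnt -c \
        "nixos-rebuild boot --install-bootloader --flake $FLAKE#barium"
}
```

Calling recover_barium prints the five steps for review; DRY_RUN=0 recover_barium (as root, on the live system) would actually execute them.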

How can I make my laptop boot again?

I think the relevant parts of my configuration are:

  boot.initrd.availableKernelModules = [ "xhci_pci" "thunderbolt" "nvme" "usb_storage" "sd_mod" ];
  boot.initrd.kernelModules = [ "dm-snapshot" ];
  boot.kernelModules = [ "kvm-intel" ];
  boot.extraModulePackages = [ ];

  boot.extraModprobeConfig = ''
    options snd-hda-intel model=dell-headset-multi
  '';

  boot.kernelPackages = lib.mkIf (lib.versionOlder pkgs.linux.version "5.16") (lib.mkDefault pkgs.linuxPackages_latest);
  boot.kernelParams = [ "mem_sleep_default=deep" "nvme.noacpi=1" ];

  services.udev.extraRules = ''
    SUBSYSTEM=="pci", ATTR{vendor}=="0x8086", ATTR{device}=="0xa0e0", ATTR{power/control}="on"
  '';

  fileSystems."/" =
    {
      device = "/dev/disk/by-uuid/bbd0cdc3-5483-4fc4-9bd2-8fb0160ae951";
      fsType = "ext4";
    };

  fileSystems."/boot" =
    {
      device = "/dev/disk/by-uuid/C7BE-45C3";
      fsType = "vfat";
    };

  swapDevices =
    [{ device = "/dev/disk/by-uuid/1ecb9365-6cc6-4c9a-94df-35889a7547b4"; }];

  boot.initrd.luks.devices = {
    luksroot = {
      device = "/dev/disk/by-uuid/9d455e69-da5a-41e2-81f7-0db1c4b741b2";
      preLVM = true;
      allowDiscards = true;
    };
  };

  boot.loader.grub = {
    enable = true;
    device = "nodev";
    efiSupport = true;
    enableCryptodisk = true;
    # efiInstallAsRemovable = true;
  };
  boot.loader.efi.efiSysMountPoint = "/boot/EFI";
  # boot.loader.efi.canTouchEfiVariables = false;

  hardware.acpilight.enable = lib.mkDefault true;

  networking.hostName = "barium"; # Define your hostname.
  networking.useDHCP = false;
  networking.interfaces.wlp170s0.useDHCP = true;
  networking.networkmanager.enable = true;
  networking.networkmanager.wifi.scanRandMacAddress = false;
  networking.networkmanager.wifi.powersave = false;

  systemd.services.NetworkManager-wait-online.enable = false;

  powerManagement.cpuFreqGovernor = lib.mkDefault "powersave";
  hardware.cpu.intel.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;

This is – very likely – caused by a util-linux update. This update also regressed in other ways, so it was reverted, but the revert is currently in staging-next.

I spent some time debugging this (partly with @Kloenk):

  • The symptom is that the LUKS device no longer shows up in /dev/disk/by-uuid (nor in by-partlabel or by-partuuid).
  • Explicitly running udevadm test-builtin blkid /dev/nvme0n1p2 -a add reports "Failed to probe superblocks: Operation not permitted".
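
To check for the same symptom on another machine, something like the following could be used. It is a minimal sketch: the function names uuid_visible and probe_blkid are made up here, and the optional directory argument exists only so the check can be tried against a scratch directory. The udevadm invocation is the one quoted above.

```shell
#!/bin/sh
# Sketch: check whether udev created a by-uuid symlink for a device.

uuid_visible() {
    # $1: UUID to look for
    # $2: optional directory to search (defaults to /dev/disk/by-uuid)
    uuid=$1
    dir=${2:-/dev/disk/by-uuid}
    if [ -e "$dir/$uuid" ] || [ -L "$dir/$uuid" ]; then
        echo "present: $dir/$uuid"
        return 0
    fi
    echo "missing: $dir/$uuid"
    return 1
}

probe_blkid() {
    # $1: block device, e.g. /dev/nvme0n1p2
    # Runs the udev blkid builtin exactly as quoted in the post (needs root).
    udevadm test-builtin blkid "$1" -a add
}
```

For the configuration at the top of this thread, that would be uuid_visible 9d455e69-da5a-41e2-81f7-0db1c4b741b2 followed by probe_blkid /dev/nvme0n1p2 if the symlink turns out to be missing.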

I haven’t opened an upstream issue yet, but will do so soon.

Digging through util-linux, I suspect that this line is the source of the -1 that the blkid builtin in udev reports. Sadly, it erases most of the error information, so I could not yet tell which exact chain is failing or how those chains work. strace does not report an error, so it’s likely that some check inside a chain is unhappy.

I think I’m also affected by this issue. I’m able to rebuild/switch, but then the reboot fails. Choosing a previous generation works; what doesn’t work, unfortunately, is a couple of services that ran DB migrations and can no longer be used with the old versions.

@mgdigital can you explain a bit more about your disk layout? Do you use LUKS, LVM or something different?

To be honest, I’m not sure. I’m on a Hetzner VPS and installed with nixos-infect on top of Ubuntu, then adapted the config from there. This is my hardware-configuration.nix:

{
  modulesPath,
  config,
  ...
}: {
  imports = [(modulesPath + "/profiles/qemu-guest.nix")];
  boot.loader.grub.device = "/dev/sda";
  boot.initrd.availableKernelModules = ["ata_piix" "uhci_hcd" "xen_blkfront" "vmw_pvscsi"];
  boot.initrd.kernelModules = ["nvme"];
  fileSystems = {
    "/" = {
      device = "/dev/sda1";
      fsType = "ext4";
    };
    "/mnt/vol-1" = {
      device = "/dev/disk/by-id/scsi-0HC_Volume_100489033";
      fsType = "ext4";
      options = ["noatime" "discard" "defaults"];
    };
  };

  # Set your system kind (needed for flakes)
  nixpkgs.hostPlatform = "x86_64-linux";
}

The issues I’m having with 24.05 seem more complex, and I’m seeing some quite weird behaviour. I’ve tried a few cycles of switching back and forth between 23.11 and 24.05, and this is some of what I’ve observed:

  • switch from 23.11 → 24.05: all seems fine
  • the first reboot then works, but I have no network at all, e.g. pinging google.com gives me "Name or service not known" and trying to SSH in gives me "Network is unreachable", despite sshd reporting that it is listening (I initially thought this was a Hetzner issue and opened a ticket with them)
  • subsequent reboots fail to find the boot device
  • select the 23.11 generation: booting works, network works
  • rebuild switch back to 24.05: same behaviour, first no network, then no boot device
  • back on the 23.11 generation, I did another rebuild switch (still on 23.11) and rebooted: no boot device; I rebooted again, selected the most recent 23.11 generation (the one that had just failed), and it worked

I’m calling it a day for now. Am I the only one seeing these issues, though? My system had been quite stable until I attempted the 24.05 upgrade…

That’s weird. It feels different from my bug (which is most probably in util-linux’s LUKS(?) UUID detection). For me, not even the first reboot works. I can’t be sure without taking a deeper look, but I suggest you look elsewhere. Sorry.