`nixos-rebuild switch` is failing to install grub, `/boot` partition disappeared

Hello :wave: ,

The issue first showed up as nixos-rebuild switch failing with the following log:

$ sudo nixos-rebuild switch --flake .# --install-bootloader
building the system configuration...
updating GRUB 2 menu...
installing the GRUB 2 boot loader on /dev/sda...
Installing for i386-pc platform.
/nix/store/k16fsfwnccjmknzrrqhcmwm3l8g2p61d-grub-2.06/sbin/grub-install: warning: this GPT partition label contains no BIOS Boot Partition; embedding won't be possible.
/nix/store/k16fsfwnccjmknzrrqhcmwm3l8g2p61d-grub-2.06/sbin/grub-install: error: embedding is not possible, but this is required for RAID and LVM install.
/nix/store/lr12rz9yj3rkfbgkdlcj7d87wrsi972a-install-grub.pl: installation of GRUB on /dev/sda failed: Inappropriate ioctl for device
warning: error(s) occurred while switching to the new configuration

Looking around on the system, I noticed that my /boot partition had disappeared(!).

$ lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
fd0             2:0    1    4K  0 disk
sda             8:0    0  1.8T  0 disk
└─sda1          8:1    0  1.8T  0 part
  └─lvm-media 254:0    0  3.6T  0 lvm  /nix/store
                                       /
sdb             8:16   0  1.8T  0 disk
├─sdb1          8:17   0  1.8T  0 part
│ └─lvm-media 254:0    0  3.6T  0 lvm  /nix/store
│                                      /
└─sdb2          8:18   0   16G  0 part [SWAP]
$ sudo ls /boot
background.png  converted-font.pf2  grub
$ df -h /boot
Filesystem                Size  Used Avail Use% Mounted on
/dev/disk/by-label/nixos  3.6T  3.3T  113G  97% /

This is surprising to me, as I should instead have the following setup (according to both my memory and the install script for that server):

  • Two disks of 2TB
    • One has three partitions
      • Boot partition
      • Media/root partition
      • Swap partition
    • The second disk has a single partition, combined through LVM with the first disk’s media/root partition.

I have a few questions:

  1. How did I lose my /boot partition?
  2. Is that why nixos-rebuild switch fails?
  3. How do I recreate it? Since this is a server, I would like to avoid shutting it down in case it never boots again…
  4. Can I avoid this happening again in the future?

Mhmmm I’m not sure exactly how it happened, but here’s how I fixed it:

  1. Noticed that /dev/sda and /dev/sdb seem to have swapped names between my initial install and the current state of the system, somehow…
  2. Marked /dev/sdb1 as bootable with parted /dev/sdb set 1 boot on, as it didn’t have that flag set.
  3. Changed boot.loader.grub.device to /dev/sdb, since that’s the disk that should contain the boot partition.

At least nixos-rebuild switch finally worked.

If anybody has comments on what happened here, or how I can improve my config to avoid this in the future, I’d gladly take your advice.

I’ve had some boot confusion in the past that was caused by a similar issue: sda and sdb switching names. I’ve since sworn to only rely on /dev/disk/by-id/ paths in boot.loader.grub.device.
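
For anyone landing here later, a minimal sketch of what that could look like in the NixOS configuration. The disk identifier is a made-up placeholder, not taken from the system in this thread; the real name would be whatever ls -l /dev/disk/by-id/ lists for the disk GRUB should be installed on (the whole disk, not a -partN entry).

```nix
{ ... }:
{
  boot.loader.grub = {
    enable = true;
    # Hypothetical placeholder: substitute the by-id name of the real disk.
    # Unlike /dev/sda or /dev/sdb, this path does not depend on the order
    # in which the kernel enumerates the disks at boot.
    device = "/dev/disk/by-id/ata-EXAMPLE_MODEL_EXAMPLE_SERIAL";
  };
}
```

This should keep the bootloader pointed at the same disk across reboots, but it only addresses the device naming; it does not by itself explain what happened to the original /boot partition.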