NixOS LUKS/ZFS installation on ASUS Chromebook hangs on boot

Last year, I got NixOS to boot from an SD card on my ASUS Chromebook. It was full-disk encrypted with LUKS, and the filesystem was ZFS (without ZFS native encryption). /boot and the LUKS header weren’t encrypted and were installed on a separate USB stick. I’m attempting something similar on a new ASUS Chromebook, but I’m running into a weird issue where it hangs on boot. The new setup should be simpler: no detached header, and /boot is on the same SD card as the filesystem. Here’s some information:

Hardware: ASUS Chromebook, 64-bit, Intel Celeron N3350 CPU. The BIOS is coreboot, but dmidecode -t bios lists no version. It reports that PCI is supported, PC Card (PCMCIA) is supported, the BIOS is upgradeable, ACPI is supported, the BIOS revision is 4.0, the firmware revision is 0.0, and SMBIOS 2.7 is present.

I’ve done the usual things for installing Linux on a Chromebook: enabled legacy boot, installed the MrChromebox firmware (the RW_LEGACY version, not the one that requires physical access to the eMMC memory), and successfully booted my NixOS installation media from USB.

The installation target is an SD card inserted into the Chromebook. Unfortunately, I don’t have my instructions from a year ago, so I’m doing this from memory.

When I boot the new installation, it reaches the GRUB menu where I can select which NixOS configuration to boot, but once I hit Enter, it displays the “NixOS” screen and never prompts me to unlock the disk. GRUB doesn’t report any error messages, and journalctl has no entries from the boot attempts (because the disk never gets decrypted).

Here are the installation steps I’ve taken for this chromebook:


parted /dev/mmcblk1 -- mklabel msdos
parted /dev/mmcblk1 -- mkpart primary fat32 0% 2%
parted /dev/mmcblk1 -- mkpart primary 2% 100%

mkfs.vfat -F32 /dev/mmcblk1p1

cryptsetup luksFormat /dev/mmcblk1p2
cryptsetup luksOpen /dev/mmcblk1p2 enc-pv

pvcreate /dev/mapper/enc-pv
vgcreate vg /dev/mapper/enc-pv
lvcreate -L 2G -n swap vg
lvcreate -l '100%FREE' -n root vg

mkswap -L swap /dev/vg/swap

zpool create -f -O atime=off -O xattr=sa -O mountpoint=none "rpool" /dev/vg/root
zfs create -p -o compression=on -o mountpoint=legacy "rpool/local/root"
zfs create -o compression=on -o mountpoint=legacy "rpool/local/nix"
zfs create -o compression=on -o mountpoint=legacy "rpool/safe"
zfs create -o compression=on -o mountpoint=legacy "rpool/safe/home"
zfs create -o compression=on -o mountpoint=legacy "rpool/safe/persist"

mount -t zfs "rpool/local/root" /mnt
mkdir /mnt/nix
mount -t zfs "rpool/local/nix" /mnt/nix
mkdir /mnt/home
mount -t zfs "rpool/safe/home" /mnt/home
mkdir /mnt/persist
mount -t zfs "rpool/safe/persist" /mnt/persist
mkdir -p /mnt/boot
mount /dev/mmcblk1p1 /mnt/boot

nixos-generate-config --root /mnt

wpa_supplicant -B -i wlp1s0 -c <(wpa_passphrase '[network name]' '[network password]')

nano /mnt/etc/nixos/configuration.nix

nixos-install --root /mnt

I edited the configuration file to add the location of the LUKS container and to set networking.hostId (required by ZFS).

Some settings in configuration.nix which could be relevant:

boot.loader.grub.enable = true;
boot.loader.grub.device = "/dev/mmcblk1";
boot.loader.grub.enableCryptodisk = true;
boot.loader.grub.zfsSupport = true;
boot.loader.grub.version = 2;
boot.initrd.luks.devices.luksroot = {
  device = "/dev/disk/by-uuid/<redacted>";
  preLVM = true;
  allowDiscards = true;
};
networking.hostId = "<redacted>";
services.xserver = {
  enable = true;
  displayManager.lightdm.enable = true;
  desktopManager.xfce.enable = true;
};

I’m pretty sure enableCryptodisk and zfsSupport are unnecessary; booting with them enabled or disabled makes no difference. I’ve also tried reinstalling with a GPT partition scheme and systemd, and toggling efiSupport and efiInstallAsRemovable on and off. None of it made a difference.

I also tried booting a default config (a basic NixOS install on plain ext4, with no LUKS and no ZFS), and nothing changed. It hangs on boot before even reaching stage 1.

I’m puzzled. Any ideas?

As I said in the last thread, I’m wondering if you’re just not getting a console. Try installing NixOS without any disk encryption so it can boot without any password input during stage 1, just to see if it makes it to a login screen. It might be that it needs to load certain drivers during stage 2 before it can output anything on the screen.

Yeah, I tried that.


@ElvishJerricco

The kernel is successfully being loaded: when I edit the kernel path in grub to lead somewhere that doesn’t exist, I get an error message.

The freeze seems to happen immediately after the kernel is loaded: when I edit the initrd path in grub, it still freezes, with no error message.

It doesn’t seem to be the console= kernel parameter either: none of tty1, ttyS0, or ttyS1 lead to any change. Even with a default installation with ext4 and no encryption, journalctl doesn’t have any logs.

Maybe initrd doesn’t have the drivers necessary to find your drive? Not sure how to check that.
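One way to rule this out is to list the SD/MMC host drivers explicitly so they are copied into the initrd. This is a minimal sketch under an assumption: sdhci, sdhci_pci and mmc_block are the usual Linux modules for a PCI-attached SD host controller, but whether this Chromebook’s card reader actually uses them is unverified (running lsmod on the live installer shows what is loaded there).

```nix
{
  # Assumption: the SD card reader is an SDHCI controller on PCI.
  # These names are merged with the list from hardware-configuration.nix.
  boot.initrd.availableKernelModules = [
    "sdhci"     # generic SD host controller core
    "sdhci_pci" # PCI front-end for SDHCI controllers
    "mmc_block" # block layer that provides /dev/mmcblk*
  ];
}
```

availableKernelModules only makes the modules available in stage 1, where they are loaded on demand; moving a name to boot.initrd.kernelModules forces it to load unconditionally.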

I’ve switched to a USB drive instead of an SD card. This time, I can boot successfully with ext4 or ZFS, as long as I don’t use encryption or LVM volume groups. If I use either one, the boot hangs as before. The commands and config are otherwise the same. Any idea what might be going on there?

Driver for the sd card reader not in initrd?

Trying to use /dev/sd? rather than /dev/mmcblk??

That doesn’t matter anymore: I’ve switched to a USB device, and it’s specifically encryption or volume groups that trigger the problem.

@ElvishJerricco Everything else behaves the same as with the SD card; journalctl still has no logs. Do you know of any modules that could help access /dev/mapper/enc-pv? I suspect that’s the problem, given that both encryption and volume groups cause the boot to hang.
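NixOS’s LUKS support normally pulls the device-mapper modules into the initrd by itself, so this is a sanity check rather than a likely fix: forcing them to load at the very start of stage 1 removes one variable. dm_mod and dm_crypt are the standard kernel modules behind /dev/mapper and dm-crypt; the rest of the sketch assumes the existing luksroot setup stays as it is.

```nix
{
  # Load the device-mapper stack unconditionally in stage 1
  # instead of relying on it being pulled in on demand.
  boot.initrd.kernelModules = [
    "dm_mod"   # device-mapper core (LVM volumes and /dev/mapper/*)
    "dm_crypt" # dm-crypt target used by LUKS
  ];
}
```

If the behaviour doesn’t change, that mostly suggests the device-mapper layer isn’t the culprit.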

If your machine ID (/etc/machine-id) changes at boot, it can look like old logs are missing. Check whether /var/log/journal/ contains more than one directory (one per machine ID).

I have found using zfs for root from an sdcard to be a bit fiddly to get working. When this happens there are clear messages on the console about not being able to find the pool.

From reading the NixOS source code, I see that the zpool import must succeed within a limited time or stage 1 gives up. This could be a problem with slow media. It might also be affected by services.zfs.expandOnBoot not being set to "disabled".

Where NixOS looks for the pool could be another source of problems. The boot kernel must have the appropriate modules available.

By default, NixOS looks for the pool in boot.zfs.devNodes = "/dev/disk/by-id". That directory must be populated during early boot before stage 1 goes looking for the root filesystem. If you drop to a shell when the boot fails, you can check whether /dev/disk/by-id contains suitable entries.

Here are bits of my config you may find useful. Notice the boot debug lines commented out.

{
  system.stateVersion = "22.11";
  networking.hostName = "new_host";
  networking.hostId = "12345678"; # a unique value required by zfs
  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;
  #boot.loader.systemd-boot.consoleMode = "auto";
  #boot.consoleLogLevel = 7;
  #boot.kernelParams = [ "boot.shell_on_fail" ];
  #users.users.root.hashedPassword =
  #  "$y$j9T$jqQVIVX4Wsld9X.q21V1e/$NF.3vjkeTUjGYF4vIykn8WfS8Uj8oiraEV/ipCAqRoC";
  boot.initrd.availableKernelModules = [
    # what _might be_ needed to mount root
    "nvme"
    "usb_storage"
    "sd_mod"
    "rtsx_pci_sdmmc"
  ];
  boot.zfs.devNodes =
    "/dev/disk/by-partlabel"; # coordinated with imaging script
  services.zfs.expandOnBoot =
    "disabled"; # `all` breaks boot on `new_host` (possibly bad hardware)
  # `nixpkgs` is the flake input here; in a plain NixOS module,
  # take `lib` as an argument and use `lib.mkDefault` instead
  hardware.enableRedistributableFirmware =
    nixpkgs.lib.mkDefault true; # from nixos/modules/installer/scan
  # must remount before nixos build: `sudo mount -o remount,rw /boot`
  fileSystems."/boot" = {
    device = "/dev/disk/by-partlabel/efi";
    options = [ "ro" "noexec" "nodev" "nofail" ];
  };
}

I recall reading somewhere (possibly the NixOS manual) that the zpool import for root can fail because the device driver isn’t loaded yet when NixOS does the import. I’d expect this to be relevant when root and boot are on different devices, but maybe LUKS contributes. The fix was to declare the dependency somewhere, but I forget where: maybe a fileSystems option or some systemd module.

I’ve tried adding all the options from your config that aren’t EFI-related, except boot.zfs.devNodes, which I’m not sure how to make work with BIOS, and hardware.enableRedistributableFirmware, because I’m not sure how to properly import nixpkgs.

When I inspect the installed system with nixos-enter, there is no /var/log/journal directory, even though the installation medium has one. Setting the log level to 7 and adding the boot.shell_on_fail parameter didn’t change anything.

Do you see a grub boot menu when you boot?

Do you see any boot messages on the console after grub kicks off the boot?

If not, do you know if your boot process is configured to show messages?

If it isn’t, can you configure it to show messages?

Seeing the last messages reported would help; that’s probably the most important clue to how far the boot process gets.

boot.zfs.devNodes tells the NixOS stage 1 boot script where to look for a ZFS pool when it searches for the root filesystem. The BIOS is essentially uninvolved at this point, and GRUB should also be out of the picture, if I understand correctly.
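Since this pool sits on an LVM logical volume, one thing worth trying is pointing the import at /dev/mapper, where device-mapper exposes the LV created by the vgcreate vg / lvcreate -n root commands as vg-root. This is a minimal sketch assuming that layout, not a known fix; the default /dev/disk/by-id normally also contains dm-name-* links for logical volumes.

```nix
{
  # Tell stage 1 to scan /dev/mapper for pool members.
  # With VG "vg" and LV "root" the pool device should be /dev/mapper/vg-root.
  boot.zfs.devNodes = "/dev/mapper";
}
```

If the import succeeds from /dev/mapper but not from /dev/disk/by-id, that narrows the problem to device-node discovery rather than drivers.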

hardware.enableRedistributableFirmware is just a regular nixos option, same as boot, services, and all the rest. It normally gets set by hardware-configuration.nix through the line imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];. Probably you can ignore it.

var/log/journal will be created as needed when journald is running. Initially journald will write its journal to a tmpfs volume. Sometime after the rootfs is available, journald will migrate the tmpfs journal to /var/log/journal. (I think this is early in boot stage 2.) You can see entries for this in the boot log.

The boot.shell_on_fail kernel parameter is actually handled by the init script. It basically gives the init script permission to offer a shell when it fails. I’m not sure how this might differ between GRUB and systemd-boot.

I do see a grub boot menu. I can edit the grub configuration from there, or edit the grub command line.

I don’t see any boot messages on the console after grub kicks off the boot, but it should be configured to show messages: if I change the path to the kernel to a nonexistent one, I get error messages about that, and can then return to the grub menu.

When I didn’t have encryption or vgs enabled, the boot process lingered at the same screen for a few seconds with no messages, and then I saw a bunch of messages afterwards.

I’ve tried enabling verbose boot (which is already enabled by default) and putting grub in text-only mode; this didn’t make any new messages appear. Do you know what other settings I might be able to use to see boot messages? I suspect that the device is freezing and not showing any messages at all; it similarly freezes when I boot it from the grub command line.
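A sketch of kernel parameters aimed at the “no output at all” case. earlyprintk=vga writes kernel messages straight to the VGA text buffer before any console driver is initialised, and ignore_loglevel prints every message regardless of the configured log level. boot.loader.grub.gfxpayloadBios = "text" asks GRUB to hand the kernel a text-mode display; it is already the NixOS default for BIOS boots and is shown only for clarity. Whether SeaBIOS on this Chromebook exposes a usable VGA text buffer is an assumption; if these produce nothing, that is itself a clue.

```nix
{
  boot.kernelParams = [
    "earlyprintk=vga"    # x86: print to the VGA text buffer from the very start
    "ignore_loglevel"    # show all messages, whatever the console log level is
    "boot.shell_on_fail" # let stage 1 offer a shell if it fails
  ];
  # Pass a VGA text-mode console to the kernel (default for BIOS boots).
  boot.loader.grub.gfxpayloadBios = "text";
}
```

The same parameters can be tried without rebuilding by pressing e in the GRUB menu and appending them to the linux line.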

Given the missing journal directory, it seems the root filesystem is never mounted.

Given that you see evidence of similar misbehavior when booting without encryption, I suggest we focus on diagnosing and fixing that first. Hopefully doing so will get us more diagnostic messages.

If you look at the journal for a successful boot (journalctl -b), are the first entries from kernel startup or sometime later?

My kernel startup looks like

Mar 18 11:15:51 farm kernel: Linux version 6.1.19 (nixbld@localhost) (gcc (GCC) 11.3.0, GNU ld (GNU Binutils) 2.39) #1-NixOS SMP PREEMPT_DYNAMIC Mon Mar 13 09:21:32 UTC 2023
Mar 18 11:15:51 farm kernel: Command line: initrd=\efi\nixos\zbj7lwxncjw96b0y1qd4l51i6x145xh9-initrd-linux-6.1.19-initrd.efi init=/nix/store/s1rwjcsclqh10g7vp8a0hfz814zhfgba-nixos-system-farm-22.11.20230...

dmesg reports the same info, but with the date replaced by [ 0.000000].

From what you’ve said so far, I expect your logged boot messages to be normal. If so, this would suggest the kernel is unable to find a suitable output device when it initially starts. Then, later, when the kernel switches from its initial console output, messages begin showing on the screen.
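If this theory holds, the LUKS passphrase prompt may be waiting on a screen nobody can see: without encryption the machine eventually shows output once the Intel i915 graphics driver takes over the display in stage 2. One way to test that hypothesis is loading i915 in the initrd (“early KMS”) so a framebuffer console exists before the prompt is printed. This is a sketch, not a confirmed fix for this Chromebook.

```nix
{
  # Load the Intel graphics driver in stage 1 so a framebuffer
  # console is available before the LUKS passphrase prompt.
  boot.initrd.kernelModules = [ "i915" ];
}
```

A quick check that needs no rebuild: at the hang, type the LUKS passphrase blind and press Enter. If the boot then continues, the display is the only thing missing.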

On my system, I see the kernel switch consoles several times (search your boot log for Console).

[    0.383208] Console: colour dummy device 80x25
[    0.383235] printk: console [tty0] enabled
...
[    1.509197] efifb: probing for efifb
[    1.509208] efifb: showing boot graphics
[    1.511045] efifb: framebuffer at 0x4000000000, using 13256k, total 13254k
[    1.511046] efifb: mode is 2256x1504x32, linelength=9024, pages=1
[    1.511047] efifb: scrolling: redraw
[    1.511047] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[    1.511089] fbcon: Deferring console take-over
[    1.511089] fb0: EFI VGA frame buffer device
...
[    1.541187] Run /init as init process
[    1.541188]   with arguments:
[    1.541189]     /init
[    1.541190]   with environment:
[    1.541190]     HOME=/
[    1.541190]     TERM=linux
[    1.542738] fbcon: Taking over console
[    1.542775] Console: switching to colour frame buffer device 282x94
[    1.560794] stage-1-init: [Sat Mar 18 18:15:49 UTC 2023] loading module zfs...
...
[    4.526028] i915 0000:00:02.0: [drm] VT-d active for gfx access
[    4.526167] Console: switching to colour dummy device 80x25
[    4.526199] i915 0000:00:02.0: vgaarb: deactivate vga console
[    4.526289] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[    4.528757] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    4.530076] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_component_ops [i915])
[    4.531179] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/tgl_dmc_ver2_12.bin (v2.12)

If I recall correctly, the kernel’s initial console output device is provided by the boot loader. Then, after initializing memory and a single CPU core, the kernel switches the console to its built-in framebuffer. Later, after bringing up a graphics processor, the kernel switches the console to the graphics device.

So maybe your kernel doesn’t understand the console device provided by grub. I’d be interested to know how using systemd-boot changes things. (Trying systemd-boot might be easy, given that you already are booting from USB.)

You also might try booting into single-user-mode, just in case that reveals any more clues.


The first entries in journalctl for a successful boot (journalctl --system worked, but journalctl -b didn’t) are in fact from kernel startup:

Mar 21 00:40:57 nixos kernel: microcode: microcode updated early to revision 0x48, date =  2021-11-16
Mar 21 00:40:57 nixos kernel: Linux version 5.15.89 (nixbld@localhost) (gcc (GCC) 11.3.0, GNU ld (GNU Binutils) 2.39) #1-NixOS SMP Wed Jan 18 10:48:59 UTC 2023
Mar 21 00:40:57 nixos kernel: Command line: BOOT_IMAGE-(hd0,msdos1)//kernels/[long string of letters and numbers]-linux-5.15.89.bzImage init=/nix/store/[another long string]-nixos-system-nixos-22.11.1777.cdead16a444/init nohibernate loglevel=4

The first two lines of the log about the colour dummy console are the same as yours (but with the same timestamp as above). I don’t have any lines about deferring the console takeover. The part that starts with Run /init ends with fbcon: Taking over console, but there’s no Console: switching to colour frame buffer device 282x94 line before the stage-1-init section begins.

As for the last set of lines, I instead have (starting basically the same way your example does):

vgaarb: deactivate vga console
Invalid PCI ROM data signature: expecting 0x52494350, got 0xe936aa55
[drm] Failed to find VBIOS tables (VBT)
vgaarb: changed VGA decodes: olddecodes=io+mem, decodes=io+mem:owns=io+mem
[drm] Finished loading DMC firmware i915/bxt_dmc_ver1_07.bin (v1.7)

Later on, I have:

Mar 21 00:41:01 nixos kernel:  NET: registered PF_PACKET protocol family
Mar 21 00:41:02 nixos kernel: fbcon: i915drmfb (fb0) is primary device
Mar 21 00:41:02 nixos kernel: Console: switching to colour frame buffer device 170x48

I’ve already tried changing the console provided by grub; it didn’t seem to help, and I’m not sure if that’s because I haven’t found the right value or if it’s just not the problem. For the successful boot without encryption, pressing e to look at the config didn’t show me any console= value at all. As for systemd-boot, do you know any way to make it compatible with BIOS (SeaBIOS in particular) rather than EFI?

I’m at a loss. I’ve done a bit more reading but not much useful. At this point I’m grasping for anything which might drop a new clue.

I think GRUB merely exposes a console device provided by the BIOS. Maybe there are some BIOS settings that could be relevant.

GRUB - ArchWiki mentions that a kernel module can stop the boot process; it suggests blacklisting the offending module. Maybe you’d have some luck trimming availableKernelModules to a bare minimum.
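On NixOS, one way to keep a suspect module from loading, including in stage 1, is the kernel’s module_blacklist= parameter. suspect_module below is a hypothetical placeholder, not a real module name; substitute whichever module you want to rule out.

```nix
{
  # Refuse to load the listed modules (comma-separated) anywhere, including the initrd.
  # "suspect_module" is a placeholder name for illustration only.
  boot.kernelParams = [ "module_blacklist=suspect_module" ];
}
```

Like any kernel parameter, it can also be added for a single boot by editing the linux line from the GRUB menu.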

That wiki page also mentions that grub needs to preload lvm modules, but I’d expect nix to have configured that. It also mentions grub’s support for LUKS2 is limited, if that happens to matter to you.

You are correct that systemd-boot needs EFI. It might be possible to chain-load it from grub, but I don’t think that would help much. (I suspect that systemd-boot is mostly a minimal linux kernel supporting only the necessary devices specified in the UEFI specs.)

Hopefully someone else can offer some new ideas.

FYI systemd-boot is essentially just a UEFI chain loader. It’s not a minimal linux kernel at all; it’s just a UEFI application that uses UEFI services to read some config files, present a boot menu, and load and launch the OS kernel. So it supports basically exactly whatever your system’s UEFI implementation supports. Which is to say, @grew_three_sizes can’t use it.


It occurred to me last night that qemu runs SeaBIOS. I wonder if it’d be worth trying to boot your USB drive in qemu. This might be an easier way to examine the boot process and debug things like booting encrypted root.

From what I read in SeaBIOS - coreboot, SeaBIOS is a thin layer atop coreboot. It can expose VGA from coreboot or from a different “option ROM”. Possibly it also can be configured to map VGA to a serial port.

If your encrypted root requires a password entry, then you have to solve the console problem somehow. Mapping console to a serial port or ssh connection could work.
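A sketch of mapping the console to a serial port, mainly useful when testing the drive under QEMU (its -serial stdio option connects the guest’s first serial port to your terminal); the Chromebook itself probably has no accessible serial port. The kernel treats the last console= entry as /dev/console, which is where stage 1 is expected to print its prompts; the 115200 baud rate is an assumption.

```nix
{
  # Kernel output goes to both screen and first serial port;
  # the last console= entry (ttyS0 here) becomes /dev/console.
  boot.kernelParams = [ "console=tty0" "console=ttyS0,115200n8" ];
  # Mirror the GRUB menu onto the same serial port.
  boot.loader.grub.extraConfig = ''
    serial --unit=0 --speed=115200
    terminal_input serial console
    terminal_output serial console
  '';
}
```

On real hardware, list console=tty0 last instead so the passphrase prompt stays on the screen.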

But for initial testing, you could configure the encrypted root to decrypt using a key file – maybe in the initrd image. If this works, at least you’d know that the console problem is the only one blocking boot.
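A sketch of that key-file test, assuming a hypothetical file /etc/secrets/initrd/luks-test.key on the installed system that has already been added as an extra key slot with cryptsetup luksAddKey. boot.initrd.secrets copies it into the initrd, which GRUB supports; both paths are illustrative. Because /boot is unencrypted, anyone with the drive can read this key, so it should only be used for the test.

```nix
{
  # Attribute name: path inside the initrd; value: source file on the installed system.
  boot.initrd.secrets = {
    "/luks-test.key" = "/etc/secrets/initrd/luks-test.key";
  };
  boot.initrd.luks.devices.luksroot = {
    device = "/dev/disk/by-uuid/<redacted>";
    keyFile = "/luks-test.key"; # read from the initrd, so no passphrase prompt
    preLVM = true;
    allowDiscards = true;
  };
}
```

This replaces the existing luksroot block rather than sitting next to it, since defining the same attributes twice in one file is an evaluation error.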

@grew_three_sizes Is there any particular reason you are not just using native ZFS encryption?