Suddenly unable to boot into user session on ephemeral ZFS

Without my having changed anything about my filesystems or ZFS configuration, my system suddenly no longer lets me boot and log in.

I just started to notice it today.

The last working boot entry is 26.05.20251221.a653104. Every later entry, from 26.05.20260105.9f0c42f up to the latest one, 26.05.20260111.ffbc9f8, no longer boots correctly.

I encounter the following misbehaviour:
After I enter the passphrase for the LUKS-encrypted disk in stage 1, the root ZFS pool zroot is imported, the root dataset is mounted on /, the nix dataset on /nix, and the persistent dataset is mounted as well.

The last message in stage 1 is

mount: /mnt-root/run: filesystem was mounted, but failed to update userspace mount table.

Then in stage 2 some errors are shown (Failed to start Accounts Service, Failed to start Name Service Cache Daemon (nsncd), Failed to start Home Manager environment for user, …)

Some errors I noticed:

avahi-daemon[2948]: Failed to create runtime directory /run/avahi-daemon/.
(s-daemon)[3178]: accounts-daemon.service: Failed to set up mount namespacing: /run/systemd/seats: No such file or directory
(s-daemon)[3178]: accounts-daemon.service: Failed at step NAMESPACE spawning /nix/store/myqpak8mqishg43g7jgyghgdfk8ma277-accountsservice-23.13.9/libexec/accounts-daemon: No such file or directory
systemd[1]: Failed to start Accounts Service.
nsncd[2956]: Jan 12 13:35:31.637 INFO started, config: Config { ignored_request_types: {}, worker_count: 8, handoff_timeout: 10s }, path: "/var/run/nscd/socket"
nsncd[2956]: Error: Read-only file system (os error 30)
systemd[1]: home-manager-u.service: Main process exited, code=exited, status=73/CANTCREAT
systemd[1]: home-manager-u.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Home Manager environment for u.
systemd-tmpfiles[4106]: Detected unsafe path transition / (owned by u) → /home (owned by root) during canonicalization of home/u.
systemd[4060]: systemd-tmpfiles-setup.service: Main process exited, code=exited, status=73/CANTCREAT
systemd[4060]: systemd-tmpfiles-setup.service: Failed with result 'exit-code'.
systemd[4060]: Failed to start Create User Files and Directories.
systemd[4060]: nixos-activation.service: Main process exited, code=exited, status=73/CANTCREAT
systemd[4060]: nixos-activation.service: Failed with result 'exit-code'.
systemd[4060]: Failed to start Run user-specific NixOS activation.

Stage 2 then continues but never shows a login prompt; it stays stuck at the "started session for user" message.

Switching to another TTY gives me a login prompt, but the login is rejected with the short error message “System Error” and I am returned to the same prompt.

I only noticed this today. Curiously, I think I rebooted at some point between the last working generation and the latest one without running into this issue.

I have the following hardware configuration:

boot.initrd.luks.devices."luks".device = "/dev/disk/by-label/luks";

boot.initrd.postResumeCommands = lib.mkAfter "zfs rollback -r ${zfs_root}/yeet@yeeted";

fileSystems =
  let
    zfs_device = dataset: {
      device = "${zfs_root}/${dataset}";
      fsType = "zfs";
      neededForBoot = true;
    };
  in
    {
      "/boot".device = "/dev/disk/by-label/boot";
    }
    // lib.mapAttrs (_: zfs_device) {
      "/" = "yeet";
      "/u" = "keep";
      "/nix" = "nix";
    };

All ZFS datasets show mountpoint legacy in the output of zfs list, and they were created that way.

I am really lost as to why the generations after that last working one do not boot.

Hi, I’ve got the same problem with ephemeral btrfs. Did you find a solution?

Unfortunately not yet. Right now I don’t even know how to debug this.

I think this issue has gone away for me now.

It seems to be a consequence of the impermanence module mismanaging strict persistence volume permissions. See nix-community/impermanence issue #295 on GitHub: “Early boot: / ownership becomes user-owned, systemd-tmpfiles fails, Wayland sessions broken (impermanence)”.

I updated impermanence, and could boot again.
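
For anyone arriving here with a flake-based setup, here is a minimal sketch of what pulling in a newer impermanence might look like. The thread does not say how the module is pinned, so the flake layout is an assumption, and the host name myhost, the nixpkgs branch, the system and ./configuration.nix are placeholders.

```nix
{
  inputs = {
    # Assumption: nixos-unstable, guessed from the 26.05 version strings above.
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    # The impermanence module, tracked as its own input so it can be
    # updated independently of nixpkgs.
    impermanence.url = "github:nix-community/impermanence";
  };

  outputs = { nixpkgs, impermanence, ... }: {
    # "myhost" is a placeholder host name.
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux"; # placeholder; adjust to the actual machine
      modules = [
        impermanence.nixosModules.impermanence
        ./configuration.nix # placeholder for the existing system configuration
      ];
    };
  };
}
```

With an input declared like this, running nix flake update impermanence (or nix flake lock --update-input impermanence on older Nix versions) should refresh only that input in flake.lock, and the next rebuild picks up the newer module. Since the broken generations do not boot, the update has to be done from the last working boot entry.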
