NixOS went to systemd emergency mode

RCHG · January 26, 2020, 8:57pm

Hi all,

I had the following problem:

In my NixOS hardware-configuration.nix I have several partitions that are mounted by label: the main ones plus extra partitions for backups.
I decided to format one of the backup partitions, but unfortunately I forgot to update its entry in /etc/nixos/hardware-configuration.nix.
After a reboot, the system no longer boots normally. My initial guess is that it drops into systemd emergency mode because:

> systemctl show --property Requires local-fs.target
Requires=-.mount blabla.mount bloblo.mount

and one of these, let's say bloblo.mount, fails because its partition no longer exists (it is not part of the system, just a partition with backups).

Note that systemd actually stops when it reaches the systemd-timesyncd service (but this might not be important).

The main problem is that I cannot run nixos-rebuild in the systemd emergency shell, as the network does not seem to be working. Also, most of the files that could potentially be edited to solve the issue are read-only. So I have no idea how to get the system working again.

Any idea/help will be very very welcome!!!
Thanks!

RCHG · January 27, 2020, 8:49am

Short update:

It was not possible to edit /etc/fstab manually, or to use any systemd method to mask a specific unit. The only good news is that the utilities for formatting partitions, such as mkfs.ext4, and those for changing labels, such as e2label, work in the systemd emergency shell. So the problem was solved simply by creating a partition on a disk with the label that systemd is looking for (via local-fs.target).

In this case it was lucky that the configuration used /dev/disk/by-label/, as that made it possible to create a partition with the old label so the system could boot. This solution is not general: in cases where the filesystem in hardware-configuration.nix is not defined by label, it would be trickier to solve the issue.

bjornfor · January 27, 2020, 10:24am

Did you try to boot into an older system from grub / systemd-boot? (I think that would be the preferred way to fix issues like this.)

RCHG · January 27, 2020, 11:41am

Well, I tried, but unfortunately all the previous NixOS configurations had been created with the partition that was later removed, so they hit the same problem.

julm · January 27, 2020, 12:33pm

Thank you RCHG for mentioning yet another way to save the day.
TL;DR: I now use systemd.enableEmergencyMode = false;

I stumbled upon systemd’s emergency mode a few days ago after mistakenly swapping the paths of the target mountpoint and of the source device, by writing:

fileSystems."/var/mail" = { device = "rpool/var/lib/dovecot"; fsType = "zfs"; };

Instead of:

fileSystems."/var/lib/dovecot" = { device = "rpool/var/mail"; fsType = "zfs"; };

Here are my notes, which might help others:

Use journalctl -xb -p3 instead of just using the journalctl -xb recommended by the emergency shell, to directly see the errors causing the emergency.
Try to mask the failing mount by temporarily corrupting the Nix store (either /etc/systemd/system/ or /etc/fstab):

# mount -o remount,rw /nix/store
# systemctl mask var-mail.mount
Created symlink /etc/systemd/system/var-mail.mount → /dev/null.
# systemctl isolate multi-user.target
<<< Welcome to NixOS 19.09pre-git (x86_64) - ttyS0 >>>
mermet login:

Try nixos-rollback, but it may fail to reinstall GRUB correctly (grub-install failing on an unknown filesystem). In my case, after a reboot that no longer hit the emergency mode, disabling GRUB, running nixops deploy, re-enabling GRUB, then running nixops deploy again was enough.
systemd would ignore the failure to mount if the /etc/fstab line had the nofail option.
Apparently there is no reliable way to run sshd in systemd’s emergency mode, so remote machines without out-of-band access should have systemd.enableEmergencyMode set to false.
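
The last two notes could be written in NixOS configuration roughly as follows. This is only a minimal sketch: the mount point /backup, the label backup and the ext4 filesystem type are placeholders assumed for illustration, not values taken from this thread.

```nix
{ ... }:
{
  # Hypothetical backup partition: mount point, label and fsType are
  # placeholders chosen for this example.
  fileSystems."/backup" = {
    device = "/dev/disk/by-label/backup";
    fsType = "ext4";
    # "nofail" is written to the options column of the generated /etc/fstab,
    # so a missing device should not make local-fs.target fail.
    options = [ "nofail" ];
  };

  # Keep booting as far as possible instead of starting the emergency shell
  # on the console when mounting a filesystem fails.
  systemd.enableEmergencyMode = false;
}
```

With nofail, the mount is only wanted by local-fs.target rather than required, which is why a non-essential partition such as a backup disk is a reasonable candidate for it. Filesystems needed for booting should not be marked this way.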

RCHG · January 27, 2020, 2:26pm

This idea of remounting /nix/store as rw is what I was looking for! The other tips you gave are very useful too.

Thanks @julm

julm · January 27, 2020, 3:38pm

Just be well aware that manually corrupting the Nix store is advice I’m not at all proud to give: it’s not a habit to acquire but a really perilous and desperate measure to resort to when no other workaround is available, and one which I always undo as soon as it is no longer needed. It can bite back pretty hard, to the point where even nix-store --verify --check-contents --repair won’t help, especially if you delete files (e.g. to free some space or inodes to let nix update its sqlite database), which I’ve learned the hard way here. So please use mount -o remount,rw /nix/store with great care.

julm · October 4, 2022, 7:18pm

Instead of corrupting the Nix store, people may use runtime unit files to override units by writing files to /run/systemd/system, e.g. systemctl mask --runtime foo.mount or systemctl edit --runtime foo.service

anticipatedfart · April 11, 2023, 8:24pm

Thank you so much for this. I was able to use it to save my system!