I’m trying to understand how hibernation on encrypted swap works in Linux in general and NixOS in particular, and would greatly appreciate it if someone could clarify a few points. I apologize in advance if I have completely misunderstood the whole thing and none of my questions make sense.
1. The kernel documentation on the topic warns against mounting any filesystems between hibernating and resuming. At the same time, the documentation for the fileSystems.<name>.neededForBoot NixOS option indicates that at least /nix/store is mounted in the initial ramdisk. Am I to understand that this doesn’t happen when resuming?
2. If so, how does that interact with the swapDevices.*.encrypted.keyFile NixOS option? Its documentation suggests that all of the neededForBoot filesystems are mounted before this file is consulted. Am I to understand that this, too, doesn’t happen when resuming from hibernation, and so the swap partition used for hibernation cannot be encrypted with a key file?
3. But it can be encrypted with a password, right? As in, if I just set swapDevices.*.encrypted.enable, .blkDev and .label, and set up the encrypted partition accordingly, hibernation should (at least in theory) work and not leave me at the risk of data loss implied in point 1? The Arch wiki article on the topic suggests adding a custom mkinitcpio hook; is that relevant for NixOS?
4. On a somewhat tangential note, the boot.resumeDevice option documentation seems to indicate that swap devices should be tried automatically, but on my (normally booted) system there is no resume= parameter in /proc/cmdline, and /sys/power/resume is 0:0. Those are the two ways the aforementioned kernel documentation page suggests for specifying the resume device (I do have swap enabled, as verified by lsblk). Is this done through some separate mechanism, or should I disregard the documentation and specify boot.resumeDevice manually? Or are the relevant kernel options somehow provided only if there is a hibernation image?
I suppose some of these questions are answerable with a bit of experimentation on my end, but after seeing stern warnings about data loss in kernel docs I am a bit anxious about blindly trying things without checking my understanding first.
So I did what I should have done from the beginning and poked around the source, and I think I have found all the answers; I would still be grateful if someone with actual knowledge about the relevant parts of the boot process could double-check my conclusions.
After that stage 1 init iterates over all configured swap devices, and checks each of them in turn for hibernation image. Notably, at that point, encrypted-with-a-password swap devices are already unlocked and therefore eligible.
1. Yes, neededForBoot filesystems are not mounted when resuming, because the resume logic comes before mounting them in the init sequence.
2. Swap partitions locked with a key file are simply never searched for hibernation images, because they are unlocked too late for that.
3. I will need to actually try this when I have the time again, but it seems that it should just work, without any additional hooks.
4. The last option: hibernation images are detected independently of the kernel resume= option, and the kernel is made aware of them only if/when they are found.
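For illustration, the password-only setup discussed above might look roughly like the following. This is a minimal, untested sketch: the label cryptswap and the UUID placeholder are hypothetical and would need to match the actual LUKS partition.

```nix
{
  swapDevices = [
    {
      # The unlocked mapping; the name after /dev/mapper/ must match `label` below.
      device = "/dev/mapper/cryptswap";
      encrypted = {
        enable = true;
        label = "cryptswap";                          # hypothetical mapper name
        blkDev = "/dev/disk/by-uuid/<swap-luks-uuid>"; # placeholder, not a real UUID
        # No keyFile: the passphrase is asked for in stage 1, so the device
        # is already unlocked when swap devices are checked for an image.
      };
    }
  ];
}
```

No boot.resumeDevice is set here, on the assumption that the automatic scan over configured swap devices described above finds the image.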
I wonder if boot.initrd.systemd.enable = true; fixes this situation somewhat.
If resume works like this, why (if I don’t specify keyfile for the swap) does / still get mounted (I get asked for the passphrase by bcachefs) before the resumption happens?
Some of these conclusions aren’t exactly right. They’re basically right if you’re using scripted initrd and the fileSystems/swapDevices.*.encrypted options. But you can configure LUKS devices with boot.initrd.luks.devices, and in that case key files will work for encrypted hibernate devices as long as something puts the key file in the initrd tmpfs before it’s needed (e.g. boot.initrd.secrets, though that one is dangerous).
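A rough sketch of that alternative, assuming a hypothetical key at /etc/cryptkey.d/swap.key on the running system and a placeholder UUID. Note that boot.initrd.secrets copies the key into the initrd image itself, which is presumably why it is called dangerous above: anyone who can read the initrd can read the key.

```nix
{
  # Declare the LUKS container directly instead of via swapDevices.*.encrypted.
  boot.initrd.luks.devices.swap = {
    device = "/dev/disk/by-uuid/<swap-luks-uuid>"; # placeholder, not a real UUID
    keyFile = "/swap.key";                         # path inside the initrd
  };

  # Copies the key from the running system into the initrd at the path above.
  # The source path is an assumption made for this example.
  boot.initrd.secrets = {
    "/swap.key" = "/etc/cryptkey.d/swap.key";
  };

  swapDevices = [ { device = "/dev/mapper/swap"; } ];
  boot.resumeDevice = "/dev/mapper/swap";
}
```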
With systemd initrd this all gets broadly simpler. Every encrypted device using a key file will be decrypted whenever the file systems containing that key file have been mounted. So if the key file isn’t on any file system and is provided in the initrd tmpfs some other way, then it can be used for hibernation. Otherwise, it’ll wait for the file system containing the key to be mounted, which won’t happen until after hibernate-resume has failed to resume.
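In configuration terms, the first case might look something like this untested sketch. The UUID is a placeholder, and how /swap.key ends up in the initrd tmpfs is deliberately left open; the only assumption is that it does not live on a file system that has to be mounted first.

```nix
{
  boot.initrd.systemd.enable = true;

  boot.initrd.luks.devices.swap = {
    device = "/dev/disk/by-uuid/<swap-luks-uuid>"; # placeholder, not a real UUID
    # Assumed to already be present in the initrd tmpfs, so unlocking
    # does not have to wait for any file system to be mounted.
    keyFile = "/swap.key";
  };

  swapDevices = [ { device = "/dev/mapper/swap"; } ];
}
```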
@qm3ster Bcachefs is different because bcachefs-based encryption doesn’t use LUKS. Being prompted for a bcachefs passphrase doesn’t mean the bcachefs file system is being mounted. We ask for the passphrase in postDeviceCommands, before we mount file systems. We should arguably move this to postResumeCommands, because swap can’t be stored on bcachefs, so there’s no point in decrypting bcachefs file systems before attempting to resume from hibernation, and we should probably do something similar for the systemd ordering for systemd initrd.
boot.resumeDevice = "/dev/mapper/swap";
boot.initrd.systemd.services.unlock-swap = {
  unitConfig = {
    Description = "Unlock LUKS swap after root is mounted but before resume";
    # Critical: Insert before systemd tries to resume
    Before = [ "systemd-hibernate-resume.service" ];
    # Ensure root is mounted first
    After = [ "sysroot.mount" ];
    Requires = [ "sysroot.mount" ];
  };
  serviceConfig = {
    Type = "oneshot";
    # Keyfile path is relative to MOUNTED root (/sysroot)
    ExecStart = "${pkgs.cryptsetup}/bin/cryptsetup -v open --key-file /sysroot/etc/cryptkey.d/swap.key /dev/disk/by-uuid/1f615cad-a4ee-4e27-9663-ff932557a82e swap";
    ExecStartPost = "${pkgs.util-linux}/bin/swapon -v /dev/mapper/swap";
  };
  # Force this to run in the initrd before handing off to stage 2
  wantedBy = [ "initrd.target" ];
};
but mounting the suspended filesystem read-write corrupted it and I ran away screaming.
This is a dependency cycle, and for good reason. sysroot.mount must always come after systemd-hibernate-resume.service (thanks to their respective orderings against local-fs-pre.target), so the constraints you have break the ordering, and systemd will likely do something nonsensical to try to fix it. But even if you did something like mounting the drive manually to get the key and then unmounting it, there's a *reason* sysroot.mount comes after systemd-hibernate-resume.service: file systems break when they’re mounted before hibernation is resumed. There just isn’t a way to get the key file for swap out of a file system like this.
What I recommend instead is trying to get the swap and the root FS to use the same common secret. This will be pretty DIY, since the bcachefs unlock logic in NixOS isn’t exactly designed for that kind of usage, but you can disable the upstream unlock logic and make your own.