Boot fails after new systemd stage 1

After the new systemd stage 1, I have encountered multiple problems:

  • After entering my passphrase for one of my LUKS-encrypted disks (1 of N), I get "A start job is running for Cryptography Setup for [DISK]". Later, I get prompted for the passphrase for the same disk again.
  • My encrypted disks fail to mount with the error below, putting me in emergency mode.
  • The same passphrase isn’t reused for both disks.

This is my config. Is there a way to fix this, and would it be possible to switch back to the old stage 1?

Log from journalctl:

apr 21 16:30:09 cesar systemd[1]: Finished Create SUID/SGID Wrappers.
apr 21 16:31:39 cesar systemd[1]: dev-mapper-hdd1.device: Job dev-mapper-hdd1.device/start timed out.
apr 21 16:31:39 cesar systemd[1]: Timed out waiting for device /dev/mapper/hdd1.
apr 21 16:31:39 cesar systemd[1]: Dependency failed for File System Check on /dev/mapper/hdd1.
apr 21 16:31:39 cesar systemd[1]: Dependency failed for /data/hdd1.
apr 21 16:31:39 cesar systemd[1]: Dependency failed for Local File Systems.
apr 21 16:31:39 cesar systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
apr 21 16:31:39 cesar systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
apr 21 16:31:39 cesar systemd[1]: data-hdd1.mount: Job data-hdd1.mount/start failed with result 'dependency'.
apr 21 16:31:39 cesar systemd[1]: systemd-fsck@dev-mapper-hdd1.service: Job systemd-fsck@dev-mapper-hdd1.service/start failed with result 'dependency'.
apr 21 16:31:39 cesar systemd[1]: dev-mapper-hdd1.device: Job dev-mapper-hdd1.device/start failed with result 'timeout'.
apr 21 16:31:39 cesar systemd[1]: dev-mapper-hdd2.device: Job dev-mapper-hdd2.device/start timed out.
apr 21 16:31:39 cesar systemd[1]: Timed out waiting for device /dev/mapper/hdd2.
apr 21 16:31:39 cesar systemd[1]: Dependency failed for /data/hdd2.
apr 21 16:31:39 cesar systemd[1]: data-hdd2.mount: Job data-hdd2.mount/start failed with result 'dependency'.
apr 21 16:31:39 cesar systemd[1]: Dependency failed for File System Check on /dev/mapper/hdd2.
apr 21 16:31:39 cesar systemd[1]: systemd-fsck@dev-mapper-hdd2.service: Job systemd-fsck@dev-mapper-hdd2.service/start failed with result 'dependency'.
apr 21 16:31:39 cesar systemd[1]: dev-mapper-hdd2.device: Job dev-mapper-hdd2.device/start failed with result 'timeout'.
apr 21 16:31:39 cesar systemd[1]: systemd-ask-password-console.path: Deactivated successfully.

The NixOS manual says you can get the old scripted initrd back with boot.initrd.systemd.enable = false (but it’s deprecated).
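If you want to try that, a minimal sketch of the relevant configuration (the option name comes from the NixOS option tree; everything else here is illustrative):

```nix
{ ... }:
{
  # Opt out of the systemd-based stage 1 and fall back to the
  # (deprecated) scripted initrd.
  boot.initrd.systemd.enable = false;
}
```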

Did you follow the steps mentioned in Breaking changes announcement for unstable - #128 by ElvishJerricco?


I wonder if your console.keyMap is failing to apply in systemd stage 1. We have code that’s supposed to make that work, but maybe it isn’t. Your symptoms sound just like what would happen if you entered the passphrase wrong (i.e. the disk doesn’t unlock, the passphrase doesn’t get stored for reuse on a different disk, and the lack of unlocked disks makes the file systems that depend on them fail to mount).
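For reference, a hedged sketch of the option in question. console.keyMap is a real NixOS option; whether it is actually applied in systemd stage 1 is exactly what’s in doubt here, and the layout value below is a placeholder:

```nix
{
  # Keymap for the virtual console. Stage 1 is supposed to pick this
  # up so the LUKS passphrase is typed with the right layout.
  console.keyMap = "us";  # placeholder; substitute your actual layout
}
```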


I feel like if we can’t unlock a disk with a neededForBoot file system on it, then the initrd shouldn’t just shrug and keep going. It should keep re-prompting for the password until the person at the keyboard gets it right. Possibly with a configurable “no more tries” counter which, when you type the password wrong that many times in a row, causes the computer to turn itself off again and refuse to boot at all for some (also configurable) number of hours.

Correct. And that happens. It fails. These were not neededForBoot file systems though, so the failure mode is a little different.

It’s supposed to prompt for the password multiple times, and OP noted that they did get prompted at least one more time. Failing after some tries is a reasonable outcome though, dropping into emergency mode.

There is a bit of a complication in this case though: since the disk is unlocked in stage 1 and mounted in stage 2, the cryptsetup part failed in stage 1 non-fatally, and the mount in stage 2 timed out looking for the device. If these two things were done in the same stage, I think the failed cryptsetup would fail the device unit, which would fail the mount unit, which would drop into emergency mode. Perhaps we can make this work better cross-stage, but FWIW my recommendation has always been to use boot.initrd.luks only for stage 1 file systems, and to use /etc/crypttab for stage 2 file systems (and really, boot.initrd.luks is just a frontend for the stage 1 /etc/crypttab in systemd stage 1).
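A sketch of that split, assuming a LUKS root plus the two data disks from the log. The attribute names and UUIDs are placeholders; boot.initrd.luks.devices and environment.etc are real NixOS options:

```nix
{
  # Stage 1: only the device needed to mount the root file system.
  boot.initrd.luks.devices.root.device =
    "/dev/disk/by-uuid/XXXX-ROOT";  # placeholder UUID

  # Stage 2: data disks unlocked by systemd-cryptsetup via /etc/crypttab,
  # then mounted as /data/hdd1 and /data/hdd2 via fileSystems as usual.
  environment.etc."crypttab".text = ''
    hdd1 /dev/disk/by-uuid/XXXX-HDD1 none luks
    hdd2 /dev/disk/by-uuid/XXXX-HDD2 none luks
  '';
}
```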

It is configurable with the tries= crypttab option.
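For illustration, what that looks like in a crypttab entry. tries= is a documented crypttab option (tries=0 retries indefinitely); the device UUID below is a placeholder:

```
# /etc/crypttab — illustrative entry
# tries=3: three passphrase attempts before the unit fails
hdd1 /dev/disk/by-uuid/XXXX-HDD1 none luks,tries=3
```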

This is already configurable via changing what emergency.target does.


Is there anywhere you have these kinds of recommendations collected?

That sounds snarkier than it’s intended; I fully appreciate that this takes work you may not have gotten to (or want to do), but there are enough details in setting up stage 1/2 that a bit of a write-up would be helpful. I certainly don’t have a clear picture of what’s intended to be done when, at all times. The manual only covers handling one disk, if I read it correctly.


I just tested entering a completely wrong password, and it just continued to stage 2 without reprompting. The first reprompt seems to be caused by it prompting before cryptsetup has been initialized; cryptsetup then initializes and prompts again afterwards.

I have looked at the breaking changes, and I do already comply with the requirements for the luks config.

That’s just not how it works, though. The prompt is caused by systemd-cryptsetup asking for a password, and the console password agent seeing that request and making the prompt. There’s no way for the prompt to appear before systemd-cryptsetup is “initialized”.

I think there may be a bug in systemd-cryptsetup w.r.t. the number of prompts you get before it fails, so I wouldn’t be terribly surprised if this was actually the same behavior as what you saw when you tried to enter the passphrase correctly. The only difference was how many times you were prompted, right? So I’m still wondering if it’s a keymap problem. This should be easy to verify. You can press e in the boot menu to edit the boot params and add rd.rescue SYSTEMD_SULOGIN_FORCE=1, and that should put you into a shell during stage 1 where you can type at the prompt and see what happens. That should give you an idea if the keymap is working.