My goal is pretty simple, but for some reason seems to be often left out. I want to configure a server that is online 24/7, does automatic updates and reboots without me needing to enter a password, while having a LUKS encrypted disk.
Encryption is non-negotiable, but implementing such a system of course has caveats. You weaken security by not using a wired keyboard to enter the secret (password) that is then used to unlock the decryption key for the disk.
Afaik there are two approaches, one of which I “came up with myself”:

1. Traditionally, one would run dropbear SSH in the initrd. This means a network-facing service is running before the system has booted. From that point an admin could log in and enter the password, or another machine on the LAN could send the password some other way. I have seen implementations and they seem good, but I want it non-interactive, and I also have security concerns.
2. You generate a password on the running system, store it in a region that is not encrypted, and have the initrd attempt to decrypt the disk with that password. This does not expose a half-booted system to the internet and does not require manual entry (thus allowing daily updates, for example).
I would like to implement number 2.
The idea
1. Somehow hook into `system.autoUpgrade` and execute a command after a successful upgrade. Problems here:
   - there is no native option to add an ExecStartPost to the service (it is a systemd service, right?)
   - reimplementing it as a regular systemd service would work, but detecting whether a reboot is needed seems complicated?
2. Generate a keyfile (`dd if=/dev/urandom ...`) somewhere only root (which I assume runs the updater service) can access it.
   - I was not able to configure passwordless `nixos-rebuild` for sudo or sudo-rs for some reason, and at least the non-ng variant was not polkit-aware either. So it has to be root, I guess.
3. Use that known key to add an additional one, which is stored in /boot/keyfile.
4. Reboot once successful.
5. In the initrd, configure LUKS to attempt to unlock the disks using the keyfile at that location.
6. If successful, the system boots to stage 2. Afaik you can add another script here, which would run as root and remove that key from LUKS. The keyfile doesn't need to be deleted, as it is no longer valid.
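For the reboot detection mentioned in step 1, a sketch of the check that `nixos-rebuild`/`system.autoUpgrade` (with `allowReboot`) itself performs: compare the boot-relevant links of the booted system closure with the freshly built one. The helper name and structure are my own illustration:

```shell
#!/usr/bin/env bash
# Sketch: decide whether the newly built system needs a reboot by
# comparing boot-relevant links of two system closures. This mirrors
# the check autoUpgrade's allowReboot does; the helper name is mine.
needs_reboot() {
  local booted="$1" built="$2" f
  for f in initrd kernel kernel-modules; do
    # readlink -f resolves the store path each link points at
    if [ "$(readlink -f "$booted/$f" 2>/dev/null)" != "$(readlink -f "$built/$f" 2>/dev/null)" ]; then
      return 0  # a boot-relevant component changed -> reboot needed
    fi
  done
  return 1
}

# On NixOS the two closures would be:
#   /run/booted-system            (what is currently running)
#   /nix/var/nix/profiles/system  (what was just built)
if needs_reboot /run/booted-system /nix/var/nix/profiles/system; then
  echo "reboot required"
fi
```

If only non-boot-relevant parts of the closure changed, the check returns false and a plain `switch` is enough, no key staging needed.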
And this should be able to work forever? There is a small attack window where an attacker would need to unplug the system while it reboots, before it can boot and remove the unencrypted keyfile from LUKS.
So the questions are:

- what is the best way to hook into `system.autoUpgrade`?
- any guides for making LUKS attempt to use that keyfile?
- guides for running a script as root at the start of stage 2?
Commands

```
### luks part ###
# create master keyfile
sudo dd if=/dev/urandom of=/root/luks_keyfile bs=1024 count=4
sudo chmod 600 /root/luks_keyfile
sudo cryptsetup luksAddKey /dev/sdX /root/luks_keyfile
sudo cryptsetup open --test-passphrase --key-file /root/luks_keyfile /dev/sdX

### updater service ###
# create a new keyfile (above)
# add another keyfile non-interactively
sudo cryptsetup luksAddKey /dev/sdX /boot/keyfile --key-file /root/luks_keyfile
```
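To close the loop after a successful boot (question 3), the one-shot slot can be dropped again. A sketch, assuming the same placeholder device and paths as above:

```shell
### stage 2 / post-boot cleanup ###
# remove the LUKS slot that matches the one-shot keyfile;
# luksRemoveKey finds and kills the slot that this key unlocks
sudo cryptsetup luksRemoveKey /dev/sdX /boot/keyfile
# the file itself is now useless, but removing it is tidy
sudo rm -f /boot/keyfile
```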
I don't know, but I assume I don't. Good guess though, that would indeed be good.
I have a Nitrokey though, which can store PGP keys and solve FIDO2/WebAuthn challenges. Could that work with zero user input? And without the key remaining usable in the future to open the device?
Tbh I don't see the current implementation as an issue, as the keyfile is only valid for a few seconds. Having FDE is nice, but my threat model is not that high. A normal poweroff would not create that keyfile; only a system update does.
Some keys like that require physical verification from the user when unlocking, by touching the key or pressing a button and/or providing a PIN. This is to prevent unattended use by design. The Nitrokey 3 and Nitrokey Passkey have the touch button.
What is your threat model? Presumably physical access is assumed not to be a threat?
The threat is having a machine that can be taken away and analyzed by sophisticated people with limited time and expertise that will not wait a day or so.
Also, to avoid having an outdated system because of missing automatic updates.
For use with HEADS (PGP) the key always required a PIN, making it useless for this. Also, the threat is the machine, the key, and everything being taken away together, so if it doesn't require user interaction it is of course useless. It needs to be a one-time key that can be removed without needing to be shredded (which doesn't work on SSDs anyway).
> The threat is having a machine that can be taken away and analyzed by sophisticated people with limited time and expertise that will not wait a day or so.
In that case the key should reside outside the machine and be impossible to access if the machine is removed. This also rules out putting the key in a TPM or a USB dongle for unattended use.
Is it possible to fully avoid reboots forever using kexec? I assume that is risky and unstable.
But the key is only created after an update, which finishes with a reboot, and after the reboot the key is deleted. If the boot was not successful but the boot script works, the key is no longer valid, so that is not a problem either.
It’s probably not possible to avoid reboots forever, as you will at some point want to apply firmware/platform updates to the machine. Kexec does make it possible to do a “soft” reboot without going through BIOS/firmware.
I would highly recommend using a TPM if possible. Most systems manufactured in the last 8 years or so have either a firmware TPM that uses the CPU’s secure element for keys/encryption, or at least headers on the motherboard for a TPM module, which can be obtained cheaply on eBay. You can then use Lanzaboote or just plain systemd-boot to measure and boot the system, unlocking LUKS only if the hardware attests that it is booting the correct kernel/initrd.
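For reference, enrolling a LUKS device against the TPM is a one-liner with systemd-cryptenroll; the device name is a placeholder and the PCR choice depends on your setup:

```shell
# bind a new LUKS slot to the TPM2, sealed against Secure Boot state (PCR 7);
# the slot only opens if the measured boot chain matches
sudo systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=7 /dev/sdX
# on NixOS the initrd then needs systemd in stage 1 to use it:
#   boot.initrd.systemd.enable = true;
```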
Finally, there’s wamserma’s suggestion of storing keys remotely. There’s Clevis/Tang, which also leverages the TPM to request remote keys, allowing unattended reboots only if the machine is still connected to your network.
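A sketch of the Clevis/Tang binding, assuming a Tang server reachable at `http://tang.example:7500` (hypothetical host):

```shell
# bind the LUKS device to a remote Tang server; unattended unlock then
# only works while that server is reachable on the network
sudo clevis luks bind -d /dev/sdX tang '{"url": "http://tang.example:7500"}'
# inspect which slots/pins are bound
sudo clevis luks list -d /dev/sdX
```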
True, I have heard of such an approach, and it is really cool that Clevis and Tang are available on NixOS by default.
But it would require two machines and some way of disallowing unlock once the machines are no longer on the same local network.
I can well imagine to have a homeserver and another machine running 24/7.
But this still doesn't really add security benefits? The threat is a scenario where devices are taken away. That scenario is EXTREMELY unlikely, and it is even more unlikely that it happens during the ~20 seconds that the key is stored unencrypted and valid and the machine has not yet wiped it again.
It makes sense for bigger deployments for sure, with an off-site server that you can shut down if a breach happens. Amazing system. But I don't think it makes sense here?
On my encrypted machines I have a keyfile sitting at /luks.key with chmod 0000. Before kexec.target I append a cpio image containing the keyfile to the original initrd in memory and load that instead of the default. If I remember correctly this only works with systemd-boot; I honestly forgot why, it's been a while. Next you need to add the keyfile path (/boot/luks.key in my case) to your boot.initrd.luks.devices.<name>.keyFile config. Remember to enable fallbackToPassword for normal boots.
```
#!/usr/bin/env bash
set -euxo pipefail
umask 0077

TMPDIR=$(mktemp --directory --tmpdir=/dev/shm initrd.XXXXXXXXX)

function cleanup() {
  shred -fu "${TMPDIR}/initrd" || true
  shred -fu "${TMPDIR}/boot/luks.key" || true
  rm -rf "$TMPDIR"
}
trap cleanup INT TERM EXIT

cd "$TMPDIR"

# put target files in place
mkdir -v ./boot
cp -v /luks.key ./boot/

# pack and append the initrd
PROFILE="$(readlink -f /nix/var/nix/profiles/system)"
cp -v "${PROFILE}/initrd" initrd
# archive only ./boot, so the copied initrd is not appended to itself
find ./boot -print0 | cpio --null --create --format=newc --owner=+0:+0 | zstd >> initrd

# append initrd secrets
"$PROFILE/append-initrd-secrets" initrd

# no `exec` here: kexec --load stages everything into kernel memory,
# so the EXIT trap can still shred the staged keyfile afterwards
kexec --load "${PROFILE}/kernel" --initrd="${TMPDIR}/initrd" --append="$(cat "${PROFILE}/kernel-params") init=${PROFILE}/init"
```
Anyway, this fits my threat model for these machines, given a certain update interval and the resulting maintenance headache. I tend to GitOps my private infra these days: push to git, CI builds, machines pull and switch/kexec/reboot as needed/possible.
It’s because `systemctl kexec` is actually a boot loader implementation (and I’m only kind of kidding). It looks at your ESP / XBOOTLDR / EFI vars and figures out what the next reboot would do, then pulls kernels/initrds/cmdlines from there to replicate that, and does its own `kexec --load ...`. So if you’re not doing the boot loader specification (and NixOS only does it for systemd-boot), then `systemctl kexec` will fail immediately and not even try to switch to kexec.target.
So in reality what’s happening is roughly:

1. `systemctl kexec` checks if you’re doing boot loader spec
2. uses it to emulate a reboot with kexec
3. switches to kexec.target
4. and then your service runs and says “hey, that stuff you did with boot loader spec? nvm, I got this”
And there isn’t a way to make `systemctl kexec` aware that you’ve got a different service to implement the loading, so it just always fails if you’re not doing boot loader spec. (OK, this isn’t completely fair, because it does just work no matter what if you already did `kexec --load ...` before you invoke `systemctl kexec`, but that inherently means the loading happens as part of the decision to `systemctl kexec`, rather than as part of the kexec.target transition.)
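A sketch of the workaround that parenthetical describes: stage the kernel yourself first, then `systemctl kexec` skips its boot-loader-spec emulation and simply transitions to kexec.target. Profile paths assume the standard NixOS layout:

```shell
# stage kernel+initrd into kernel memory ourselves...
PROFILE="$(readlink -f /nix/var/nix/profiles/system)"
kexec --load "${PROFILE}/kernel" \
  --initrd="${PROFILE}/initrd" \
  --append="$(cat "${PROFILE}/kernel-params") init=${PROFILE}/init"
# ...then systemd sees an image is already loaded and just
# performs the orderly switch to kexec.target
systemctl kexec
```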