Decrypting ZFS pools over SSH on 26.05

Hi!

I am having issues with setting up remote unlocking of multiple ZFS pools after migrating to 26.05. I have two encrypted pools, rpool and tank (root and storage respectively). I was following this guide until the migration, which worked perfectly. After moving to 26.05, I tried using this other guide, which only works partially. The initrd prompts for the passphrase for rpool, after which it exits and continues booting. The tank passphrase is only prompted for later in the boot sequence, after the SSH connection is no longer available. I have tried manually importing the tank pool from initrd and loading its key, which works but causes the later load-key machinery to think it failed for tank.

The config for the host I am talking about is here. Relevant parts are configuration.nix and hardware.nix. Any help would be most appreciated. (I am planning to update the wiki once this is solved…)

Best regards,

Kacper

    initrd = {
      availableKernelModules = [ "r8169" ];
      network = {
        enable = true;
        postCommands = ''
          # Import my own pools.
          zpool import -a
          # Add the load-key command to the .profile
          echo "zfs load-key rpool; zfs load-key tank; zfs load-key tank/backup/kamil; killall zfs" >> /root/.profile
        '';
        ssh = {
          enable = true;
          port = 2222;
          # `ssh-keygen -t ed25519 -N "" -f /path/to/ssh_host_ed25519_key`
          hostKeys = [ /etc/ssh/ssh_host_ed25519_key_boot ];
          authorizedKeys = config.users.users.kacper.openssh.authorizedKeys.keys;
        };
      };
    };

So this is how you were doing it before. But I think there are some problems: importing pools that aren’t needed until stage 2 in stage 1 makes the boot process a little unnecessarily fragile, the killall zfs thing has always been a workaround for the lack of a proper ask-password infrastructure, and putting the unlock commands in .profile was similarly a bad way to make it so remote login can only do password entry. So, in order:

  1. Given that tank isn’t actually needed until stage 2, I would recommend not trying to decrypt it like this at all. I would recommend just storing a key file on the encrypted root fs and then setting the dataset’s keylocation to file:///path/to/keyfile. That way it just works, with no interaction or stage 1 customization required. If you do still want it to be decrypted via manual passphrase entry during stage 1 SSH for some reason, you’ll need to make NixOS think it needs to import the pool during stage 1. I don’t think we have an option to just tell it that a pool should be imported in stage 1 instead of stage 2, but you do have a "/mnt/tank/kacper" file system, for which you can just add fileSystems."/mnt/tank/kacper".neededForBoot = true; to cause it to be mounted in stage 1 and make NixOS realize it needs to import that pool in stage 1. But again, I’d recommend just having the other pool decrypted by a key file on the root pool.

  2. As you saw in the blog you linked, the killall zfs thing should be replaced by just actually responding to the passphrase request from the existing zfs import service; that would be better than trying to do its job for it with your own zfs load-key. So when the user logs in, they should use systemctl default or systemd-tty-ask-password-agent --watch to do this. I’ll also note, the blog says this:

    After loading keys manually with zfs load-key -a, running systemctl default to advance the boot would trigger systemd’s own zfs-load-key services, which send password prompts to the physical console through the systemd-ask-password mechanism — not to your SSH session. So you’d see password prompts but entering anything would have no effect.

    […]

    Running systemd-tty-ask-password-agent on your SSH terminal intercepts those queued requests and routes the prompts — and your answers — back through systemd properly. No manual key loading, no systemctl default needed. Systemd continues the boot naturally once all keys are provided.

    But I think this misunderstands why systemctl default is a nice implementation. When you run systemctl default it also runs systemd-tty-ask-password-agent --watch. So they shouldn’t have been trying to do their own zfs load-key followed by systemctl default; they should have just been doing systemctl default and responding to the passphrase request that it would have prompted. The nice thing about systemctl default is that it waits until the boot process reaches initrd.target and tells you if it failed; so you get one command that A) prompts for passphrases, B) exits with failure and an error message if boot failed.

  3. The blog post goes on to do this in order to get the .profile behavior back:

      # Write a .profile to /var/empty (root's home in the systemd initrd)
      # so that logging in over SSH automatically starts the password agent.
      boot.initrd.systemd.services.zfs-setup-root-profile = {
        description = "Prepare root .profile for ZFS unlocking via SSH";
        wantedBy = [ "initrd.target" ];
        before = [ "initrd-root-fs.target" ];
        unitConfig.DefaultDependencies = false;
        script = ''
          mkdir -p /var/empty
          echo "systemd-tty-ask-password-agent --watch" > /var/empty/.profile
        '';
        serviceConfig.Type = "oneshot";
      };
    

    But I don’t think using .profile like this was ever a good idea even with scripted stage 1. Putting this in .profile means that even your rescue shell is going to start this command, which is obviously not the intended behavior. The point is just to make it so that remote login only lets you enter a passphrase, and you can do that in the command= option to your SSH key:

    boot.initrd.network.ssh.authorizedKeys = [ "command=\"systemctl default\" ${key}" ];
    

    That has exactly the intended effect without breaking the entire user shell for rescue / debug shells.
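
Putting points 1 and 3 together, a minimal sketch of the resulting stage 1 configuration might look like the following. It assumes systemd stage 1 is in use and reuses the names from the original post (the kacper user, port 2222, the r8169 module, the /mnt/tank/kacper mount). Treat it as an illustration, not a drop-in config.

```nix
{ config, ... }:
{
  # Mounting a tank dataset in stage 1 makes NixOS import (and prompt for)
  # the tank pool there as well, while SSH is still reachable.
  fileSystems."/mnt/tank/kacper".neededForBoot = true;

  boot.initrd = {
    availableKernelModules = [ "r8169" ];
    network = {
      enable = true;
      ssh = {
        enable = true;
        port = 2222;
        hostKeys = [ /etc/ssh/ssh_host_ed25519_key_boot ];
        # Prefix every key with a forced command, so a stage 1 login can
        # only answer passphrase prompts instead of getting a shell.
        authorizedKeys = map
          (key: ''command="systemctl default" ${key}'')
          config.users.users.kacper.openssh.authorizedKeys.keys;
      };
    };
  };
}
```

No postCommands or .profile handling is needed in this variant; the passphrase prompts come from the existing import services.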


Finally, I’ll note, just because I saw this in your config:

"/home" = {
  device = "rpool/home";
  fsType = "zfs";
  options = [ "zfsutil" ];
};

This makes me worried that you’re mixing non-legacy ZFS mountpoints with NixOS fileSystems. Generally, a dataset should either have a fileSystems entry without the zfsutil mount option if it has the mountpoint=legacy ZFS property, or have no fileSystems entry and a non-legacy mountpoint=/foo ZFS property. That is, a non-legacy mountpoint dataset should generally not be managed with fileSystems. The one exception to this is the file systems mounted in stage 1, namely /, /var, and /nix; if these are non-legacy mountpoints, then they do still need to have fileSystems entries and those do need the zfsutil mount option. But I tend to just recommend using legacy mountpoints for anything you want to have in fileSystems so you don’t really have to think about it.

The consequence of getting this wrong can be that zfs-mount.service races with the systemd mount units created by systemd-fstab-generator, and sometimes one can cause the other to error and fail your boot.

Agree with everything in the above. One addition:

if you do this, make sure you have a safe backup of the keyfile, because if your rpool goes away, you can’t just use a passphrase to access data on the tank pool

Thanks for your help! I tested fileSystems.<name>.neededForBoot = true; together with command= in authorizedKeys, and it all works perfectly.

The main reason I am doing this is because I do not want to store keyfiles, but that would be simpler in many ways, as you say.

As for my mounting with zfsutil, I have been doing so for a long while without issue. I just followed this guide. Note that I am not using mountpoint=/some/path, so I don’t think races between zfs and systemd should be an issue. Thanks for your concern! :slight_smile:

I don’t think that guide is very clear about this, but it does say:

Disable the mount service with systemd.services.zfs-mount.enable = false; or remove the fileSystems entries in hardware-configuration.nix. Otherwise, use legacy mountpoints (created with e.g. zfs create -o mountpoint=legacy). Mountpoints must be specified with fileSystems."/mount/point" = {}; or with nixos-generate-config.

as well as:

    # the zfsutil option is needed when mounting zfs datasets without "legacy" mountpoints
+    options = [ "zfsutil" ];

These things are correct, but I don’t think they’re very well explained. If you use non-legacy mountpoints, you must use zfsutil, and if you use legacy mountpoints, you must not use zfsutil; ZFS will fail to mount otherwise, so you are presumably using non-legacy mounts. But you do really need to heed the warning about either using legacy mountpoints, or disabling zfs-mount.service, or removing the datasets from fileSystems, because having non-legacy mountpoints, zfs-mount.service, and fileSystems all at the same time carries a risk of failing to boot as the systemd mount and zfs-mount.service race. It seems reasonably unlikely to actually cause a boot failure, but it can happen, and following the suggestion to do one of those things will rule it out.

Right… That’s reasonable. I have mountpoint=none, except whenever I want to share something via NFS. Then I delegate this to the sharenfs property and mount using zfs-mount.service. Is the fileSystems vs zfs-mount.service exclusivity system-wide or per filesystem?

… TIL mountpoint=none can actually be mounted with mount -t zfs -o zfsutil. I did not know that. I thought you simply could not mount mountpoint=none file systems, and that you had to use mountpoint=legacy for fileSystems to work. Huh, neat. And yea, in that case it does seem like you actually need zfsutil.

So to modify what I’ve said before, most of what I said is the same between mountpoint=legacy and mountpoint=none, except that mountpoint=none does actually need the zfsutil option.

I think there are a bunch of ways to trigger the unlock. I don’t like the .profile or command= in authorized_keys methods since they lock me out of using other commands.

Some other options would be to have ssh config specific for unlocking:

Host my-host-unlock
    Hostname my-host
    Port 2222
    User root
    # `systemctl default` also works here, but for me it yielded `'xterm-ghostty': unknown terminal type.`
    RemoteCommand systemd-tty-ask-password-agent --watch
    RequestTTY yes
    # optional
    ConnectTimeout 60

You can still ssh without the RemoteCommand using ssh my-host-unlock -o RemoteCommand=none. Since I use a justfile for building, switching, etc. I also have a just unlock command to print my password and wait for the connection.

# SSH to just started host to unlock zfs with echoed zfs key
unlock host=NIXOS_HOST:
    sops decrypt ../nixos-secrets/{{host}}.secrets.yaml --extract '["zfs-key"]'
    @echo
    while ! ssh root@{{host}} -p 2222 -t sh -c 'systemd-tty-ask-password-agent --watch'; do sleep 1; done
    @echo "Booting..."

On another note: I don’t use zfsutil anywhere, not for my zroot and not for the tanks. Maybe because I have boot.initrd.supportedFilesystems = [ "zfs" ] and boot.supportedFilesystems = [ "zfs" ]?

The tanks get mounted via keylocation=file:///sops-file and mountpoint=legacy and fileSystems since I had some trouble using the regular zfs way. I did not want to risk not booting should one of the datasets be faulty and the mount be attempted too early.
Since I use basically one dataset per service I made a wrapper module to also make systemd units depend on it so they don’t start unless the dataset is present. No idea, if there is a better way to do that…

zfs-service-deps.nix
{
  lib,
  config,
  pkgs,
  ...
}:
with lib; let
  cfg = config.zfs-service-deps;

  escapeSystemdMount = dataset: "${
    lib.removeSuffix "\n" (builtins.readFile (pkgs.runCommand
      "escape-${builtins.replaceStrings ["/"] ["-"] dataset}" {}
      "${pkgs.systemd}/bin/systemd-escape --path ${dataset} > $out"))
  }.mount";

  # Normalize to always work with lists
  toList = v:
    if builtins.isList v
    then v
    else [v];
in {
  options.zfs-service-deps = mkOption {
    type = types.attrsOf (types.either types.str (types.listOf types.str));
    default = {};
    description = "Map of service names to ZFS datasets (or list of datasets) they depend on";
    example = {
      navidrome = "tank2/navidrome";
      smbd = ["tank1/media" "tank2/backup"];
    };
  };

  config = {
    # Create fileSystems entries for all datasets
    fileSystems = mkMerge (flatten (mapAttrsToList (service: datasets:
      map (dataset: {
        "/${dataset}" = {
          device = dataset;
          fsType = "zfs";
          options = ["nofail"];
        };
      }) (toList datasets))
    cfg));

    # Create systemd dependencies
    systemd.services =
      mapAttrs (service: datasets: {
        after = map escapeSystemdMount (toList datasets);
        requires = map escapeSystemdMount (toList datasets);
      })
      (lib.filterAttrs (_: v: v != null) cfg);
  };
}

There are three kinds of mountpoint properties on a ZFS dataset. mountpoint=none means it can be mounted with NixOS’s fileSystems options but needs the zfsutil option. mountpoint=legacy means it can be mounted with NixOS’s fileSystems options but must not have the zfsutil option. mountpoint=/absolute/path means it should not be mounted with NixOS’s fileSystems options because that conflicts with ZFS doing it automatically, but if you do anyway (e.g. for datasets that need to be mounted during stage 1) it would need the zfsutil option.
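
As a rough sketch of those three cases in fileSystems terms: rpool/home is taken from the config quoted earlier, assuming it has mountpoint=none, while tank/media and rpool/nix are hypothetical datasets used only for illustration.

```nix
{
  # mountpoint=none: managed by fileSystems, zfsutil required.
  fileSystems."/home" = {
    device = "rpool/home";
    fsType = "zfs";
    options = [ "zfsutil" ];
  };

  # mountpoint=legacy: managed by fileSystems, zfsutil must be absent.
  fileSystems."/srv/media" = {
    device = "tank/media"; # hypothetical dataset
    fsType = "zfs";
  };

  # mountpoint=/absolute/path: normally no fileSystems entry at all, since
  # ZFS mounts it on its own. The exception is a dataset needed in stage 1,
  # which gets an entry and then needs zfsutil.
  fileSystems."/nix" = {
    device = "rpool/nix"; # hypothetical dataset with mountpoint=/nix
    fsType = "zfs";
    options = [ "zfsutil" ];
  };
}
```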

FWIW if you add utils to your module arguments, you can replace escapeSystemdMount with utils.escapeSystemdPath, which does the same escaping but without IFD.

The after/requires pair in your module is equivalent to unitConfig.RequiresMountsFor = toList datasets;. I find RequiresMountsFor quite nice because A) it doesn’t require escaping the path, and B) it lets you just specify the path you need instead of an exact mount unit; e.g. you can just say you need /var/log/foo and it still works even if that’s not a mount point but /var/log is.

Thank you, and even better: that also makes utils.escapeSystemdPath unnecessary!

Since the datasets (tank/dataset) are not mountpoints themselves, it should be:
unitConfig.RequiresMountsFor = map (dataset: "/${dataset}") (toList datasets);

oh, yea, good catch :slight_smile:
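
For reference, a sketch of how the systemd.services part of the zfs-service-deps module above might look after this change. It is a fragment: it assumes the zfs-service-deps option and the fileSystems entries are still declared as in the original module, and it no longer needs pkgs or the escapeSystemdMount helper.

```nix
{ lib, config, ... }:
let
  cfg = config.zfs-service-deps;

  # Normalize a single dataset name or a list of names to a list.
  toList = v: if builtins.isList v then v else [ v ];
in
{
  config.systemd.services = lib.mapAttrs (_service: datasets: {
    # Each dataset "tank/foo" is mounted at "/tank/foo" by the module's
    # fileSystems entries, so that is the path the service depends on.
    unitConfig.RequiresMountsFor = map (dataset: "/${dataset}") (toList datasets);
  }) cfg;
}
```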

Would it be possible to add an option to have a “fully manual” remote unlock? I have a somewhat complicated ZFS setup involving dozens of datasets and several different passphrases. I have a custom script I wrote that remotely handles the unlock process.

I just need zpool import, zfs load-key, and some way to indicate to the boot process to proceed with the boot.

Previously, I did that with “killall zfs”, but that doesn’t seem to work any more.

Thank you!

You can set boot.zfs.requestEncryptionCredentials to the empty list so that it doesn’t prompt for any passwords itself and then you can add your own service like:

boot.initrd.systemd.services.my-service = {
  requiredBy = [ "initrd.target" ];
  before = [ "sysroot.mount" ];
  after = [ "zfs-import-${poolName}.service" ];
  unitConfig.DefaultDependencies = false;
  serviceConfig.Type = "oneshot";
  ...
};

And have that service just do whatever logic it is you want to do. Hard to say specifically how to do what you want without knowing specifically what you want to do :stuck_out_tongue:

Thank you! This looks much better than my current approach, which is to ssh in, then create and run a temporary script. I’ll try porting my script into an initrd systemd service as you suggest.

My boot process is to: (1) zpool import the pools listed in a file; (2) zfs load-key -a using the passphrases in a directory. I use ssh to remotely populate these values.

pools-to-mount: file containing "pool1 pool2 ..."
(I may be able to replace this with extraPools=...)

passphrases.d/
   foo (passphrase is the contents of the file, not the filename)
   blah
   ...

Once “zfs get keystatus” reports “available” for all datasets, proceed with the boot.
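
One way this flow might map onto the service skeleton suggested above is a unit that only waits, leaving the actual zpool import and zfs load-key to the remote script. This is an untested sketch under several assumptions: the unit name wait-for-zfs-keys is made up, rpool stands in for the pool holding the root file system, and zfs and sleep are assumed to be on the PATH of initrd services.

```nix
{ ... }:
{
  # Stop NixOS from prompting for any ZFS passphrases itself.
  boot.zfs.requestEncryptionCredentials = [ ];

  # Hypothetical unit: holds back sysroot.mount until all keys are loaded.
  boot.initrd.systemd.services.wait-for-zfs-keys = {
    description = "Wait until all ZFS keys have been loaded remotely";
    requiredBy = [ "initrd.target" ];
    before = [ "sysroot.mount" ];
    after = [ "zfs-import-rpool.service" ];
    unitConfig.DefaultDependencies = false;
    serviceConfig.Type = "oneshot";
    script = ''
      # Returns success only when no filesystem or volume in the currently
      # imported pools reports keystatus=unavailable.
      all_keys_loaded() {
        for status in $(zfs get -H -o value -t filesystem,volume keystatus); do
          [ "$status" != unavailable ] || return 1
        done
      }

      # Poll until the remote unlock script has loaded every key.
      until all_keys_loaded; do
        sleep 1
      done
    '';
  };
}
```

The check only sees pools that are already imported, so the remote script would need to import the extra pools before loading the last key on rpool; otherwise the boot could continue too early.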

Hm, in general I would just recommend having one key for all those datasets. You could either have them all use the same encryptionroot (though that’s not possible across multiple pools), or you could just have the remote user upload one key file that can be used for all of them. Or if they really do need to be different keys, put those keys in one encrypted place that the remote user provides one passphrase for.