Can't mount ZFS snapshot in systemd unit

Hey folks, I’m attempting to back up my ZFS-based system using Restic and am running into issues. Long story short, I initially tried making a snapshot and backing up the snapshot directory directly:

zfs snapshot zpool/home@restic
# Do things with /home/.zfs/snapshot/restic

Then I ran into this issue where the suggested fix is to mount the ZFS snapshot directly. So I tried making this Restic configuration:

{ config, pkgs, ... }:

{
  services.restic.backups.system = {
    paths = [
      "/mnt/restic"
    ];
    repository = "s3:s3.us-west-001.backblazeb2.com/something/${config.networking.hostName}";
    environmentFile = config.age.secrets."restic_b2_${config.networking.hostName}".path;
    passwordFile = config.age.secrets."restic_password_${config.networking.hostName}".path;
    initialize = true;
    backupPrepareCommand = ''
      ${pkgs.zfs}/bin/zfs snapshot zpool/home@restic
      ${pkgs.zfs}/bin/zfs snapshot zpool/var@restic
      ${pkgs.util-linux}/bin/mount -t zfs zpool/home@restic /mnt/restic/home
      ${pkgs.util-linux}/bin/mount -t zfs zpool/var@restic /mnt/restic/var
    '';
    backupCleanupCommand = ''
      ${pkgs.util-linux}/bin/umount /mnt/restic/home
      ${pkgs.util-linux}/bin/umount /mnt/restic/var
      ${pkgs.zfs}/bin/zfs destroy zpool/home@restic
      ${pkgs.zfs}/bin/zfs destroy zpool/var@restic
    '';
    timerConfig.OnCalendar = "hourly";
    pruneOpts = [
      "--keep-daily 7"
      "--keep-weekly 5"
      "--keep-monthly 12"
      "--keep-yearly 2"
    ];
  };

  systemd.tmpfiles.rules = [
    "d /mnt/restic 0700 root root"
    "d /mnt/restic/home 0700 root root"
    "d /mnt/restic/var 0700 root root"
  ];

  age.secrets."restic_b2_${config.networking.hostName}".file =
    ../secrets/restic_b2_${config.networking.hostName}.age;
  age.secrets."restic_password_${config.networking.hostName}".file =
    ../secrets/restic_password_${config.networking.hostName}.age;
}

Unfortunately, that always creates 0-size backups and the run completes without error. The backupCleanupCommand does appear to fail to unmount the directories, though that doesn’t fail the run.
I followed the generated systemd unit back to the backupPrepareCommand script. It looks fine and works when I run it manually (i.e. it creates the snapshots and mounts the directories, which have the right contents). If I comment out the backupCleanupCommand, a unit run creates the snapshots but mounts nothing (i.e. the snapshots exist but the directories are empty).
It’s almost like the systemd unit is prohibited from mounting snapshots. Have I hit a hardening restriction or is there something more subtle going on here?
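In case it matters, the generated unit and its sandboxing settings can be inspected with something like:

systemctl cat restic-backups-system.service
systemctl show restic-backups-system.service | grep -iE 'protect|private|capab|systemcall'
systemd-analyze security restic-backups-system.service
journalctl -u restic-backups-system.service
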
Thanks for any help.

Have you tried running this command manually? Do your snapshots allow legacy mounting?

Yeah, see above where I ran the script manually. I’ve also run the individual commands manually many times. It really does seem to be something about running in the unit, but I don’t know what.

Still struggling with this. Here are some things I’ve tried:

I changed my prepare/cleanup commands to:

    backupPrepareCommand = ''
      #!${pkgs.bash}/bin/sh
      set -x
      ${pkgs.zfs}/bin/zfs snapshot zpool/home@restic
      ${pkgs.zfs}/bin/zfs snapshot zpool/var@restic
      ${pkgs.util-linux}/bin/mount -t zfs zpool/home@restic /mnt/restic/home
      ${pkgs.util-linux}/bin/mount -t zfs zpool/var@restic /mnt/restic/var
    '';
    backupCleanupCommand = ''
      #!${pkgs.bash}/bin/sh
      set -x
      ${pkgs.util-linux}/bin/umount /mnt/restic/home
      ${pkgs.util-linux}/bin/umount /mnt/restic/var
      ${pkgs.zfs}/bin/zfs destroy zpool/home@restic
      ${pkgs.zfs}/bin/zfs destroy zpool/var@restic
    '';

I thought maybe turning these commands into standalone shell scripts would help. It didn’t. I also wondered whether the commands actually being run might somehow differ from what’s in the script. Basically throwing stuff at the wall to see what sticks at this point.

The commands look fine. They run fine if I copy-paste them out of the logs. They seem to silently fail in the unit, though, and I still don’t know why.
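For reference, after a unit run I’ve been checking the state with roughly:

zfs list -t snapshot
mount | grep /mnt/restic
ls -la /mnt/restic/home /mnt/restic/var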

Thanks.

I’m doing something similar, but more involved: I spawn a backup job (a bash script) from a systemd timer. The script finds the appropriate ZFS snapshot and spawns a systemd-nspawn container in which the snapshot is mounted at the original dataset’s path.

I hacked this together to work around the lack of CLI options to overwrite the path of a snapshot. It’s certainly not simple, but it has held up so far.

If you’re interested in that, I can gather and post my scripts once I have the time. It certainly won’t be a simple drop-in nix expression, though.
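The rough shape of it is something like this (heavily simplified sketch, not my actual script; paths are placeholders, and the real thing also handles snapshot discovery, credentials, and cleanup):

# Hypothetical paths; the container root is a minimal filesystem
# that has restic available in it.
snap=/home/.zfs/snapshot/nightly
root=/var/lib/backup-container-root

# Bind the snapshot read-only over the dataset's original mount point
# inside the container, so restic records the usual paths.
systemd-nspawn --quiet --directory="$root" \
  --bind-ro="$snap":/home \
  restic backup /home
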

Actually, as an addendum, since this is less work for me right now: If you’re simply looking to debug why your commands fail inside of the systemd unit but work in a shell, I might have a snippet for you.

I basically ran into the same issue with this nginx PAM issue, and I’ve taken a stab at generalizing the debugging script I threw together back then. You can find it here:

If you run this script like this:

$ ./emulate-systemd-unit.sh print <myunit> echo "hello world"

…it will print a huge systemd-run command that runs your command (everything after <myunit>) using systemd with the environment settings of your unit. You should be able to copy+paste it into a shell and have it run like that.

Or, if you trust my script enough, you can run it like this:

# ./emulate-systemd-unit.sh run <myunit> echo "hello world"

…to directly run your command and print the resulting output, systemd stats and logs.

Edit: Obviously, this is mostly useful for experimenting. Run the command, watch it fail, then have a look at all the settings and try things like adding capabilities. For the nginx problem, for example, the issue was the capabilities.
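For the curious, the core idea is just reading the unit’s properties with systemctl show and replaying them as --property= arguments to systemd-run. A stripped-down sketch (the real script copies many more properties and is more careful about quoting and multi-value settings):

#!/usr/bin/env bash
# Replay (some of) a unit's settings through systemd-run.
mode=$1; unit=$2; shift 2

args=()
for prop in Environment User Group CapabilityBoundingSet PrivateTmp \
            ProtectSystem ProtectHome PrivateMounts NoNewPrivileges; do
  val=$(systemctl show -p "$prop" --value "$unit")
  [ -n "$val" ] && args+=("--property=$prop=$val")
done

cmd=(systemd-run --pty -u "test-$unit" --collect --wait "${args[@]}" -- "$@")

if [ "$mode" = print ]; then
  # Print a copy-pasteable command line.
  printf '%q ' "${cmd[@]}"; echo
else
  "${cmd[@]}"
fi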

Sorry for the delayed response; I got bogged down with life and had to put this down for a bit.

Thanks for this! I had to tweak the output a bit, because systemctl show reports my unit’s environment file under the plural EnvironmentFiles property, while systemd-run expects the singular EnvironmentFile. If I use print and modify the output accordingly, I get this command:

$ systemd-run --pty -u 'test-restic-backups-system' --collect --wait \
--property=Environment=LOCALE_ARCHIVE=/nix/store/fiinrcd99rnhgq9jws1pc9dk3dwzgmfd-glibc-locales-2.40-66/lib/locale/locale-archive\ PATH=/nix/store/lij02c29n6bwxma5wv7hxkxph20kil3l-openssh-9.9p2/bin:/nix/store/9m68vvhnsq5cpkskphgw84ikl9m6wjwp-coreutils-9.5/bin:/nix/store/vc2d1bfy1a5y1195nq7k6p0zcm6q89nx-findutils-4.10.0/bin:/nix/store/qjsj5vnbfpbg6r7jhd7znfgmcy0arn8n-gnugrep-3.11/bin:/nix/store/3ks7b6p43dpvnlnxgvlcy2jaf1np37p2-gnused-4.9/bin:/nix/store/21z9i4yi42z608308jng11x3lyrslymy-systemd-256.10/bin:/nix/store/lij02c29n6bwxma5wv7hxkxph20kil3l-openssh-9.9p2/sbin:/nix/store/9m68vvhnsq5cpkskphgw84ikl9m6wjwp-coreutils-9.5/sbin:/nix/store/vc2d1bfy1a5y1195nq7k6p0zcm6q89nx-findutils-4.10.0/sbin:/nix/store/qjsj5vnbfpbg6r7jhd7znfgmcy0arn8n-gnugrep-3.11/sbin:/nix/store/3ks7b6p43dpvnlnxgvlcy2jaf1np37p2-gnused-4.9/sbin:/nix/store/21z9i4yi42z608308jng11x3lyrslymy-systemd-256.10/sbin\ RESTIC_CACHE_DIR=/var/cache/restic-backups-system\ RESTIC_PASSWORD_FILE=/run/agenix/restic_password_flynode\ RESTIC_REPOSITORY=s3:s3.us-west-001.backblazeb2.com/nolans-nixos-backups/flynode\ TZDIR=/nix/store/lci8iybamsi7zaqywpz4sc0qx1xw85jx-tzdata-2025b/share/zoneinfo \
--property=EnvironmentFile=/run/agenix/restic_b2_flynode\ \(ignore_errors=no\) \
--property=CapabilityBoundingSet=cap_chown\ cap_dac_override\ cap_dac_read_search\ cap_fowner\ cap_fsetid\ cap_kill\ cap_setgid\ cap_setuid\ cap_setpcap\ cap_linux_immutable\ cap_net_bind_service\ cap_net_broadcast\ cap_net_admin\ cap_net_raw\ cap_ipc_lock\ cap_ipc_owner\ cap_sys_module\ cap_sys_rawio\ cap_sys_chroot\ cap_sys_ptrace\ cap_sys_pacct\ cap_sys_admin\ cap_sys_boot\ cap_sys_nice\ cap_sys_resource\ cap_sys_time\ cap_sys_tty_config\ cap_mknod\ cap_lease\ cap_audit_write\ cap_audit_control\ cap_setfcap\ cap_mac_override\ cap_mac_admin\ cap_syslog\ cap_wake_alarm\ cap_block_suspend\ cap_audit_read\ cap_perfmon\ cap_bpf\ cap_checkpoint_restore \
--property=User=root \
--property=PrivateTmp=yes \
--property=PrivateDevices=no \
--property=ProtectClock=no \
--property=ProtectKernelTunables=no \
--property=ProtectKernelModules=no \
--property=ProtectKernelLogs=no \
--property=ProtectControlGroups=no \
--property=PrivateNetwork=no \
--property=PrivateUsers=no \
--property=PrivateMounts=no \
--property=PrivateIPC=no \
--property=ProtectHome=no \
--property=ProtectSystem=no \
--property=NoNewPrivileges=no \
--property=LockPersonality=no \
--property=MemoryDenyWriteExecute=no \
--property=RestrictRealtime=no \
--property=RestrictSUIDSGID=no \
--property=RestrictNamespaces=no \
--property=ProtectProc=default \
--property=ProtectHostname=no \
-- /nix/store/zxp6h02vn93kncb6zgv5gh5337n47zzx-unit-script-restic-backups-system-pre-start/bin/restic-backups-system-pre-start
Running as unit: test-restic-backups-system.service; invocation ID: d7a86b4ddab7405b8eb75ac163d85a16
Press ^] three times within 1s to disconnect TTY.
Finished with result: resources
CPU time consumed: 0
Memory peak: 0B (swap: 0B)

Any idea what to do with Finished with result: resources? From what I’ve searched, it seems to be a fairly generic error that should log an actual failure somewhere, but it isn’t in the journalctl output for restic-backups-system, nor do I see anything obvious in journalctl -f or dmesg.

Is cap_sys_admin the only capability needed to mount filesystems? Seems it’s there but that doesn’t seem to be the issue either.

Thanks again.

Sounds like systemd sandboxing might be blocking the mounts. Check whether the generated unit sets something like SystemCallFilter=~@mount (which denies mount-related syscalls) or drops CAP_SYS_ADMIN from CapabilityBoundingSet, and loosen that via a unit override or systemd.services.<name>.serviceConfig in Nix. You could also check whether ProtectSystem, ProtectHome, or PrivateMounts are interfering.
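In Nix, such an override could look something like this (unit name taken from the restic module’s naming scheme; the settings shown are only illustrative, so relax whatever actually turns out to be the culprit, and lib.mkForce may be needed where the module already sets a value):

# Illustrative override of the unit generated for services.restic.backups.system
systemd.services."restic-backups-system".serviceConfig = {
  PrivateMounts = false;
  ProtectSystem = false;
  ProtectHome = false;
};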

Ah, it looks like systemctl show sadly doesn’t print EnvironmentFile= properties in a format that’s useful for deriving a systemd-run command. Sorry for the oversight; I hadn’t looked at a service with EnvironmentFile settings before.

The specific problem in your posted command is the \ \(ignore_errors=no\) at the end of the EnvironmentFile line. That suffix is just an annotation in the systemctl show output, not valid syntax for the property value, so systemd interprets it as part of the file path. The resources failure you’re seeing is consequently systemd looking for an environment file that doesn’t exist. I’ve updated my Gist if you want to keep experimenting.
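With that suffix dropped, the property from your paste should just read:

--property=EnvironmentFile=/run/agenix/restic_b2_flynode \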

As for CAP_SYS_ADMIN: My understanding is that it equates to “you are root” for most purposes, so I wouldn’t expect you to still need other capabilities. The capability man page also lists mounting things explicitly as an included privilege: capabilities(7) - Linux manual page

I ended up going with a different solution: essentially just skipping the complicated mount step, which I had only added because I’d hit this and mounting the snapshots was suggested as a workaround. That issue manifested weirdly because it only happened on one dataset (/var on my server, but not /var on my laptop, and not /home anywhere). I rebuilt the server, and I guess that pulled in a kernel/zfsutils upgrade that fixed it, so now I’m able to snapshot the datasets and back up the snapshots directly.
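For anyone landing here later, the relevant part of the config now looks roughly along these lines (same options as above, minus the mounts and tmpfiles rules; exact paths depend on where the datasets are mounted):

services.restic.backups.system = {
  paths = [
    "/home/.zfs/snapshot/restic"
    "/var/.zfs/snapshot/restic"
  ];
  backupPrepareCommand = ''
    ${pkgs.zfs}/bin/zfs snapshot zpool/home@restic
    ${pkgs.zfs}/bin/zfs snapshot zpool/var@restic
  '';
  backupCleanupCommand = ''
    ${pkgs.zfs}/bin/zfs destroy zpool/home@restic
    ${pkgs.zfs}/bin/zfs destroy zpool/var@restic
  '';
  # repository, passwordFile, timerConfig, pruneOpts unchanged
};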

Thanks for all the help.
