@aszlig, indeed, however those directories are either empty (/etc, /root, /usr, /var) or as they should be wrt. the configured hardening (/dev, /proc, /sys, /nix):
$ systemd-run -P -pPrivateMounts=1 -pRuntimeDirectory=systemd-confinement/test -pRootDirectory=/run/systemd-confinement/test -pUMask=066 -pBindReadOnlyPaths=/nix/store -- $(readlink /run/current-system/sw/bin/ls) -l /
Running as unit: run-u16023.service
total 0
drwxr-xr-x 20 0 0 4160 May 3 13:24 dev
drwxr-xr-x 2 0 0 40 May 4 16:35 etc
drwxr-xr-x 3 0 0 60 May 4 16:35 nix
dr-xr-xr-x 376 0 0 0 May 4 16:35 proc
drwxr-xr-x 2 0 0 40 May 4 16:35 root
drwxrwxrwt 4 0 0 80 May 4 16:35 run
dr-xr-xr-x 13 0 0 0 Apr 28 17:44 sys
drwxr-xr-x 2 0 0 40 May 4 16:35 usr
drwxr-xr-x 2 0 0 40 May 4 16:35 var
AFAIU that’s because systemd’s setup_namespace() calls base_filesystem_create(), which creates those usual top-level directories.
This would explain why, as you noticed in the original PR:
Another quirk we do have right now is that systemd tries to create a /usr directory within the chroot, which subsequently fails. Fortunately, this is just an ugly error and not a hard failure.
A way to limit access to those mountpoints/directories:
$ systemd-run -P -pPrivateMounts=1 -pRuntimeDirectory=systemd-confinement/test -pRootDirectory=/run/systemd-confinement/test -pUMask=066 -pBindReadOnlyPaths=/nix/store -- $(readlink /run/current-system/sw/bin/findmnt)
Running as unit: run-u16084.service
TARGET SOURCE FSTYPE OPTIONS
/ tmpfs[/systemd-confinement/test] tmpfs rw,nosuid,nodev,size=1988236k,mode=755
|-/dev devtmpfs devtmpfs rw,nosuid,size=397648k,nr_inodes=991015,mode=755
| |-/dev/pts devpts devpts rw,nosuid,noexec,relatime,gid=3,mode=620,ptmxmode=666
| |-/dev/shm tmpfs tmpfs rw,nosuid,nodev
| |-/dev/hugepages hugetlbfs hugetlbfs rw,relatime,pagesize=2M
| `-/dev/mqueue mqueue mqueue rw,nosuid,nodev,noexec,relatime
|-/nix/store losurdo/nix[/store] zfs ro,relatime,xattr,posixacl
|-/proc proc proc rw,nosuid,nodev,noexec,relatime
|-/run tmpfs tmpfs rw,relatime
| |-/run/systemd/incoming tmpfs[/systemd/propagate/run-u16084.service] tmpfs ro,nosuid,nodev,size=1988236k,mode=755
| `-/run/systemd-confinement/test tmpfs[/systemd-confinement/test] tmpfs rw,nosuid,nodev,size=1988236k,mode=755
|   |-/run/systemd-confinement/test/dev devtmpfs devtmpfs rw,nosuid,size=397648k,nr_inodes=991015,mode=755
|   | |-/run/systemd-confinement/test/dev/pts devpts devpts rw,nosuid,noexec,relatime,gid=3,mode=620,ptmxmode=666
|   | |-/run/systemd-confinement/test/dev/shm tmpfs tmpfs rw,nosuid,nodev
|   | |-/run/systemd-confinement/test/dev/hugepages hugetlbfs hugetlbfs rw,relatime,pagesize=2M
|   | `-/run/systemd-confinement/test/dev/mqueue mqueue mqueue rw,nosuid,nodev,noexec,relatime
|   |-/run/systemd-confinement/test/nix/store losurdo/nix[/store] zfs ro,relatime,xattr,posixacl
|   |-/run/systemd-confinement/test/proc proc proc rw,nosuid,nodev,noexec,relatime
|   `-/run/systemd-confinement/test/run tmpfs tmpfs rw,relatime
|     `-/run/systemd-confinement/test/run/systemd/incoming tmpfs[/systemd/propagate/run-u16084.service] tmpfs rw,nosuid,nodev,size=1988236k,mode=755
`-/sys sysfs sysfs rw,nosuid,nodev,noexec,relatime
  |-/sys/kernel/security securityfs securityfs rw,nosuid,nodev,noexec,relatime
  |-/sys/fs/cgroup cgroup2 cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot
  |-/sys/firmware/efi/efivars efivarfs efivarfs rw,nosuid,nodev,noexec,relatime
  |-/sys/fs/bpf bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700
  |-/sys/fs/fuse/connections fusectl fusectl rw,nosuid,nodev,noexec,relatime
  |-/sys/fs/pstore pstore pstore rw,nosuid,nodev,noexec,relatime
  `-/sys/kernel/config configfs configfs rw,nosuid,nodev,noexec,relatime
is to use InaccessiblePaths=:
$ systemd-run -P -pPrivateMounts=1 -pRuntimeDirectory=systemd-confinement/test -pRootDirectory=/run/systemd-confinement/test -pUMask=066 -pBindReadOnlyPaths=/nix/store -pInaccessiblePaths=-+/run/systemd-confinement/test -pInaccessiblePaths=-+/dev -pInaccessiblePaths=-+/sys -- $(readlink /run/current-system/sw/bin/findmnt)
Running as unit: run-u16079.service
TARGET SOURCE FSTYPE OPTIONS
/ tmpfs[/systemd-confinement/test] tmpfs rw,nosuid,nodev,size=1988236k,mode=755
|-/dev tmpfs[/systemd/inaccessible/dir] tmpfs ro,nosuid,nodev,noexec,size=1988236k,mode=755
|-/nix/store losurdo/nix[/store] zfs ro,relatime,xattr,posixacl
|-/proc proc proc rw,nosuid,nodev,noexec,relatime
|-/run tmpfs tmpfs rw,relatime
| `-/run/systemd/incoming tmpfs[/systemd/propagate/run-u16079.service] tmpfs ro,nosuid,nodev,size=1988236k,mode=755
`-/sys tmpfs[/systemd/inaccessible/dir] tmpfs ro,nosuid,nodev,noexec,size=1988236k,mode=755
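In case it helps with reproducing this, here is a minimal sketch of a shell wrapper around the invocation above. The function name confined and the SYSTEMD_RUN override variable are hypothetical (not part of the PR or of systemd); the properties are exactly the ones used in the command above, with the confinement name taken as the first argument.

```shell
# Hypothetical helper, not part of the PR: runs a command inside a
# RootDirectory= that lives in a RuntimeDirectory=, with the same
# properties as the systemd-run invocation above.
# SYSTEMD_RUN is an assumed override, handy for a dry run (e.g. "echo").
confined() {
  name=$1
  shift
  root=/run/systemd-confinement/$name
  ${SYSTEMD_RUN:-systemd-run} -P \
    -pPrivateMounts=1 \
    -pRuntimeDirectory="systemd-confinement/$name" \
    -pRootDirectory="$root" \
    -pUMask=066 \
    -pBindReadOnlyPaths=/nix/store \
    -pInaccessiblePaths="-+$root" \
    -pInaccessiblePaths=-+/dev \
    -pInaccessiblePaths=-+/sys \
    -- "$@"
}
```

Usage would then be something like: confined test "$(readlink /run/current-system/sw/bin/findmnt)". Setting SYSTEMD_RUN=echo only prints the arguments instead of starting a unit.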
That’s why I think we should drop TemporaryFileSystem=/ in favor of a RootDirectory= inside a RuntimeDirectory=. What do you think?