Restricting /nix/store in a mount namespace

rhendric · August 23, 2022, 7:28pm

I’m working on a personal project that involves jailing applications using tools like bubblewrap. Bubblewrap, among other capabilities, invokes an executable in an empty-by-default mount namespace and allows the user to specify paths to be bind-mounted in. Any other files are inaccessible to the executable.

Naturally, I can bind the entire host /nix/store to /nix/store in the namespace. But suppose I want to limit the applications that can be run inside the jail. What would be really slick would be a way to filter the /nix/store inside the jail down to just the closure of the application. Bind-mounting each individual directory in the closure would quickly exhaust the maximum number of bind mounts.

So I’m looking for, or considering writing, something like a FUSE filesystem that provides a read-only view on /nix/store, restricted down to a list of folders as could be provided by nix-store -qR. I’m wondering if anyone else has gone down this road and has any pointers before I invest much more time into it?

tejing · August 24, 2022, 12:42pm

There’s no need for a special filesystem: just get nix to list the closures of any relevant store paths, and then bind mount them all individually. That’s how nix makes the build sandboxes, too, afaik.
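
A minimal sketch of that approach in Python, assuming the closure comes from nix-store -qR (mentioned earlier in the thread) and that each path is bound read-only with bubblewrap's --ro-bind option. The helper names closure_paths and bwrap_argv and the store paths in the usage example are made up for illustration.

```python
import subprocess
from typing import Iterable, List


def closure_paths(store_path: str) -> List[str]:
    """Return the closure of store_path as reported by `nix-store -qR`."""
    out = subprocess.run(
        ["nix-store", "-qR", store_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def bwrap_argv(paths: Iterable[str], command: List[str]) -> List[str]:
    """Build a bwrap command line that read-only binds each path onto itself."""
    argv = ["bwrap"]
    # Deduplicate and sort so the resulting command line is deterministic.
    for path in sorted(set(paths)):
        argv += ["--ro-bind", path, path]
    # The command to run inside the namespace follows the options.
    return argv + command


if __name__ == "__main__":
    # Hypothetical store paths, used only to show the shape of the result.
    demo = ["/nix/store/b-foo", "/nix/store/a-bar"]
    print(bwrap_argv(demo, ["/nix/store/b-foo/bin/foo"]))
```

On a real system the list would come from closure_paths(...) and the result could be handed to subprocess.run. A usable jail will likely need more than the store paths (for example /proc and /dev), which this sketch leaves out.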

rhendric · August 24, 2022, 2:23pm

I thought there was a fairly low limit on the number of bind mounts, although I can’t find the source for that claim anymore. Having tested it now, a few thousand doesn’t seem to be too much for Linux to handle. Are there performance implications to having thousands of binds, though? It seems like the sort of scenario that the kernel might not have optimized for.

TLATER · August 24, 2022, 3:14pm

Can’t answer from a kernel perspective, but given the prevalence of containers, I don’t think thousands is necessarily that uncommon.

If you’re worried about this, maybe consider using https://nixos.org/manual/nix/stable/command-ref/new-cli/nix3-bundle.html

Flakebi · August 24, 2022, 7:32pm

There are several ways to achieve better isolation of /nix/store:

1. By far the easiest (and least secure) is to mount /nix/store, but make /nix/store execute-only (i.e. --x permissions). That means the container can access paths it knows, but as the directory is not readable, it can’t list which paths exist, and as they include hashes, paths should be impossible to guess.
2. One can create an overlay filesystem and remove unneeded paths in the overlay. That has the advantage of needing only a single mount. On the other hand, newly created paths will be accessible inside the container. A while ago, I wrote a script that does that (and 1.): GitHub - Flakebi/container-store: Limit access to the nix store for containers by creating overlay filesystems
3. AppArmor can be used to allow access only to paths from a closure. nixpkgs has a helper for that:

security.apparmor.policies."bin.hello".profile = ''
  ${pkgs.hello}/bin/hello {
    include "${pkgs.apparmorRulesFromClosure { name = "hello"; } ([ pkgs.hello ])}"
  }
'';
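
The first option above (an execute-only /nix/store) relies on ordinary Unix directory semantics: the execute bit allows looking up a name you already know, while the read bit is what allows listing. Here is a small Python sketch of that behaviour on a throwaway directory. The name abc123-hello is a made-up stand-in for a hashed store path, and the result differs when run as root, since root bypasses these permission checks.

```python
import os
import tempfile
from typing import Tuple


def probe_execute_only() -> Tuple[bool, bool]:
    """Return (can_list, can_open_known) for a directory with mode --x--x--x."""
    with tempfile.TemporaryDirectory() as tmp:
        store = os.path.join(tmp, "store")
        os.mkdir(store)
        # Stand-in for a store path whose name the process already knows.
        known = os.path.join(store, "abc123-hello")
        with open(known, "w") as f:
            f.write("hi")
        os.chmod(store, 0o111)  # execute (search) bit only, no read bit
        try:
            try:
                os.listdir(store)
                can_list = True
            except PermissionError:
                can_list = False
            try:
                with open(known) as f:
                    f.read()
                can_open_known = True
            except PermissionError:
                can_open_known = False
        finally:
            # Restore permissions so the temporary directory can be removed.
            os.chmod(store, 0o700)
        return can_list, can_open_known


if __name__ == "__main__":
    print(probe_execute_only())
```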

TLATER · August 25, 2022, 5:00am

I generally like the ideas, but most of the hashes are well-known, since they just match those of nix packages. If you’re looking for an exploit in a specific, common package (e.g. libc), the search space is maybe 10 hashes at most.

Atemu · August 25, 2022, 1:18pm

Docker has 100-200 layers at most, and you usually only bind-mount a few state directories.

kenmacd · September 22, 2022, 8:34pm

Just in case it didn’t come up in your search, I came across a nix-bubblewrap project that seems to have a similar goal.