Restricting /nix/store in a mount namespace

I’m working on a personal project that involves jailing applications using tools like bubblewrap. Bubblewrap, among other capabilities, invokes an executable in an empty-by-default mount namespace and allows the user to specify paths to be bind-mounted in. All other files are inaccessible to the executable.

Naturally, I can bind the entire host /nix/store to /nix/store in the namespace. But suppose I want to limit the applications that can be run inside the jail. What would be really slick would be a way to filter the /nix/store inside the jail down to just the closure of the application. Bind-mounting each individual directory in the closure would quickly exhaust the maximum number of bind mounts.

So I’m looking for, or considering writing, something like a FUSE filesystem that provides a read-only view on /nix/store, restricted down to a list of folders as could be provided by nix-store -qR. I’m wondering if anyone else has gone down this road and has any pointers before I invest much more time into it?

There’s no need for a special filesystem, just get nix to list the closures of any relevant store paths, and then bind mount them all individually. That’s how nix makes the build sandboxes, too, afaik.
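
For illustration, here is a minimal sketch of that approach in Python. The helper names `closure_paths` and `bwrap_argv` are made up for this example; the only external pieces assumed are `nix-store -qR` (mentioned above) and bubblewrap's `--ro-bind SRC DEST` option. A real jail would need more flags than this (e.g. /proc, /dev, namespace unsharing), so treat it as a starting point rather than a finished sandbox.

```python
import subprocess


def closure_paths(store_path):
    """Return the closure of store_path, one entry per line of `nix-store -qR`."""
    out = subprocess.run(
        ["nix-store", "-qR", store_path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def bwrap_argv(paths, command):
    """Build a bwrap command line that read-only binds each path onto itself.

    `paths` is a list of store paths, `command` is the argv to run inside
    the jail. Nothing is executed here; the caller decides how to run it.
    """
    argv = ["bwrap"]
    for path in paths:
        # Same source and destination, so /nix/store/... references keep working.
        argv += ["--ro-bind", path, path]
    return argv + list(command)
```

Usage would look roughly like `subprocess.run(bwrap_argv(closure_paths(app), [app + "/bin/hello"]))`, where `app` is the store path of the application.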

I thought there was a fairly low limit on the number of bind mounts, although I can’t find the source for that claim anymore. Having tested it now, a few thousand doesn’t seem to be too much for Linux to handle. Are there performance implications to having thousands of binds, though? It seems like the sort of scenario that the kernel might not have optimized for.
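
In case it helps anyone repeating the test, here is a small sketch for counting how many mounts currently sit under /nix/store in a namespace. It parses text in the /proc/self/mountinfo format, where the fifth whitespace-separated field is the mount point. The function name `count_mounts_under` is just something made up for this example. As far as I know, the actual per-namespace limit is the fs.mount-max sysctl (readable at /proc/sys/fs/mount-max), which defaults to 100000 on recent kernels, but I haven't measured the performance side.

```python
def count_mounts_under(mountinfo_text, prefix="/nix/store"):
    """Count mountinfo entries whose mount point is prefix or lies below it."""
    count = 0
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a well-formed mountinfo line
        mount_point = fields[4]
        # Require an exact match or a "/" boundary so that a sibling such as
        # /nix/store-backup is not counted.
        if mount_point == prefix or mount_point.startswith(prefix + "/"):
            count += 1
    return count
```

Inside the jail, this could be called as `count_mounts_under(open("/proc/self/mountinfo").read())`, assuming /proc is mounted there.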

Can’t answer from a kernel perspective, but given the prevalence of containers, I don’t think thousands is necessarily that uncommon.

If you’re worried about this, maybe consider using nix bundle.

There are several ways to achieve better isolation of /nix/store:

  1. By far the easiest (and least secure) option is to mount /nix/store but make it execute-only (i.e. --x permissions).
    That means the container can access paths it knows, but since the directory is not readable, it can’t list which paths exist, and since the paths include hashes, they should be impossible to guess.
  2. One can create an overlay filesystem and remove the paths that aren’t needed in the overlay. That has the advantage of needing only a single mount. On the other hand, paths created on the host afterwards will be accessible inside the container. A while ago, I wrote a script that does that (and 1.): https://github.com/Flakebi/container-store
  3. AppArmor can be used to allow access only to paths from a closure. nixpkgs has a helper for that:
security.apparmor.policies."bin.hello".profile = ''
  ${pkgs.hello}/bin/hello {
    include "${pkgs.apparmorRulesFromClosure { name = "hello"; } ([ pkgs.hello ])}"
  }
'';
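
To make option 2 a bit more concrete: the core of it is working out which top-level store entries fall outside the closure and therefore need to be hidden in the overlay. Below is a rough Python sketch of just that step. It is not taken from the container-store script, and `entries_to_hide` is a name invented for this example. Actually hiding the entries (overlayfs does this with whiteout files in the upper layer) is left out.

```python
import os


def entries_to_hide(store_entries, closure):
    """Return the store entry names that are not part of the closure.

    `store_entries` is a listing of /nix/store (names only, as from os.listdir),
    `closure` is a list of full store paths (as from `nix-store -qR`).
    """
    # Reduce each closure path to its top-level entry name, e.g. "aaa-glibc".
    keep = {os.path.basename(path.rstrip("/")) for path in closure}
    return sorted(entry for entry in store_entries if entry not in keep)
```

For example, `entries_to_hide(os.listdir("/nix/store"), closure)` would give the list of names to mask. Note that anything added to the host store later is not in this list, which is the drawback mentioned in item 2.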

I generally like the ideas, but most of the hashes are well-known, since they just match those of the packages in nixpkgs. If you’re looking for an exploit in a specific, common package (e.g. libc), the search space is about 10 candidates at most.

Docker has at most 100-200 layers, and you usually only bind-mount a few state directories.

Just in case it didn’t come up in your search, I came across a nix-bubblewrap project that seems to have a similar goal.
