Secure sharing of nix store with containers and VMs

I love the idea of sharing the Nix store between containers, VMs and their host. Unfortunately, all solutions I’ve found so far seem to share the whole /nix without any filtering, meaning every container/VM can see all the packages in the store and thus also the configurations of all other containers/VMs. That is a big security problem for many use cases, but one that can be solved thanks to Nix’s architecture, so I was wondering whether someone has already embarked on such a journey?

Nix can provide the list of store paths that form the closure of a container/VM configuration, so the only missing part is a way to create a read-only “/containerOrVMroot/nix/store” proxy mount containing only the packages in that list.
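
For example, the runtime closure of a declarative NixOS container’s system profile can be listed like this (“mycontainer” being just a placeholder name):

$ nix-store --query --requisites /nix/var/nix/profiles/per-container/mycontainer/system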

It could probably be done by bind-mounting every package in the list into a dedicated /nix/store directory of every container/VM, but that would mean thousands of bind mounts, which is a mess. Surprisingly, there seems to be no “proxyfs” or “virtualfs” supported by the Linux kernel, i.e. a file system that would allow creating a read-only mount from a map of paths (from existing to virtual).

Regards,
Martin

6 Likes

I do closure-specific bind-mounts all the time, and it’s relatively easy.
For example, if you want a sandbox for nixpkgs#bashInteractive, you’d run:

$ nix path-info --recursive --json nixpkgs#bashInteractive | jq -r '.[].path'
/nix/store/8636r3d8rsk7c3l5xcgb1mn37pkfc84k-ncurses-6.3-p20220507
/nix/store/liqhmwv60pc8wi983brfhz264za61n3p-readline-8.1p2
/nix/store/rp4dwxbw4vk590lrbcf9r198cdjwjhmd-libidn2-2.3.2
/nix/store/scd5n7xsn0hh0lvhhnycr9gx0h8xfzsl-glibc-2.34-210
/nix/store/v48s6iddb518j9lc1pk3rcn3x8c2ff0j-bash-interactive-5.1-p16
/nix/store/wj6j8lrdlind44n7vqn864ga7y802vc7-libunistring-1.0
/nix/store/zkgyp2vra0bgqm0dv1qi514l5fd0aksx-bash-interactive-5.1-p16-man

$ nix path-info --json nixpkgs#bashInteractive | jq -r '.[].path'
/nix/store/v48s6iddb518j9lc1pk3rcn3x8c2ff0j-bash-interactive-5.1-p16
/nix/store/zkgyp2vra0bgqm0dv1qi514l5fd0aksx-bash-interactive-5.1-p16-man

Then you can put those through sort -u and pass them as read-only mounts to your virtualization solution of choice.
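
For a quick local test of the idea (just a sketch; it assumes bubblewrap, e.g. from nixpkgs#bubblewrap, and reuses the bash-interactive path printed above), that list can be turned into read-only binds directly:

$ bwrap --unshare-all --dev /dev --proc /proc \
    $(nix path-info --recursive nixpkgs#bashInteractive | sort -u \
      | xargs -I{} echo --ro-bind {} {}) \
    /nix/store/v48s6iddb518j9lc1pk3rcn3x8c2ff0j-bash-interactive-5.1-p16/bin/bash

Inside that sandbox, /nix/store contains only the closure listed above.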

Something you can do with IFD is even nicer:

❯ nix repl --file '<nixpkgs>'
Welcome to Nix 2.10.3. Type :? for help.

Loading installable ''...
Added 16506 variables.
nix-repl> :b closureInfo { rootPaths = bashInteractive; }

This derivation produced the following outputs:
  out -> /nix/store/d8gig9im40znin4l8rb5qdygcm2pw8mv-closure-info

Once this is built, it gives you a directory with registration and store-paths files in it. You bind-mount the paths listed in store-paths and pipe registration into nix-store --load-db, which gives you the database contents needed to do Nix builds. (You still need to set up a nixbld user/group by creating some files in /etc, but that’s rather easy as well.)
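
Concretely, that looks something like this (a sketch, reusing the closure-info path from the repl output above; the --load-db step runs inside the guest):

$ info=/nix/store/d8gig9im40znin4l8rb5qdygcm2pw8mv-closure-info
$ cat $info/store-paths                      # one store path per line → your read-only mounts
$ nix-store --load-db < $info/registration   # in the guest: registers those paths in the Nix database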

Regarding your point about thousands of mounts, I haven’t had an issue with this yet. I’m not sure there is an actual limit we would approach, even with larger NixOS configurations.

5 Likes

Good to know that a large number of bind mounts is not a problem. Are you doing it for VMs as well? Are there no problems with thousands of virtiofs mounts?

Thank you for the great examples with explanations. I didn’t know about closureInfo; I will definitely try it.

Just pointing out: if you’re at all serious about security, nothing truly sensitive should be in /nix/store anyway, since it’s necessarily world-readable. That said, bind-mounting only the closure has some helpful properties regardless.

2 Likes

Why the extra --json | jq?

$ nix path-info --recursive nixpkgs#bashInteractive
evaluating derivation 'flake:nixpkgs#bashInteractive'
/nix/store/g7lwga9p547cqyi9ym35bk78m1r12rky-libunistring-1.0
/nix/store/jna5qh81395w6xsalnl532pm9qvvvpjy-libidn2-2.3.2
/nix/store/6f66prpgx1qx4n6k450sxs3d157ia1ps-glibc-2.35-163
/nix/store/1rbdizyr45spsmig0sl9cykv4bami6lg-ncurses-6.3-p20220507
/nix/store/r94xi9sybla3rr4s3h0bqffar3b60h02-readline-8.1p2
/nix/store/6xg8qd02kjq0yx8sggzd76jqmzk37i52-bash-interactive-5.1-p16
/nix/store/ql8jwvsb63q9k0j0nmjcz0071lbdfq8d-bash-interactive-5.1-p16-man

1 Like

correct

Not if you are storing all your secrets in sops-nix or agenix.

Old habit… I pipe everything I can through jq these days ^^;

It’s not only secrets like passwords, keys, etc. that are sensitive. Any information about the configuration, network infrastructure, list of packages, services, etc. can help an attacker. For that reason I’m seeking an effective solution that gives each “client” (container or VM) of a shared nix store access strictly to the packages in its closure and not one bit more.

The solution with bind mounts (or virtiofs mounts in the case of VMs) seems to work, but I still don’t like the fact that it requires so many mount points. Wouldn’t it be cleaner, more resource-friendly and probably more performant if there were a simple read-only proxy file system that just delegated read operations to the real file system according to a configurable map of paths?

If such a file system really does not exist yet for Linux and there is no better solution than creating thousands of mounts for each store client, I could take a look at what it would take to create a simple version of it with FUSE. It would not be as performant as a kernel driver due to the context switches, but it would work, and if it turned out to be practical, maybe someone with kernel development experience could then create a proper in-kernel VFS replacement?

I started with Nix just a few weeks ago, so it’s only logical that I am missing something :-). Please let me know whether such a file system would be helpful from your point of view, or whether there is some hidden difficulty in implementing it that I am not seeing (which is probably the reason it does not exist yet).

3 Likes

Good point, I suppose.

That’s effectively how bind mounts work anyway afaik. I don’t see how this would get you anything except an extra layer of indirection you didn’t need.

Nix actually does exactly this for the build sandbox. It does the build in a chroot with only the build input store paths bind mounted. So you can’t e.g. enumerate the full nix store in a build; just the paths in your inputs.
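
You can observe this from inside a build, e.g. (a small sketch, assuming sandboxed builds are enabled):

$ nix-build -E 'with import <nixpkgs> {}; runCommand "peek" {} "ls /nix/store > $out"'
$ wc -l result   # only the build’s input closure, not the whole host store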

3 Likes

Bind mounts are not that cheap to set up. If you are willing to dive down the rabbit hole, Landlock or other emerging security frameworks can help you.

1 Like

FWIW, I’m writing something like that for VMs.

In short, it’s a virtio-fs daemon which exposes a subset of /nix/store (or any other FS tree one cares to, I guess?); it just takes the path to the FS root to expose, and a list of files/directories in that tree to expose (like closureInfo can produce).

The end result is a (virtualized) read-only file-system that contains the VM’s environment (and its closure) mounted under /nix/store.
virtio-fs’s protocol is basically just yeeting memory pages with the desired content, so there’s no copying involved:

  • the daemon mmaps the files the VM requests and transfers the pages to the hypervisor (and from there to the VM);
  • reads from the VM directly hit those pages; the host’s kernel populates them as needed without going through the hypervisor or the daemon;
  • the guest kernel doesn’t copy that data to its own page cache thanks to DAX (“Direct Access for Files”), so there’s no overhead in memory usage either.

It’s not released yet, but hopefully soon.™

PS: My main goal is basically to get UX similar to that of config.containers (incl. quick (re)build, etc.) and very-low overhead, while providing much stronger isolation:
no exposure of the whole store, using a modern hypervisor designed for security (cloud-hypervisor: it’s written in Rust and sandboxes its various components, like virtio device implementations), etc.

4 Likes

Anyone got any updates on this?

Having VMs built with ‘nix build’ able to access the host’s binaries and configuration is a major issue for me as well.

The guest should be isolated from the host filesystem and not have any access to, or visibility of, any of the host’s files. Being able to execute, from the guest, binaries that are in the nix store only as a consequence of being installed on the host or on other guests of that host is quite a severe violation from my point of view.

Or is there any benefit to, or even a need for, it being otherwise?

Regarding executing binaries, what’s the issue exactly? The same exact binaries could just as easily be produced in the guest if desired, in most cases. (Unless the nix code/sources used are secret.) They’re not setuid or anything. They’re just executable files that happen to be on the filesystem. That shouldn’t violate any security unless the contents of the binaries themselves are secret, which doesn’t sound like what you’re talking about. Unless your system is EXTREMELY locked down, any user could curl binaries straight from the internet and run them. I don’t see how the binaries in the nix store are different.

From my point of view, discussing hypothetical vulnerabilities resulting from access to the host OS’s packaged binaries is pointless in this context and distracts from the main issue, which is the possibility of isolating guest VMs from the host’s system files. I’m happy to take that particular discussion somewhere else if anyone is up for it.

Maybe not to everyone, but it seems desirable for at least part of the community that guest VMs are not able to access files that are part of the host system. This includes not only binaries but also configuration files, such as the configuration.nix of the host NixOS or of other guest VMs, as well as the firewall configuration and so on. Even if it doesn’t seem like a big deal to some, I’m sure it is for others.

2 Likes

The problem I see is that someone with access to VM A could, by inspecting the shared store, deduce the configuration and/or availability of other “guests” as well as of the “hypervisor”/host system, which is indeed a concern.

That very same reason is why we moved away from shared hosting services years ago!

Yeah, that concern makes sense to me, though its importance is obviously context-dependent. I’m just having trouble seeing the issue with the ability to execute the binaries… Anyway, I agree this topic should be about how to do it, not why it’s needed. Sorry to derail.

1 Like

I was testing a workaround which consisted of using the --store ‘/somePath’ flag for ‘nix build’ when building the VM. This would let me identify the necessary paths with ‘nix path-info’, and then I could delete whatever else I wanted from the ‘/somePath’ nix store. Finally, I’d edit the resulting ‘./result/bin/’ script to change the mount point of ‘/nix/store’ to ‘/somePath/nix/store’. I think this should work in principle, but I’ve stumbled on what I think is a bug in nix: the ‘/somePath’ nix store creates symlinks to ‘/nix/store/’ rather than to ‘/somePath/’, although ‘/somePath’ has the files they should be symlinked to. It is not possible to alter the targets of the symlinks in ‘/somePath’.
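
Roughly, the build steps look like this (a sketch; the flake attribute is just an example name for my VM configuration):

$ nix build --store /somePath '.#nixosConfigurations.myvm.config.system.build.vm'
$ nix path-info --store /somePath --recursive ./result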

  1. Would anyone not see this ‘nix build --store /somePath’ behaviour as a bug? I can’t see why the new store would point to the original/default store.

  2. Do you think this would work?

  3. Any other suggestions or alternative methods?

It’s correct behavior. Even when using --store, the build process and the expected runtime usage work under the assumption that the store will be mounted at /nix/store. It just assumes that what’s currently at /somePath/nix/store is what will be mounted there. And this should be fine for you, so long as /somePath/nix/store is actually mounted at /nix/store from the guest’s point of view, making the links resolve correctly in the guest.
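
A quick way to see this on the host (a sketch; needs root) is to make the relocated store visible at /nix/store in a private mount namespace:

$ sudo unshare --mount bash -c 'mount --bind /somePath/nix/store /nix/store; exec bash'

In that shell, symlinks under /somePath/nix/store that target /nix/store/… resolve again.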

If you want the actual, canonical location of the nix store for build time and run time to be something other than /nix/store, you need to a) recompile nix, b) give up on using the binary cache at all, c) probably fix some bugs, since this usage path is pretty overgrown with weeds. I tried getting it to work a while back and gave up, but it should be possible, in theory. I’ve heard of others getting it working.

1 Like

Thanks for the explanation. However, it fails in the guest, and I think the reason is the symlinks that point to the host’s ‘/nix/store’.

When the guest is booting in stage 2, it errors with ‘systemd: Unit default.target not found.’, and when I check for default.target in ‘/somePath/nix/store/’ it is there, but as a symlink to a path in ‘/nix/store/’. None of the stage 1 files are symlinked to the host’s ‘/nix/store’.