I have a small self-hosted Forgejo instance with Forgejo Actions runners in a Talos Linux Kubernetes cluster. The runner uses Docker-in-Docker since native Forgejo Kubernetes runners are still in development. One of the runner labels (nixos) is a custom image built with dockerTools.buildImage that ships with sandbox = true, among other things. Talos Linux defaults to user.max_user_namespaces = 0, which makes Nix sandboxing fail unless I apply a machine config patch that raises the limit.
I’ve applied that cluster-wide to keep things simple, but it does affect security. Unprivileged user namespaces have been the entry point for a long list of local-root privilege escalations (LPEs).
This exposure is host-level rather than container-level: a kernel exploit reachable through user namespaces becomes a Kubernetes escape on whichever worker the runner happens to land on.
Questions
Is there a way to run Nix builds with sandbox = true that doesn’t require host kernel-level user namespaces? I know about sandbox = relaxed, but it doesn’t actually drop the user-namespace requirement, just permits more impurities.
Has anyone tried Kata or Firecracker (or KubeVirt, or Fly-style microVMs), which give the runner pod its own kernel so that the host sysctl doesn’t matter?
I apologize for the big Kubernetes overlap and hope it’s okay that I ask these questions. It is, after all, my goal to have NixOS-based CI runners while running as little risk as possible. With the number of LPEs currently circulating, I couldn’t run public CI runners without putting the entire cluster at risk.
Just tried sysctl user.max_user_namespaces=0 here with whatever random old version of Nix (2.28.4) was already installed, and it seems to work fine.
Edit: I wonder if it’s something to do with the d-in-d environment specifically, though I can’t immediately think why it should be. Is it possible Nix can’t see the value of the sysctl (through /proc/sys)? Though looking at the code, it should handle that as well: ultimately it just tries to create a test namespace and disables their use if that doesn’t work.
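To make the "try to create a test namespace" idea concrete, here is a minimal Python sketch of that kind of probe. It is not Nix's actual code (Nix is written in C++), and the function names read_userns_limit and probe_userns are made up for illustration. It assumes Linux, and the probe itself needs Python 3.12 or newer for os.unshare; on older versions it reports None.

```python
import os


def read_userns_limit(path="/proc/sys/user/max_user_namespaces"):
    """Return the sysctl value as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def probe_userns():
    """Try to create a user namespace in a throwaway child process.

    Returns True or False, or None when this Python cannot run the probe.
    """
    if not (hasattr(os, "unshare") and hasattr(os, "CLONE_NEWUSER")):
        return None  # os.unshare only exists on Linux with Python >= 3.12
    pid = os.fork()
    if pid == 0:
        # Child: attempt the unshare and report the result via exit code.
        try:
            os.unshare(os.CLONE_NEWUSER)
        except OSError:
            os._exit(1)
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exit_code(status) == 0


if __name__ == "__main__":
    print("sysctl limit:", read_userns_limit())
    print("user namespace usable:", probe_userns())
```

Running this inside the runner container and on the host would show whether the two environments disagree, for example a readable non-zero sysctl next to a failing probe.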
If what you’re saying is true – that sandboxing without user namespaces is a thing – I am not sure how to trigger it. Maybe another config setting, maybe that code path is not directly exposed. How would that work, though? I’ll try and dig, thanks for suggesting I can look at the source code.
Since I’m building things where I don’t have complete control over the derivations, and they’re not all pure, I opted to go for sandboxing. It sounds right. And normally I’m a huge fan of user namespaces. But upon realizing they’re disabled in Talos Linux by default because of a dozen CVEs, I am thinking maybe another layer of full virtualization is warranted to go with them.
Yeah, I tried that as the first thing. Turns out those directories are created by impure derivations somewhere in the middle of my build. So it’s not enough to rm -rf /homeless-shelter between CI steps; I need to move that cleanup to somewhere inside the derivation. I abandoned the quick fix and went towards understanding sandboxing instead, since I like both pure builds and user namespaces.
My next attempt is going to try and enable user namespaces inside a lightweight VM.
Those “impure derivations” shouldn’t be able to create it, assuming sandboxing is enabled. And regardless of sandboxing, the preferred way has been to fix the drvs so that they use a temporary folder as a temporary home during the build, rather than to create the shelter.
Recently I learned that even a hook exists for the temporary HOME. I do not remember its name though.
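As a rough illustration of the temporary-home idea, here is a small Python sketch that runs a command with HOME pointed at a throwaway directory. Inside a derivation the equivalent would be done in shell during a build phase; this is not the hook mentioned above, and the helper name run_with_temp_home is hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def run_with_temp_home(cmd):
    """Run cmd with HOME set to a fresh directory that is removed afterwards."""
    with tempfile.TemporaryDirectory(prefix="build-home-") as home:
        # Copy the current environment and override only HOME.
        env = dict(os.environ, HOME=home)
        return subprocess.run(
            cmd, env=env, check=True, capture_output=True, text=True
        )


if __name__ == "__main__":
    # The child sees the temporary HOME instead of /homeless-shelter.
    result = run_with_temp_home(
        [sys.executable, "-c", "import os; print(os.environ['HOME'])"]
    )
    print(result.stdout.strip())
```

Anything the command writes under HOME disappears with the directory, so nothing is left behind between CI steps.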
Build users are a thing, but they would require running with elevated privileges in the pod. I’m not 100% sure about the full list, but probably CAP_SYS_ADMIN/CAP_DAC_OVERRIDE at minimum. Could that perhaps be done with the nix user running in a pod that is itself in a userns?