FYI, if you’re okay with the impurity and are okay with a system-level service, I made this work for Go:
It uses a new-ish feature of the Go build tool that can call an external program to manage the cache.
I am using Nix as a build system on some small projects, and the most annoying part is not the absence of a .o cache (splitting into small enough derivations mitigates this), but copying the whole src into the Nix store on every minor change in any single file. Fixing this may require a feature like read-only mounting the current directory into the sandbox, pretending it was copied into the Nix store, or something similar.
You could use traditional nix; that’s entirely a misfeature of flakes. But it is also significantly amortized on large projects, so not really relevant to the discussion at hand. Ah, never mind, you probably mean post eval. You can use the various lib.filter functions to reduce the impact to only actual source code, which helps a lot, especially for git repos.
Amusingly, buildstream does precisely this too through cgroups. There are some excellent ideas in that project.
Allowing --sandbox-path to point to /nix/store/* paths could in theory cover this by separately computing the paths (doing only the eval or content-addressed hashing), but currently it does not work. I asked about it before: Nix store path as sandbox-paths target to avoid copy
The in-memory store (we just cleaned up) may help.
Perhaps there can be a built-in derivation builder which projects a child of a store object into its own store object. Since Nix is aware of built-in derivations, with “cheating”, it can replace:
with
I would also hope someday for some inotify and similar magic too. Then we can live listen on the subpath, in a manner similar to how impure derivations are implemented.
I don’t think that “decompose into smaller derivations” approach can help me (regardless of how exactly those derivations are generated).
The reason is that with C/C++ one can indeed run g++ -c -o unit.o unit.cpp to build a single translation unit and get its machine code in a single object file; the object files can then easily be brought together for linking. In other languages, building at such granularity is either cumbersome or not conceptually possible at all.
For example, in Go, when you run go build ./my-program, it compiles all dependent packages and links them into the executable binary. Running the build package-by-package and then somehow merging the build caches might be theoretically possible, but would be extremely messy and unstable, needing to rely on Go implementation details.
An even more problematic example: in Zig, AFAIK, there is no concept analogous to unit-by-unit compilation at all. Any executable or library is built directly from the sources with a single command. There’s nothing to decompose there. To get fast builds, we need to allow the Zig compiler to access its internal cache.
I think that is true, but then I would get sequential builds instead of parallel? That’s not nice:( I suppose same goes for empty group, but I haven’t checked that yet.
Also, everyone who wants to build the project would then need to mess with users, groups and nix.conf. I really hope that we can get a solution which doesn’t require global configuration. At least, not messing with system users and groups.
Yes, I’ve seen this. It’s very impressive! And thank you for the great explanation and comparison with similar projects in the README! The reasons I, for now, decided not to use it:
Yes, things (i.e. Zig) might have to adapt to fit Nix rather than Nix to fit Zig.
The benefit is that Nixpkgs builds from source!
So we are free to apply modifications and patches to make X fit Nix rather than the opposite.
Even if they do, we get a solution which doesn’t “just work” out of the box, and requires the developer to somehow manage a cache daemon.
Compiler developers have zero reasons to implement nix-aware caching. Compilers already have caches which are as “pure” as nix in every practical sense, and are extremely simple (not requiring any third-party daemon, process or library).
Can we discuss how to realistically and cheaply fix things? We already have __noChroot, which is very close to what is needed, except the following issues:
- the builder runs as one of the nixbld* users, which creates problems when writing to any shared state
- it needs a relaxed sandbox (sandbox = relaxed, which is not the default), which requires administrator privileges
- there is no per-derivation way to expose extra paths (there is extra-sandbox-paths, but that’s a global system option, not a per-derivation option)

If Zig and others can standardize on a protocol for cache access, or at least support some kind of plugin for accessing the cache so we can build adapters, then support for that could be built into the Nix daemon so it does work out-of-the-box.
The Go cache is just an input-addressed cache, similar enough to the Nix store except that a “put” supplies the output, not instructions for how to build the output.
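To make "input-addressed" concrete, here is a minimal in-memory sketch. It is not the Go build cache or the GOCACHEPROG wire format; the names `ActionID`, `HashInputs` and `InputAddressedCache` are hypothetical and only illustrate the shape of such a cache: the key is a hash of the inputs, and a "put" hands over a finished output rather than a recipe.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ActionID names a build step by a hash of its inputs (hypothetical type).
type ActionID string

// InputAddressedCache maps an input hash to whatever output a client stored.
type InputAddressedCache struct {
	entries map[ActionID][]byte
}

func NewInputAddressedCache() *InputAddressedCache {
	return &InputAddressedCache{entries: make(map[ActionID][]byte)}
}

// HashInputs derives the key from everything that should influence the output.
func HashInputs(inputs ...string) ActionID {
	h := sha256.New()
	for _, in := range inputs {
		// Length-prefix each input so ("ab","c") and ("a","bc") hash differently.
		fmt.Fprintf(h, "%d:%s", len(in), in)
	}
	return ActionID(hex.EncodeToString(h.Sum(nil)))
}

// Put stores a finished output under the key; the cache never builds anything.
func (c *InputAddressedCache) Put(id ActionID, output []byte) {
	c.entries[id] = output
}

// Get returns the stored output, if any.
func (c *InputAddressedCache) Get(id ActionID) ([]byte, bool) {
	out, ok := c.entries[id]
	return out, ok
}

func main() {
	cache := NewInputAddressedCache()
	id := HashInputs("compiler-version", "package foo; func F() {}")
	if _, ok := cache.Get(id); !ok {
		// Cache miss: the client compiles and then supplies the result itself.
		cache.Put(id, []byte("compiled object for foo"))
	}
	out, _ := cache.Get(id)
	fmt.Printf("%s\n", out)
}
```

The difference from a Nix derivation is visible in `Put`: the cache has to trust whatever bytes the client hands it.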
The magical way to solve only the uid issues and nothing else is id-mapped mounts. It’s a little complicated to set up (at least with raw syscalls), but it could be done by Nix as part of the sandbox setup.
Giving tools access to the full cache would still be impure, though. If Nix could manage the cache itself with some protocol for access then it could guarantee purity and it could even be enabled by default.
I don’t see how using a GOCACHEPROG-like protocol is any better in terms of purity than just letting the compiler access the persistent cache directory. If there are no bugs in the compiler’s cache logic, then it’s fine either way. If there is a compiler bug (e.g. a significant part of the input is not hashed), then you’ll get an impure derivation either way.
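The "part of the input is not hashed" failure mode can be shown with a toy example. This is only a sketch under made-up assumptions: `compile` stands in for a real compiler, and `buggyKey`/`correctKey` are hypothetical key functions, not anything taken from Go or Zig.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// compileInput is everything that influences the output of our toy compiler.
type compileInput struct {
	Source string
	Flags  string
}

func hashOf(parts ...string) string {
	h := sha256.New()
	for _, p := range parts {
		fmt.Fprintf(h, "%d:%s", len(p), p) // length-prefixed to avoid ambiguity
	}
	return hex.EncodeToString(h.Sum(nil))
}

// buggyKey forgets to hash the flags; correctKey hashes every input.
func buggyKey(in compileInput) string   { return hashOf(in.Source) }
func correctKey(in compileInput) string { return hashOf(in.Source, in.Flags) }

// compile stands in for a real compiler: its result depends on source and flags.
func compile(in compileInput) string {
	return "obj(" + in.Source + "," + in.Flags + ")"
}

// cachedCompile returns a cached output when the key matches, else compiles.
func cachedCompile(cache map[string]string, keyFn func(compileInput) string, in compileInput) string {
	k := keyFn(in)
	if out, ok := cache[k]; ok {
		return out
	}
	out := compile(in)
	cache[k] = out
	return out
}

func main() {
	debug := compileInput{Source: "main.go", Flags: "-O0"}
	release := compileInput{Source: "main.go", Flags: "-O2"}

	buggy := map[string]string{}
	cachedCompile(buggy, buggyKey, debug)
	// The release build hits the entry left behind by the debug build.
	fmt.Println(cachedCompile(buggy, buggyKey, release))

	good := map[string]string{}
	cachedCompile(good, correctKey, debug)
	// With every input hashed, the release build misses and compiles afresh.
	fmt.Println(cachedCompile(good, correctKey, release))
}
```

Nothing in this sketch depends on whether the cache is reached through a directory or through a protocol: the leak comes from the key function alone.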
Therefore, if we want to have builds with guaranteed purity and fast compiler caches, we need a much more complicated protocol. I hope we can agree that its design and universal adoption is at least a matter of many years (for the record, I personally think it is not possible at all due to several technical reasons).
What I wanted to discuss here is having out-of-the-box ability to build with fast compiler caches without guaranteed purity, which seems to be several orders of magnitude easier to get.
I agree that a compiler bug where some part of the input is not hashed affects both mechanisms the same way (allowing leakage between builds), so yeah, it’s not as pure as regular builds. But there’s a difference in degree. I’d still say the protocol is more pure because:
Personally I don’t care about purity all that much, obviously since I made an impure solution for my own needs. Though I would be more likely to enable a built-in feature if it had the properties above. I’m just going from the assumption that if you want something to work out-of-the-box, i.e. built into Nix, the Nix team seems much less likely to accept impure solutions.
I would have thought that, for the sake of Bazel, Go would have a different way of working.
I’m not advocating compiler developers do it; but rather those in the Nix community.
It is the Nix community that has the opportunity here to rebuild the world from source that fits our paradigm.
Maybe there’s a glorious future where more code becomes Nix-aware from the start, but until then, it is incumbent on us to do it.
But isn’t it trivial for an attacker to compute an input hash for any known module? A hostile derivation can still poison the cache for standard (or any other) packages, so other derivations which depend on those packages will be compromised?
Or did I miss something? README says that “a build-specific directory to put the cached files in”, what does “build-specific” mean here exactly?
I agree that it’s a reasonable thing to expect from any nix functionality that’s available out-of-the-box.
I see the following solution to this problem: nix can support two classes of derivations: “pure” (the current default) and “relaxed”. The following rules must be satisfied:
Thus, if you want a correct build, use “pure”. If you want a fast (but attackable) build, use “relaxed”.
But we do have several impurity-injecting things in nix already: __noChroot, extra-sandbox-paths, pre-build-hook, __impure. I’m not proposing anything novel, only smoothing out and combining those into a mechanism that allows relaxed, cache-accessing builds in a user-friendly way.
It’s trivial to compute but how to use that? If you can just craft a new input that generates a known hash then you’ve discovered a pre-image attack against the hash and that would be quite big news for cryptographic hashes.
There is no need to do a pre-image attack in the context that we are discussing. The attacker will compute the hash for compilation input and put arbitrary malicious compilation output in the Go cache. It’s not a content-addressed cache.
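A small sketch of that attack, under simplifying assumptions: the cache is a plain map keyed by a hash of the compilation input, and `inputKey` and `verifyContentAddressed` are hypothetical helpers, not part of any real tool. The contrast at the end shows why a content-addressed store does not have the same problem.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func sum(s string) string {
	h := sha256.Sum256([]byte(s))
	return hex.EncodeToString(h[:])
}

// inputKey is how an input-addressed cache names an entry: by hashing the
// compilation input. Anyone who knows the input can compute it.
func inputKey(source string) string { return sum(source) }

// verifyContentAddressed reports whether an entry of a content-addressed
// store is intact: there, the key must equal the hash of the stored bytes.
func verifyContentAddressed(key, stored string) bool { return sum(stored) == key }

func main() {
	cache := map[string]string{} // key = hash(input), value = output

	publicSource := "package popular; func Add(a, b int) int { return a + b }"

	// Hostile build: computes the key of a well-known package and stores an
	// arbitrary "output" under it. No hash pre-image is involved.
	cache[inputKey(publicSource)] = "malicious object code"

	// Victim build: same input, same key, so it gets a hit and never compiles.
	out, hit := cache[inputKey(publicSource)]
	fmt.Println(hit, out) // true malicious object code

	// In a content-addressed store, swapping the bytes is detectable because
	// the key is derived from the output itself.
	honest := "honest object code"
	fmt.Println(verifyContentAddressed(sum(honest), "malicious object code")) // false
}
```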
Once I experimented with setting an empty build-users-group, precisely to circumvent the permission issue with an impure compilation cache for Go in the Nix builder. I also had to set sandbox to relaxed to make it work in my case. It does indeed cause the builder to use the same UID for each build, but the user is then whatever user is executing nix build, so for a single-user nix install it is especially not going to be consistent (when invoking with vs. without sudo, you’re going to have permission issues with one if you previously built with the other); for multi-user, IIRC, there are no issues in this regard because root is always used. Besides, for both multi- and single-user installs, I’ve many times encountered an issue with the builder somehow creating a /homeless-shelter directory in the real filesystem; further invocations of nix build would then error out complaining about it existing, and I had to delete it manually to continue. So, speaking from experience, it feels more like a hack than a real solution.
edit: actually, running as root in a single-user install should not be possible, I think? So maybe these permission issues were problems with multi-user? I might not be remembering correctly, since it was a while ago. But anyway, I remember that there were for sure still some permission issues with that approach.
You mean guess the input hash for popular packages? Ah right, that would work. So a shared cache can’t ever have that property. You’d have to partition it by user, then you only have to worry about bugs and not malicious code. That seems fine actually. But yeah at that point there isn’t any benefit from a protocol over direct access.
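Partitioning by user could look roughly like this. It is a sketch only: `PartitionedCache` and its methods are hypothetical, and a real implementation would presumably use separate directories with filesystem permissions rather than a map.

```go
package main

import "fmt"

// key scopes an input hash to the user who owns the partition.
type key struct {
	user     string
	actionID string
}

// PartitionedCache keeps one logical cache per user inside a single map.
type PartitionedCache struct {
	entries map[key]string
}

func NewPartitionedCache() *PartitionedCache {
	return &PartitionedCache{entries: make(map[key]string)}
}

// Put stores an output in the partition of the given user only.
func (c *PartitionedCache) Put(user, actionID, output string) {
	c.entries[key{user, actionID}] = output
}

// Get only ever looks inside the partition of the given user.
func (c *PartitionedCache) Get(user, actionID string) (string, bool) {
	out, ok := c.entries[key{user, actionID}]
	return out, ok
}

func main() {
	cache := NewPartitionedCache()
	const popular = "hash-of-popular-package-inputs" // placeholder input hash

	// A hostile build run by mallory can only poison mallory's partition.
	cache.Put("mallory", popular, "malicious object code")

	// alice's build misses, compiles for itself and caches its own result.
	if _, ok := cache.Get("alice", popular); !ok {
		cache.Put("alice", popular, "honest object code")
	}
	out, _ := cache.Get("alice", popular)
	fmt.Println(out)
}
```

Cache bugs can still leak between builds of the same user, but one user's builds can no longer plant entries for another.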
As for a quick and dirty fix, I think it should be possible to set up an id-mapped bind mount in a pre-build-hook, which would solve the uid problem. But that’s still system-level configuration. It would be great to be able to set a per-build pre-build-hook.