A common public nix cache?

I recently played around with @jkarni’s https://garnix.io/ service, and was surprised to find that it shares a single nix store and cache among all users. Given the effort that @domenkozar’s https://www.cachix.org/ and @rickynils’ https://nixbuild.net/ put into keeping their users’ caches separate, this got me thinking.

When are separate caches needed

I see two main reasons for keeping caches separate:

  1. The uploader doesn’t trust all downloaders.

    For example, you use nix to provision your servers. If the nix store and/or cache were public and someone learned the nix path (e.g. from an error message), they could download the code and look for secrets or vulnerabilities.

  2. The downloader doesn’t trust all uploaders.

    By configuring a nix cache as a substituter on my machine, I am essentially fully trusting those who can upload to that cache. If they were malicious, they could upload a malicious store path and maybe for my next system upgrade, I’ll fetch their bad package.

Many caches are bad

On the other hand, many caches have downsides. My nix.conf currently says

substituters = https://cache.nixos.org/ https://nixcache.reflex-frp.org https://cache.iog.io https://digitallyinduced.cachix.org https://ghc-nix.cachix.org https://ic-hs-test.cachix.org https://kaleidogen.cachix.org https://static-haskell-nix.cachix.org https://tttool.cachix.org https://cache.nixos.org/

and that slows down everything, as nix queries all these caches, only to notice that the devShell I am about to enter is on none of them¹.

All of these nix caches belong to open source projects where reason 1 (secrecy) obviously doesn’t apply. If we can also somehow address reason 2, then maybe such projects could use a single cache.

A single public common cache?

The benefits of such a “public common nix cache” would be:

  • There could be many more projects (outside nixpkgs) where a user gets a cache hit when they run nix run github:someones/repo
  • There might be occasional synergies between projects (e.g. when they both happen to use the same non-nixpkgs-built GHC toolchain).
  • Fewer caches to query for the user, so more responsive nix tools.
  • Fewer random people with cache upload permissions that one has to trust to not turn malicious, or lose the credentials to someone who is!

(Almost) all the things we love about cache.nixos.org, but extended to more projects!

… via a trusted build

So how could we address reason 2 (at least before Trustix takes off)?

By using one (or a small number of) trusted build providers, like nixbuild.net or garnix.io. I could imagine nixbuild.net providing a single public cache (like garnix does), or teaming up with cachix to let them do the hosting. Only store paths built by the trusted build provider are uploaded there; I can’t upload locally built paths. This ensures that the store paths there are as trustworthy as the build provider.

When building something on nixbuild.net² I can indicate that this is a non-secret build and reason number 1 does not apply. Then the build results can be pushed there. Also, nixbuild.net could always safely use that cache.

I could configure all my public projects to do that, and would no longer have to worry about providing caches, telling users to configure them, or sharing build artifacts between them.

A widely used trusted build infrastructure also has other benefits due to scale. For example, a widely used build farm is more likely to support building on strange systems/architectures than a random small software project.

Storage cost and cache eviction

One challenge that would need solving is cost and cache eviction. An open-for-all cache can probably not grow forever, like cache.nixos.org does. But a simple LRU cache eviction like Cachix currently provides is also insufficient, as busy projects will displace less busy ones (a problem I am already having when sharing a Cachix cache between related projects).

It seems that the cache provider would have to allow users to register roots that ought to be alive, and then charge for the storage cost of stuff kept alive by that root. Maybe discounted if some paths are kept alive by multiple users. It seems @domenkozar is working on support for GC roots in Cachix, so maybe not so unrealistic.
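To make the discount idea concrete, here is a minimal sketch of one possible billing rule: each store path's size is split evenly among the users whose roots keep it alive. The function name, the even split, and the example paths are all assumptions for illustration, not anything Cachix or another provider implements.

```python
from collections import defaultdict


def storage_charges(path_sizes, roots):
    """Split each store path's size evenly among the users keeping it alive.

    path_sizes: dict mapping store path -> size in bytes
    roots:      dict mapping user -> set of store paths in the closure of
                that user's registered roots (closure computation is
                assumed to have happened already)
    Returns a dict mapping user -> billable bytes.
    """
    # Invert the mapping: which users keep each path alive?
    holders = defaultdict(set)
    for user, paths in roots.items():
        for path in paths:
            holders[path].add(user)

    charges = {user: 0.0 for user in roots}
    for path, users in holders.items():
        share = path_sizes[path] / len(users)  # even split (assumption)
        for user in users:
            charges[user] += share
    return charges


# Hypothetical example: two projects sharing one GHC toolchain.
example_sizes = {
    "/nix/store/aaa-ghc": 900,
    "/nix/store/bbb-app1": 100,
    "/nix/store/ccc-app2": 50,
}
example_roots = {
    "alice": {"/nix/store/aaa-ghc", "/nix/store/bbb-app1"},
    "bob": {"/nix/store/aaa-ghc", "/nix/store/ccc-app2"},
}
print(storage_charges(example_sizes, example_roots))
```

With this rule the shared toolchain is paid for once in total, so every additional project that roots it lowers the bill for the others.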

An even bolder thought?

Thinking a bit more boldly: I wonder if there could even be a way to offer that feature under the cache.nixos.org cache, so that everyone benefits without extra work? A way for the public to submit builds to a trustworthy build network that feeds that cache, and a way to register roots (with a way to pay for that, I guess)?


¹ I could probably mitigate that by using the flake feature to configure some of these caches only for the projects where they might be relevant. Still, it’s complexity I wish I would not have to think about. I’m also thinking of providing a cache multiplexer service that would in parallel query a set of caches, and then redirect the user to the right one – then they have to still configure all the keys, but only one cache endpoint.
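A rough sketch of what the multiplexer in footnote ¹ might do internally: ask all caches at once and answer with whichever one has the path. The `has_path` callback is a hypothetical stand-in; a real one would presumably request `<cache_url>/<store_hash>.narinfo` over HTTP.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def find_cache(store_hash, caches, has_path):
    """Return the URL of a cache that has the path, or None.

    has_path(cache_url, store_hash) -> bool is injected by the caller
    (hypothetical helper, so the sketch does not depend on the network).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(caches))) as pool:
        futures = {pool.submit(has_path, url, store_hash): url for url in caches}
        for future in as_completed(futures):
            try:
                hit = future.result()
            except Exception:
                hit = False  # treat an unreachable cache as a miss
            if hit:
                # Leaving the "with" block still waits for the slower
                # lookups to finish; a real service would cancel them.
                return futures[future]
    return None
```

The multiplexer would then redirect the client to the returned URL. The client still needs the public keys of all the caches, as the footnote says.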

² Typically via CI, but possibly also from a local machine using remote builders, which is great! A CI-only solution is unnecessarily restrictive, I’d say.

For reference, this is how nixbuild.net handles GC roots (copied comment from Clarify situation wrt build caching · Issue #28 · nixbuild/feedback · GitHub):

  • Every time a path is built, used as a build input, or queried/uploaded/downloaded by a Nix client, that path (and its closure) will have its time-to-live (ttl) and refresh time updated. We can set different ttls for each scenario, or even for individual sessions/builds. We have not decided on how much of that should be configurable by end users (feedback is welcome!). You could for example run certain builds with a low ttl to avoid them consuming storage for too long, while other builds (or uploads) could have a longer ttl to make them more persistent.
  • Every account will have a given amount of storage, and a given amount of time for which paths are allowed to be kept. At some point we might offer users to buy extra storage/time for a monthly fee.
  • Periodically, we garbage collect paths like this:
    1. Remove all paths that have a refresh time older than what the account is allowed to have.
    2. If the sum of all path sizes now exceeds the allowed storage for the account, proceed to next step, otherwise stop.
    3. Remove paths where refresh_time + ttl is in the past.
    4. If the account storage is still exceeded, remove more paths. The exact prioritizing of paths is yet to be decided.
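The four steps above could be sketched roughly as follows. This is only an illustration of the description, not nixbuild.net's actual collector: the `StorePath` fields and parameter names are made up, and since the prioritization in step 4 is explicitly undecided, the sketch simply assumes "least recently refreshed first".

```python
from dataclasses import dataclass


@dataclass
class StorePath:
    name: str
    size: int          # bytes
    refresh_time: int  # timestamp of the last build/use/query
    ttl: int           # seconds to keep the path after refresh_time


def collect_garbage(paths, now, max_age, max_storage):
    """Return the paths that survive one GC run for a single account."""
    # Step 1: drop paths refreshed longer ago than the account allows.
    kept = [p for p in paths if now - p.refresh_time <= max_age]

    # Step 2: stop if the account is within its storage allowance.
    if sum(p.size for p in kept) <= max_storage:
        return kept

    # Step 3: drop paths whose refresh_time + ttl is in the past.
    kept = [p for p in kept if p.refresh_time + p.ttl >= now]

    # Step 4 (assumption): evict least recently refreshed paths first
    # until the account fits into its allowance.
    kept.sort(key=lambda p: p.refresh_time, reverse=True)
    total = sum(p.size for p in kept)
    while kept and total > max_storage:
        total -= kept.pop().size
    return kept
```

Note that an expired ttl alone does not remove a path in this reading: step 3 only runs once the account is over its storage allowance. Closures are not handled here, on the assumption that refreshing a path already refreshes its whole closure.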

This is not yet exposed to nixbuild.net users. We have a few things left to do in the garbage collector, and we need to decide what level of configurability (with respect to ttl values) we want to surface, and what the base limits of storage/time will be for non-subscription accounts vs higher-tier accounts.

Well yes, Pull requests · NixOS/nixpkgs · GitHub
Problem (1) really disqualifies any kind of public sharing and problem (2) shouldn’t happen in this case.

I mean, if a package build has reasonable ratio of usefulness and costs (“globally”, and is free/libre), I’d basically always go there. But I might be biased, as I’m involved in this farm/cache.

Of course if something can be added to nixpkgs, it should! (This is what you are saying, right?) But there is much that doesn’t qualify (experimental stuff, development branches, odd targets), does it?

We commonly run experimental jobsets for riskier mass-rebuild pull requests on Hydra, and just like all other builds, their results get to cache.nixos.org. The farm is limited in native platforms, so that is where I expect the biggest “hole that you might fill in”. (I don’t count cases where it’s unlikely for anyone to actually use the resulting binaries.)

This is something that could quite easily be improved on the Nix side, though: Race substitutors against each other · Issue #3019 · NixOS/nix · GitHub

Having caches separated between projects that have a different set of users is a good idea, security wise.

The fact that Nix is slow to substitute can be fixed in Nix itself and/or on the Cachix side.

I might implement this on the Cachix side at least until Nix has a good story for async querying of narinfos.

Some WIP at Allow missing binary caches by default by arcuru · Pull Request #7188 · NixOS/nix · GitHub

Distributed trust-less builds are the holy grail of nix building, especially for public caches.

These are hard problems to solve: it requires a consensus protocol, bit-for-bit deterministic build outputs and anti-collusion features.

This would allow many software variants to exist, in a distributed fashion.

I think formulating the problem in terms of 1 and 2 is helpful. Part of the idea of garnix is that if we have only one, but trusted, uploader, then we can have a more widely shared cache.

Regarding 1, I think there’s a lot to be said. If we have a notion of identity, we can limit downloaders. Depending on how that notion of identity works, that may even play well with a decentralized cache. (One example is having identities just be cryptographic keys, and then associating allowed keys with each private build. Another, which I briefly mentioned to @nomeata is using a challenge-response system, with the source code as the secret - think asking for hashes of slight variations of the source code.)
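A minimal sketch of the challenge-response idea, with the source code as the shared secret. Here the "slight variation" is simply a random nonce prepended to the source before hashing; that choice, the function names, and the use of SHA-256 are assumptions for illustration, not how garnix works.

```python
import hashlib
import hmac
import secrets


def respond(source: bytes, nonce: bytes) -> str:
    """Downloader side: hash a nonce-dependent variation of the source."""
    return hashlib.sha256(nonce + source).hexdigest()


def verify(source: bytes, nonce: bytes, response: str) -> bool:
    """Cache side: recompute the answer and compare in constant time."""
    return hmac.compare_digest(respond(source, nonce), response)


# The cache picks a fresh nonce per request, so an old answer cannot be replayed.
nonce = secrets.token_bytes(16)
answer = respond(b"-- private source tree --", nonce)
print(verify(b"-- private source tree --", nonce, answer))
```

Only someone who actually holds the source can answer a fresh challenge, so the cache can hand out the build result without keeping a separate list of accounts.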

Regarding 2, I think we can separate caches from signatures. If we think in terms of signatures, it doesn’t really matter where the cache is hosted; each cache would just contain all the signatures it knows about for that build (I don’t really know why Trustix needs a ledger here - maybe @nixinator knows?). Again, this would help with a decentralized cache. This might be particularly helpful in certain local networks - the office LAN. But I can also imagine going further towards a torrent-like approach, where we split NARs and receive chunks from different sources.
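If caches merely carry signatures, the trust decision moves to the downloader. As a sketch, one hypothetical policy would be to accept an output hash only when enough trusted builders vouch for the same hash. The k-of-n threshold is an assumption of this sketch (as far as I know Nix accepts a path signed by any one trusted key), and the signatures are assumed to have been verified already.

```python
from collections import Counter


def agreed_output(claims, trusted, threshold):
    """Return the output hash vouched for by >= threshold trusted builders.

    claims:  iterable of (builder, output_hash) pairs whose signatures
             are assumed to be already checked
    trusted: set of builder names the downloader trusts
    Returns None if no hash, or more than one hash, reaches the threshold.
    """
    votes = Counter()
    seen = set()
    for builder, output_hash in claims:
        # Ignore untrusted builders and count each trusted builder once.
        if builder in trusted and builder not in seen:
            seen.add(builder)
            votes[output_hash] += 1
    winners = [h for h, n in votes.items() if n >= threshold]
    return winners[0] if len(winners) == 1 else None
```

With `threshold=1` this degenerates to the usual "any trusted signer" rule; higher thresholds only help when builds are reproducible enough for independent builders to arrive at the same hash.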

Don’t content-addressed derivations make signatures unnecessary, since they can verify themselves? The main problem here is just transitioning from input-addressed to content-addressed, which requires mass rebuilds, and the results aren’t generally available in current binary caches.

If a public cache only accepted content-addressed inputs, then theoretically anyone should be able to upload to it and have the outputs be trusted regardless of signature.

[EDIT] It seems like this is what Trustix is trying to do. Also the content addressing only confirms that the derivation’s hash matches its contents, not that it was actually built with the desired inputs.

I have to say that in my own explorations the Nix cache querying code has a lot of room for improvement. As far as I could gather it doesn’t even use HTTP/2 at all, unless I was doing something wrong. I kept hitting Cloudflare’s connection rate limit on my own personal cache several times in my own testing, for example.

I confirmed this by running similar queries to what Nix is doing under the hood with the plain curl CLI, with some flags to pump up the parallelism, and was able to query 10,000 paths on two caches in just under 3 seconds using only two connections.

Also in just reading the code, it didn’t seem to make optimal use of existing connections. This is something that could definitely be improved to the point that the penalty for multiple caches just isn’t really an issue anymore.

No, that doesn’t solve this issue. A user wants a package. They have no way of easily finding out the hash of the content they want. You can’t (generally) obtain it without someone actually doing the build and then you have to trust that they’re not cheating.

I realized that after posting which is the reason for the edits.

Hence why a distributed consensus or proof of stake algorithm might fit perfectly.

At some point the ‘free’ donations of resources to nix/OS will dry up… and it will start to get very expensive to build (hydra farm) and distribute software (S3 buckets/CDNs).

It’s already getting expensive.

So the only way nix and systems like this can survive is to use the power of the ‘edge’ of the network, and get its users to build and distribute derivations in some way that lets me trust someone, or groups of people, I have never met.

There is a way to do this, and the attempts I’ve seen are getting close.

This is the holy grail in nix building and should be the number one priority in the nix ecosystem right now.

It’s getting there!

just a shower thought (i’m currently getting electrocuted), you could even train large AI models, the remote nix build architecture fits distributed batch processing very very well.

Implementing a scheduler to hand out nix jobs to remote builders is what hydra does via perl today.

Nix can be a distributed builder for anything that can fit in a derivation (with some limits on trust).

Can you elaborate on the mentioned attempts? Are there any pointers to these?

Possibly referring to Trustix.

Yes, referred from the first post already, but no problem. EDIT: well, your link might be better than above.

I indeed saw the reference to Trustix and still wonder why that isn’t implemented yet.
But maybe @nixinator means to point at something like Holochain?
Don’t know

Because it is a very very hard problem to solve. It takes a lot of engineering effort. Lots of packages are not binary reproducible due to quite a few factors. If there are different outputs being created for a single set of inputs, how do I trust your build over someone else’s build?

How do I prove that you’re not giving me ‘modified code’? On the other hand, how do I prove that you’re giving me code I can trust? A single byte modification to a binary is all that is needed to compromise it.

Distributed consensus algorithms and open ledger technology are difficult, however, that doesn’t stop many of them springing up daily.

Maybe this can be taken up again. There’s no reason why nix can only have one distributed builder; it could have lots of different implementations.

Holochain? Well, I’m not familiar with this technology. Does it have some property that would help with this problem?

ZKS seem to be gaining popularity, maybe that with some code analysis heuristics, it’s a research topic…

Binary irreproducibility doesn’t only affect distributed builds, it affects every code delivery mechanism we currently have on planet earth (nix or other)!

The answer so far has been signing, but with that you’re only showing that the binary has been created and signed by someone or something (a machine) that has the selected private key, and has not been modified by someone without access to that key (unless they can crack the crypto or the key has been leaked).

It’s a hard problem, but so is getting to Mars; with enough talented engineers and money to support them, anything is possible.