Why does the NixOS infrastructure have to be hosted in a centralized way?

Maintaining a centralized build infrastructure (hydra) for nix is expensive. I'm wondering whether there could be an alternative to it that's scalable, reliable, and also safe.

Imagine a hypothetical implementation based around a torrent-like peer-to-peer system: every time you request a new derivation, the system asks a bunch of other peers whether anybody has already compiled this exact derivation. If you're the first one ever to request it, you compile it locally and share it with others (as with torrents). If somebody has already built it, you fetch it from them.

Now, the huge question here is security: how do others guarantee that the derivation I compile locally actually matches the sources I claim it corresponds to? The answer is simple: never trust a single source. If there are 100 different peers, each of them has compiled a given derivation, and all of them ended up with the same binary, it's reasonable to assume that it is indeed what it claims to be. If at any point a node re-compiles a derivation and gets a different binary, it can issue a global flagging request and the system can investigate which version is correct, which one is being distributed maliciously, and so on.
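A minimal sketch of what that verification step could look like, using the existing binary-cache metadata format (the peer URLs and the store-path hash are hypothetical placeholders):

```
# Ask several (hypothetical) peer caches for the .narinfo of the same store
# path and compare the NarHash each one reports.
storeHash="PLACEHOLDER-hash-part-of-a-store-path"
for peer in https://peer1.example https://peer2.example https://peer3.example; do
  curl -fsSL "$peer/$storeHash.narinfo" | grep '^NarHash:'
done | sort | uniq -c
# Every peer reporting the same NarHash is the "100 identical builds" case;
# more than one distinct NarHash is the disagreement that would trigger the
# global flagging/investigation described above.
```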

Obviously it's a toy example, but I'm wondering whether a system like this could work, because I expect long-term NixOS maintenance to be financially bottlenecked by hosting costs, and a distributed architecture would solve that problem entirely. What are your thoughts?


Good to know that I was trying to reinvent something that’s not stupid.


This idea has come up a few times, but never seems to have really gone anywhere?

It seems most agree it would be nice, but no-one has yet found the time to implement it fully.


The really hard problem here is knowing whether it is indeed 100 peers, or just 100 sockpuppets running off somebody's Pi in a cupboard (the so-called Sybil attack). As far as I know, there are no simple solutions to it.


There is one unstable feature in Nix which could help tremendously — content-addressed derivations.

Right now, store objects are primarily identified by the hash of their inputs. With CA, you identify them by the hash of their content instead. This would address the problem in two ways:

  • first, trust is required only to get the mapping from inputs to the hash of the output. Once you know the content hash, you don't need to trust whoever supplies the actual content. So, even if central infra is preserved, it only has to serve the metadata (see the sketch after this list).
  • second, I suspect that “early cutoff” (inputs are different, but the result is the same) happens all the time, so CA would cut down on the total volume of data significantly.
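To make the first bullet concrete, here is a hedged sketch (the store path is a placeholder, and `nix hash path` is part of the experimental `nix-command` CLI): once a trusted party has told you the expected content hash, bits obtained from anyone can be verified locally.

```
# Hash the contents (NAR serialisation) of a store path locally and compare
# the result against the hash promised by the trusted metadata.
nix hash path /nix/store/PLACEHOLDER-hello-2.12.1
# prints an SRI hash such as: sha256-...
```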

Good point. A proper solution would require some sort of proof of work or proof of stake, but that quickly derails into crypto territory.

As for content addressing: by the time you have the hash of a binary, you will already have built said binary, so referring to it by hash is the same as referring to it by name and providing a checksum/hash of the binary to verify it (which is something every distributed system does anyway). So I don't think content addressing has any unique benefit for this specific purpose.


I think matklad is suggesting that CA may let (trusted) centralized infra serve just the metadata a client needs to figure out which outputs it wants, while leaving it free to fetch those outputs from other sources if desired.


That’s equivalent to just asking for a hash of a package from the same server. It does not have to be content-addressed to do that.


And… ?

He didn’t say content-addressing was the only way to achieve it–he just said stabilizing it could help the situation in two ways. We could obviously do something else to get only one of the two benefits.


No, CA does not help here. It's the same already: you can have the small *.narinfo files served from a trusted source and obtain the rest in any way you like (since you know the hashes of the output NARs).
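For reference, that split already looks roughly like this today (the store-path hash is a placeholder and the fields are abbreviated):

```
# Fetch the small, signed metadata file from the trusted cache...
curl -fsSL "https://cache.nixos.org/PLACEHOLDER.narinfo"
# ...which contains, among other things:
#   StorePath: /nix/store/PLACEHOLDER-hello-2.12.1
#   URL: nar/FILEHASH.nar.xz
#   NarHash: sha256:...
#   NarSize: ...
#   References: ...
#   Sig: cache.nixos.org-1:...
# The (much larger) NAR named by URL/NarHash can then be fetched from any
# untrusted mirror or peer and checked locally against NarHash.
```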


Regarding trust in *.narinfo (that is, in the association between sources and compiled packages), one possible improvement could be using confidential computing for builds.

Modern hardware often provides a way to do remote attestation, so in principle it's possible to run a program and get a cryptographic certificate stating that a program with a given hash has been run in a trusted execution environment on a machine manufactured by Intel/AMD/Nvidia/etc., along with the public part of an ephemeral key. Such a program could perform a Nix build and then sign, with that ephemeral key, a statement saying: for the derivation with this hash and inputs with these narHashes, the build produced the following output narHash.
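As an illustration only (every field name below is made up; real attestation formats are vendor- and framework-specific), the statement such a TEE-hosted builder could sign with its attested ephemeral key might look something like this:

```
# Hypothetical statement to be signed with the attested ephemeral key.
cat > attestation.json <<'EOF'
{
  "drvHash": "sha256:HASH-OF-THE-DRV-THAT-WAS-BUILT",
  "inputNarHashes": {
    "/nix/store/HASH-dep1": "sha256:NARHASH",
    "/nix/store/HASH-dep2": "sha256:NARHASH"
  },
  "outputNarHash": "sha256:NARHASH-OF-THE-BUILD-RESULT"
}
EOF
# The remote-attestation certificate binds the ephemeral public key to
# "this exact builder program ran inside genuine Intel/AMD/... hardware",
# so anyone can verify both the attestation and the signature offline.
```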

See for example https://confidentialcontainers.org/, https://confidentialcomputing.io/, TEE, Intel SGX, Arm TrustZone.

This would make it possible to reuse unknown binary caches with less risk.

By the way, it looks like for Nix this problem is even more important than for other Linux distributions: when you add a new binary cache with its own signing key (for example, because you wish to avoid heavy compilation of some CUDA-related stuff), you have no control over which packages it will actually be used for. If you run nixos-rebuild at a moment when cache.nixos.org doesn't have your package yet, this secondary binary cache might inject malicious binaries.
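To illustrate that last point (the extra cache URL and the keys are placeholders): adding a substituter in nix.conf looks like this, and nothing in it scopes the cache to particular packages.

```
# Append an extra binary cache to the system-wide Nix configuration.
cat >> /etc/nix/nix.conf <<'EOF'
substituters = https://cache.nixos.org https://extra-cache.example.org
trusted-public-keys = cache.nixos.org-1:PUBKEY extra-cache.example.org-1:PUBKEY
EOF
# Any store path that cache.nixos.org doesn't (yet) have may now be
# substituted from extra-cache.example.org instead, for any package.
```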


No, it really isn’t.

I built and continue to operate[^1] a buildfarm which is much larger than Hydra's x86_64 fleet (I cross-build everything else) – it can mass-rebuild (i.e. staging) the full release package set every 18 hours.

People massively overestimate the cost of running a buildfarm because they’ve been programmed to think that overpriced cloud nonsense is the only option.

The only thing that's inherently expensive is the bandwidth donated by Fastly. Nixworld needs to break its bandwidth addiction. That requires deep architectural changes, which would be painful but extremely beneficial in the long run. Unfortunately the will to make those changes is certainly not present at this time.

[^1]: scroll down to the images. That was only two-thirds of the cluster at the time; it is even larger now.


Sadly, the nix community fundamentally does not understand or appreciate decentralization.

This is the “meta problem” that prevents the community from dealing with all its other problems (including the recent epidemic of poo-flinging). I think all those other problems can be fixed, but I don’t know how to fix the metaproblem. Maybe it can’t be fixed.

I spend a lot of time thinking about this.


Hello, have you posted more details about your setup anywhere?

And about bandwidth optimization: which architectural changes do you mean?

I’m uneducated on the topic and could be mistaken, but I see parallels with Bitcoin and DeFi (decentralized finance). Someday Bitcoin will probably become DeFi in some sense, but currently DeFi doesn’t solve a problem important enough for most Bitcoin users to justify the effort.

Changing minds is a matter of selling an idea. It’s not enough for an idea to be interesting or even better. It must solve a specific tangible problem that users face today. This is human nature.

Again, completely uneducated here, but I could see decentralization becoming more useful in the future as the package system and userbase grow. Perhaps this might convince others. Maybe the time isn’t ripe. It’s not trivial to implement.

To me, now feels like the best time ever to start thinking about decentralizing nix: with the whole NixOS foundation drama, I have no confidence that its infrastructure is going to last long-term, or that the community will be able to successfully take over its administration if it collapses. If NixOS were a decentralized zero-trust system to start with, its administration would have no power and there would be next to no infrastructure to maintain, because each willing user would host a part of it.

Hypothetically, yes, a collapse would certainly motivate development of a decentralized build system. However, this seems unlikely to happen in the foreseeable future. It would be more effective to appeal to something like faster build times.


I doubt the current community & infra would fall apart overnight – and I say that as someone who is pretty supportive of the efforts of the various “splinter groups.” There are a lot of people still ‘here’ and a fair number of companies with some technological investment. I think a reasonable worst-case scenario would be a slow decline, which would allow plenty of time to investigate alternatives.

There are already a few people thinking about alternatives, whether centralized or decentralized. It’s also worth noting that even with a centralized approach, different decisions could be made around what to build and what to store that would reduce costs and still be useful (e.g. it’s nice to keep a lot of binary ‘history’ around but not actually necessary for day-to-day use.)

I agree that at the moment the lack of immediate necessity probably means people are less motivated to put effort into decentralized solutions, but there's also stuff like ca-derivations that, if fully implemented/deployed, would make them substantially easier, so people may be holding fire a bit.

TBH I feel the design of nix already goes some way to allowing this, in the sense that you can just check out nixpkgs and build everything yourself – it's just inconvenient, and until recently at least hasn't seemed necessary.
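For example, assuming you're willing to wait for local builds, something like this already works:

```
# Check out nixpkgs and build a package with binary caches disabled, so
# everything (apart from the stdenv bootstrap binaries) is built locally.
git clone https://github.com/NixOS/nixpkgs.git
cd nixpkgs
nix-build -A hello --option substituters ""
```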


Especially since hydra is run on the most expensive option of them all :smiley:

Do you mean CA derivations, more efficient compression, or something else?

Also, IMO we’d benefit from the mirror model, like every other Linux distro, rather than basically half the world being unable to use NixOS effectively due to reliance on Fastly POPs. (I’m aware of the TUNA mirrors, but still…)
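For instance (the mirror URL is the TUNA one mentioned above; check the mirror's own documentation for the current path):

```
# Point a build at a regional mirror first, falling back to cache.nixos.org.
nix-build '<nixpkgs>' -A hello \
  --option substituters "https://mirrors.tuna.tsinghua.edu.cn/nix-channels/store https://cache.nixos.org"
```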