Why does the NixOS infrastructure have to be hosted in a centralized way?

Maintaining a centralized build infrastructure (Hydra) for Nix is expensive. I’m wondering whether there could be an alternative that’s scalable, reliable, and also safe.

Imagine a hypothetical implementation based around a torrent-like peer-to-peer system: every time you request a new derivation, the system asks a number of other peers whether anybody has already built this exact derivation. If you’re the first one ever to request it, you build it locally and share it with others (like with torrents). If somebody has already built it, you fetch it from them.
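
To make the fetch-or-build flow concrete, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: the `Peer` class, the `realise` function, and the idea of keying outputs by a plain derivation-hash string are hypothetical and are not part of Nix or Hydra. It also ignores networking and trust entirely.

```python
from typing import Callable, Optional


class Peer:
    """Hypothetical peer: an in-memory map from derivation hash to build output."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, bytes] = {}

    def lookup(self, drv_hash: str) -> Optional[bytes]:
        return self.store.get(drv_hash)


def realise(
    drv_hash: str,
    me: Peer,
    peers: list[Peer],
    build: Callable[[str], bytes],
) -> tuple[bytes, str]:
    """Return (output, origin), preferring local store, then peers, then a local build."""
    cached = me.lookup(drv_hash)
    if cached is not None:
        return cached, "local"
    for peer in peers:
        output = peer.lookup(drv_hash)
        if output is not None:
            me.store[drv_hash] = output  # keep a copy so we can share it too
            return output, peer.name
    # Nobody has it yet: build locally and keep the result for others to fetch.
    output = build(drv_hash)
    me.store[drv_hash] = output
    return output, "built"
```

With two peers, the first request ends in a local build and the second one is served by the peer that built it.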

Now, the huge question here is security: how does one guarantee that the binary I compiled locally and am now sharing actually matches the sources that I claim it corresponds to? The answer is simple: never trust one source. If there are 100 different peers, each of them has compiled a given derivation, and all of them got the same binary, it’s reasonable to assume that it’s indeed what it claims to be. If at any point a node re-compiles a derivation and gets a different binary, it can issue a global flagging request, and the system can verify which one is correct, which one is being distributed maliciously, etc.
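
The agreement check could be sketched roughly as below. This is only an illustration under strong assumptions: builds are bit-for-bit reproducible, every peer id belongs to a genuinely independent party, and the names `digest`, `check_quorum`, and `min_peers` are made up for this example.

```python
import hashlib
from collections import Counter
from typing import Optional


def digest(data: bytes) -> str:
    """Hex SHA-256 of a build output."""
    return hashlib.sha256(data).hexdigest()


def check_quorum(
    reports: dict[str, str], min_peers: int = 3
) -> tuple[Optional[str], list[str]]:
    """Accept an output hash only if enough peers report it unanimously.

    `reports` maps a peer id to the output hash that peer obtained.
    Returns (accepted_hash, flagged_peers). If any peer disagrees with the
    most common hash, nothing is accepted and the disagreeing peers are
    returned so the derivation can be re-verified.
    """
    if len(reports) < min_peers:
        return None, []
    counts = Counter(reports.values())
    top_hash, _votes = counts.most_common(1)[0]
    flagged = sorted(peer for peer, h in reports.items() if h != top_hash)
    if flagged:
        return None, flagged
    return top_hash, []
```

Note that a flagged peer is only the one in the minority; the sketch does not decide who is actually right, which is the part that would need the global verification step.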

Obviously it’s a toy example, but I’m wondering if a system like this could work, because I expect long-term NixOS maintenance to be financially bottlenecked by hosting costs, and a distributed architecture would completely solve this problem. What are your thoughts?


Good to know that I was trying to reinvent something that’s not stupid.

This idea has come up a few times, but never seems to have really gone anywhere?

It seems most agree it would be nice, but no-one has yet found the time to implement it fully.


The really hard problem here is knowing whether it is indeed 100 peers, or just 100 sockpuppets running off somebody’s Pi in a cupboard (the so-called Sybil attack). As far as I know, there aren’t simple solutions to it.


There is one unstable feature in Nix that could help tremendously: content-addressable derivations.

Right now, store objects are primarily identified by the hash of the inputs. With CA, you identify them by the hash of their content instead. This will address the problem in two ways:

  • first, trust will be required only to get the mapping from inputs to the hash of the output. Once you know the content hash, you don’t need to trust anyone supplying the actual content. So, even if central infra is preserved, it now has to serve only the metadata.
  • second, I suspect that “early cutoff” (inputs are different, but the result is the same) happens all the time, so CA would cut down on the total volume of data significantly.
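
Both points can be illustrated with a small sketch. It is not how Nix implements content-addressed derivations: `TrustedIndex`, `fetch_verified`, and the use of hex SHA-256 over raw bytes are assumptions made for this example only.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex SHA-256 of the output bytes (a stand-in for a real store hash)."""
    return hashlib.sha256(data).hexdigest()


class TrustedIndex:
    """Hypothetical small, trusted service: input hash -> content hash."""

    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}

    def record(self, input_hash: str, output: bytes) -> str:
        digest = content_hash(output)
        self.mapping[input_hash] = digest
        return digest


def fetch_verified(
    index: TrustedIndex, input_hash: str, untrusted_blobs: dict[str, bytes]
) -> bytes:
    """Fetch content from an untrusted source and check it against the trusted hash."""
    expected = index.mapping[input_hash]
    data = untrusted_blobs[expected]
    if content_hash(data) != expected:
        raise ValueError("blob does not match the trusted content hash")
    return data
```

If two different input hashes are recorded with identical output bytes, they map to the same content hash, so the untrusted side only has to store and transfer that blob once.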

Good point. A proper solution would require some sort of proof of work or proof of stake, but this quickly derails into crypto territory.

As for content addressing: by the time you have a hash of the binary, you will have already built said binary, so referring to it by hash is the same as referring to it by name and providing a checksum/hash of the binary to verify it (which is something that every distributed system does anyway). So I don’t think that content addressing has any unique benefit for this specific purpose.


I think matklad is suggesting that CA may enable (trusted) centralized infra to serve only the metadata a client needs to figure out which outputs it requires, while leaving the client free to fetch those outputs from other sources if desired.


That’s equivalent to just asking for a hash of a package from the same server. It does not have to be content-addressed to do that.


And… ?

He didn’t say content addressing was the only way to achieve it; he just said stabilizing it could help the situation in two ways. We could obviously do something else to get only one of the two benefits.

No, CA derivations do not help here. This already works the same way today: you could have the small *.narinfo files served from a trusted source and obtain the rest in any way you like (as you know the hashes of the output NARs).
