Peer-to-peer binary cache RFC/working group/poll

This is to gauge community interest and help in pursuing the implementation of a peer-to-peer binary cache system for Nix, following a recommendation to do so.

While decentralised, self-sustainable technology is a dream for many, it is worth evaluating whether the idea is necessary or realistic before investing oneself in such a large task.

We don’t know for sure how many people would be willing to use the work, and it may be that the foundation is able to keep securing sponsorships and donations, in which case decentralisation becomes moot.

Disclaimer: I don’t have intimate knowledge of Nix/BitTorrent internals and in no way am I a security expert.

See details of existing discussion starting here.

The idea is to shoehorn BitTorrent (if feasible at all) into a binary cache substituter: leechers query store paths from the swarm on demand, which reduces the risk of leaking private derivation data, since attackers would have to brute-force the private hash plus the derivation name and version.

Seeders would easily opt into the swarm thanks to a new services.nix-serve-p2p.enable: bool = false option enabling a service that joins the swarm. It could be suggested as a comment in the generated config to raise awareness and help the idea gain traction.

A services.nix-serve-p2p.max-upload-speed option lets users decide how much bandwidth they wish to contribute.

To prevent further attacks, failed queries would be rate limited: a requested path is first checked against the union of the public store-paths.xz listings from registered HTTP substituters to determine whether it is publicly known.
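
A rough sketch of that check, assuming the public listings are fetched from a store-paths.xz endpoint (the URL, helper names and overall interface are illustrative, not an existing mechanism):

```python
# Sketch: only answer swarm queries for hashes found in the union of the
# public store-paths.xz listings; everything else counts against a
# failed-query budget. The URL below is an assumption for illustration.
import lzma
import urllib.request

PUBLIC_SUBSTITUTERS = ["https://cache.nixos.org"]  # those flagged `public = true`

def load_public_hashes() -> set[str]:
    known: set[str] = set()
    for base in PUBLIC_SUBSTITUTERS:
        with urllib.request.urlopen(f"{base}/store-paths.xz") as resp:
            for line in lzma.decompress(resp.read()).decode().splitlines():
                # /nix/store/<hash>-<name>  ->  keep only the hash part
                known.add(line.removeprefix("/nix/store/").split("-", 1)[0])
    return known

def should_answer(queried_hash: str, known: set[str]) -> bool:
    """Publicly known paths are served normally; unknown ones should be
    rate limited (the rate limiter itself is not shown here)."""
    return queried_hash in known
```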

An additional nix.settings.substituters.?.public: bool = false option could be added to distinguish public from private substituters, with https://cache.nixos.org/ being public.

A further, more paranoid step would be a per-package private: bool = false option to ensure derivations stay private at all costs. The resulting store path would become a dotfile so that such paths are easy to filter out in the rate-limiting system above.

20 Likes

This is an interesting project; I hope it will succeed. But one question isn’t answered by this thread or the links.

Which problem is it trying to solve?

This can’t solve the issue of long-term storage and distribution for the official substituter. It could help reduce the network load on the substituter, but that is the CDN’s job, and it does it well (except in some countries).

This could make it easier for people to share derivations between computers, but a p2p model seems complicated with regard to trust, security and privacy. And it doesn’t guarantee availability.

3 Likes

Why not both? The full binary cache could be split into smaller shards and distributed among a number of volunteers: companies, universities, or anybody with spare bandwidth and some disk space. I bet the bandwidth required for the historical artifacts is so low that a single 1 Gbps connection could handle it.

1 Like

Does BitTorrent provide any way to ensure content is always retrievable from somewhere?

Here and there:

if we were to go all in with community load sharing, storage could purge the binary caches (reproducible output, not “valuable” sources) and reduce costs on that front as a result.

This avenue would make sense if there were a large number of seeders able to mostly take over from the existing hosting solution, or if its benefits outweighed the costs of S3/Fastly.

I think a distributed storage model like this is the best path to a long-term-sustainable, no-monetary-cost solution. There is so much good will in the general user community. If it’s as simple as responding to an “uncomment this line to help support this community” note in the nixos-generate-config default config, I think we’d easily get distributed storage serving capacity adequate to handle the binary cache. And if I’m wrong about this and we only get, say 50% of the needed capacity from volunteers, that’s a 50% cost reduction on whatever service picks up the rest of the load.

On top of that, per the transparency report, it looks like CDN costs are far, far higher than storage costs:

  • Storage: ~€10k/month
  • Fastly: Estimated at over €50k/month (this is hard to take into account with a buffer)

and to my poor man’s ears, €600,000/year sounds like an astronomical sword of Damocles if the foundation were to pay it in full some day (it rarely ends well when people depend on the generosity of for-profit companies; there are endless such tales, as painful as it is to admit).

This is precisely what torrents excel at.

It is a well-known meme that people primarily use torrents to download their favourite Linux distributions, and many projects, such as Arch, recommend it as a download method.

I don’t know whether there is prior art in integrating the torrent protocol into package managers themselves, however. But in any case, its power has been put to use in many areas, big and small.

Regarding security, traditional hosted torrent files contain hashes for all data blocks. Clients detect corruption/tampering thanks to that, and, I believe, implement blocklisting of unreliable/malicious nodes.
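
For illustration, a minimal sketch of that per-piece check (the piece data and expected hash are made up; real clients read the hashes from the .torrent metadata):

```python
# Each piece listed in the torrent metadata has a SHA-1 hash; a downloaded
# piece that does not match is discarded and its sender can be blocklisted.
import hashlib

def piece_ok(piece: bytes, expected_sha1_hex: str) -> bool:
    return hashlib.sha1(piece).hexdigest() == expected_sha1_hex

piece = b"example piece data"                       # made-up payload
expected = hashlib.sha1(piece).hexdigest()          # would come from the metadata
print(piece_ok(piece, expected))                    # True
print(piece_ok(b"tampered piece data", expected))   # False -> reject and re-request
```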

If for some reason we don’t want to store the source-of-truth torrent metadata files on the official servers, magnet links can do the trick.

I assume in this case the client gathers seeders’ responses and checks that the majority agrees on the hashes; I would have to look into it further.

(On an unrelated note, torrents support web seeds as a fallback, though that is redundant given Nix already has HTTP substituters.)

8 Likes

For BitTorrent, I’d mainly be afraid of the latency and of how it deals with a huge number of tiny torrents (one per /nix/store/$hash).

1 Like

Torrents need seeders; as long as every block is reachable from at least one seeder (i.e. people can download and seed partial subsets of the torrent, and those subsets can still add up to 100% availability), the full torrent can be completed.

Web seeds exist as a fallback in case all seeders disappear.

In our case, this would be the regular HTTP substituter, or someone in the swarm who rebuilt the derivation from source, I guess!

1 Like

I don’t think any software will ever provide this without introducing redundancy: what’s to stop the last remaining node holding the data from going offline?

The best I can think of is to use something like par2 and treat a node going offline as corruption. You can extend the stored data with recovery files that can cover up to x% of the total cache being lost. Then you must make sure that fewer than n% of the nodes ever go offline faster than you can add new ones.
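
To make the idea concrete, here is a toy sketch of the simplest possible recovery scheme: a single XOR parity shard, which lets any one lost data shard be rebuilt (par2 and Reed-Solomon codes generalise this to recovering several):

```python
def make_parity(shards: list[bytes]) -> bytes:
    """XOR equally-sized shards together into one parity shard."""
    parity = bytearray(len(shards[0]))
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing shard from the survivors plus the parity."""
    return make_parity(survivors + [parity])

# Three "nodes" each hold one shard; the node holding shards[1] goes offline.
shards = [b"aaaa", b"bbbb", b"cccc"]
parity = make_parity(shards)
assert recover_missing([shards[0], shards[2]], parity) == shards[1]
```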

1 Like

I wonder if we could merge many paths into one (or a few) torrent, and still serve a portion of those blocks which are on disk.

Or if the protocol could be easily tweaked to our needs to be more in line with how the store functions.

I wonder if we could merge many paths into one (or a few) torrent, and still serve a portion of those blocks which are on disk.

This is what Library Genesis/Sci-Hub do: it would be too inefficient to have one torrent per book/article. The entire repository is 2.4 million books split into 2400 torrents of 1000 books each.
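
A toy sketch of that approach applied to store paths, chunking a sorted path list into fixed-size bundles (the numbers and paths are made up):

```python
def make_bundles(store_paths: list[str], per_bundle: int = 1000) -> list[list[str]]:
    """Split a sorted list of store paths into fixed-size bundles,
    one torrent per bundle, like the Library Genesis repository layout."""
    paths = sorted(store_paths)
    return [paths[i:i + per_bundle] for i in range(0, len(paths), per_bundle)]

paths = [f"/nix/store/{i:032x}-pkg-{i}" for i in range(2500)]
bundles = make_bundles(paths)
print(len(bundles), len(bundles[0]))  # 3 bundles, 1000 paths in the first
```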

2 Likes

But how would people map paths they want to torrents?

EDIT: on the whole, I don’t think BitTorrent was designed for use cases similar to ours, though it might work somehow.

2 Likes

Maybe some sort of hashmap could map the store path to a torrent.

Good question.

A terrible idea off the top of my head is to fetch the torrent metadata and ask peers from the torrent that contains the (hashed?) path of interest. Not optimal :slight_smile:

Yeah, the protocol uses a distributed hash table, so that sounds about right!

In 2005, first Vuze and then the BitTorrent client introduced distributed tracking using distributed hash tables which allowed clients to exchange data on swarms directly without the need for a torrent file.

Would it be possible to replace the torrent info hash with the nix store path?

I would hope so, yes!

We don’t need to maintain compatibility with regular torrent clients for what it’s worth.
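
A minimal sketch of what that could look like, assuming a custom swarm whose DHT is keyed by a 20-byte digest of the store path (the function and example path are purely illustrative):

```python
import hashlib

def derive_dht_key(store_path: str) -> bytes:
    """Map a /nix/store path to a 20-byte key, the same size as a regular
    BitTorrent info-hash, so peers holding that path can be announced and
    looked up in a DHT without ever creating a .torrent file."""
    return hashlib.sha1(store_path.encode()).digest()

key = derive_dht_key("/nix/store/0c0p8kb32hjjrhpmrz1xbcdxsjkhpi1a-hello-2.12.1")
print(key.hex())  # the key to announce/query in the swarm's DHT
```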

1 Like

Torrents aren’t really designed to handle fine-grained files, and there isn’t any great way around that.

I think IPFS is more suitable, but it isn’t without issues. For example, finding the files in the DHT is really slow.

How could we integrate IPFS with Nix without too much effort?

  • The Nix foundation could keep a table mapping Nix hashes to IPFS hashes. This is a cheap operation that could easily be done with a Redis server (see the sketch after this list). It’s also easily implementable in the Nix language.
  • The foundation should also keep a tracker to speed up the lookups on the DHT. Not sure about the details of this.
  • Users would enable an option such as nixpkgs.experimental-p2p = true to start a service capable of resolving IPFS addresses. Inevitably, using IPFS will leak some information about the user to the public internet.
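
A minimal sketch of the first bullet, assuming a plain Redis instance run by the foundation (the key scheme, host and CID value are illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def publish_mapping(nix_hash: str, ipfs_cid: str) -> None:
    """Record which IPFS object holds the NAR for a given store hash."""
    r.set(f"nar:{nix_hash}", ipfs_cid)

def resolve(nix_hash: str) -> str | None:
    """Look up the IPFS address to substitute a store path from."""
    return r.get(f"nar:{nix_hash}")

publish_mapping("0c0p8kb32hjjrhpmrz1xbcdxsjkhpi1a", "bafybeihypotheticalcid")
print(resolve("0c0p8kb32hjjrhpmrz1xbcdxsjkhpi1a"))
```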

This is doable, but we need to write a Proof Of Concept to actually evaluate the solution’s effectiveness.

8 Likes

If we really want to use torrents, a possible approach could be:

  • The Nix foundation provides a tracker and a service mapping Nix hashes to torrent magnet links.
  • The torrents could be bundles of 50 MB < X < 500 MB, created by analysing the download patterns from the binary cache. Users who are downloading GNOME Shell will probably need the Nautilus file manager as well, so those packages should end up in the same bundle. Cluster analysis techniques can help us achieve this (see the sketch after this list).
  • Bundles are going to contain a lot of duplicate data, especially considering that we need to keep multiple versions of the same package available (e.g. GNOME Shell v44 and v44.1). Torrent clients are able to selectively download specific files from a bundle, so in theory this is doable.
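
As referenced above, a toy sketch of grouping packages into bundles by co-download patterns (greedy merging by pair co-occurrence; the session log is made up, and real cluster analysis would also enforce the bundle size limits):

```python
from collections import Counter
from itertools import combinations

# Each entry: the set of packages one client fetched in a session.
sessions = [
    {"gnome-shell-44.1", "nautilus-44.1", "gtk4-4.10"},
    {"gnome-shell-44.1", "nautilus-44.1"},
    {"firefox-113", "gtk4-4.10"},
]

# Count how often each pair of packages is downloaded together.
pair_counts = Counter()
for session in sessions:
    for a, b in combinations(sorted(session), 2):
        pair_counts[(a, b)] += 1

# Greedily place the most co-downloaded pairs into the same bundle.
bundle_of: dict[str, int] = {}
bundles: list[set[str]] = []
for (a, b), _count in pair_counts.most_common():
    if a not in bundle_of and b not in bundle_of:
        bundles.append({a, b})
        bundle_of[a] = bundle_of[b] = len(bundles) - 1
    elif a in bundle_of and b not in bundle_of:
        bundles[bundle_of[a]].add(b)
        bundle_of[b] = bundle_of[a]
    elif b in bundle_of and a not in bundle_of:
        bundles[bundle_of[b]].add(a)
        bundle_of[a] = bundle_of[b]

print(bundles)  # packages that tend to be fetched together, per bundle
```
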
5 Likes

Does this mean we cannot take advantage of the very handy hard-link option (nix.settings.auto-optimise-store)?

1 Like

(I wrote my response in parallel to ranfdev above, so there is some overlap.)

I’m not an expert in this, but from what I can tell:

BitTorrent alone is not enough for this task. BitTorrent

  • is designed for distributing a static data set across multiple peers, where each peer strives to acquire the whole data set.
    • does not have functionality for the data set to vary over time (e.g. store paths to be added to a .torrent file).
  • was not designed for “sharding” a data set across multiple peers, where each peer strives to host only a subset of the data set.

Requirements

One would need to have software (on top of BitTorrent, or standalone) that handles the following:

  • Authenticity: A trusted producer (Hydra) produces a new torrent/storepath file to be stored.
    Trusted means that it is the source of truth based on which machines in the swarm decide whether or not to store a file (to ensure that the swarm only stores NixOS-related files).
  • Sharding: No single machine in the swarm can be expected to store all Hydra output, due to its size.
    So some (optionally decentralised) database must decide which machines in the swarm should hold which store paths.
  • Rebalancing: As nodes come online and disappear, some (optionally decentralised) algorithm needs to trigger re-balancing of the above assignment.
  • Trustless: Individual swarm participants do not need to be trusted.

“Fault tolerance” is achieved by making sure that the Sharding and Rebalancing steps always store all data across multiple machines.
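
For illustration, a minimal sketch of one assignment scheme that covers the Sharding, Rebalancing and fault-tolerance points: rendezvous (highest-random-weight) hashing with a replication factor. The node names and parameters are assumptions, not an existing Nix mechanism.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]
REPLICAS = 3  # each store path is kept on this many swarm members

def _score(node: str, store_path: str) -> int:
    digest = hashlib.sha256(f"{node}|{store_path}".encode()).digest()
    return int.from_bytes(digest, "big")

def owners(store_path: str, nodes: list[str] = NODES, k: int = REPLICAS) -> list[str]:
    """The k nodes responsible for a store path. When a node joins or
    leaves, only the paths it scores highest on change owners, so
    rebalancing moves a minimal amount of data."""
    return sorted(nodes, key=lambda n: _score(n, store_path), reverse=True)[:k]

print(owners("/nix/store/0c0p8kb32hjjrhpmrz1xbcdxsjkhpi1a-hello-2.12.1"))
```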

Existing software that meets these requirements?

I did a quick Google for existing solutions, e.g. searching for “distributed untrusted object store”. Some results and rough analysis:

  • Ceph / MinIO / GlusterFS
    • These are open-source distributed network file systems and object stores.
    • Are not trustless (thus not relevant here), but provide good examples for sharding and rebalancing.
  • Storj
  • IPFS
    • Does not seem to take care of sharding and rebalancing.
      From my research, IPFS nodes only store data that their respective users have chosen to “pin”.
      Thus also has no builtin fault tolerance or availability controls at all (HN discussion).

    • IPFS+Filecoin:
      Filecoin is made by the same people as IPFS.
      In their FAQ they describe:

      IPFS allows peers to store, request, and transfer verifiable data with each other, while Filecoin is designed to provide a system of persistent data storage.

      See also “IPFS and Filecoin” linked from Wikipedia.

      • Again here the question is whether one can run a private Filecoin network with the cryptocurrency aspect removed.
        Unclear if anybody has tried that (I found one question here).

In summary, the protocols for what’s needed seem to exist, but so far they appear to be entangled with public cryptocurrencies, and nobody is using them for community-based file hosting yet.

11 Likes