Peer-to-peer binary cache RFC/working group/poll

rnhmjoj · June 24, 2023, 4:25pm

I think the first step before proposing solutions is to clearly spell out the problem, as Solene suggested.
There are two separate problems we need to solve:

1. storing the historical binary cache (requires large storage, low bandwidth)
2. distributing the recent packages (requires little storage, a lot of bandwidth)

Problem 1. requires sharding the data and maintaining long-term availability. I think this could be solved with chunked torrents and a group of volunteers (we need a catchy name, like the Guardians of the Cache).

Problem 2. could be solved with IPFS or a similar technology, but torrents would be bad for this use case.

ranfdev · June 24, 2023, 4:25pm

Why not? The .torrent files don’t contain the actual data. The bundles are just .torrent files in the end. The overall stored files would be the same as using a normal binary cache + a decent amount of metadata.

Solene · June 24, 2023, 4:44pm

Volunteers offer no guarantee that they will keep objects in their storage, hence you have no availability guarantee for derivations. The current setup isn’t ideal, but you know that the official cache always keeps every derivation ever built.

rnhmjoj · June 24, 2023, 4:48pm

Volunteers offer no guarantee

The volunteers could be selected and handed a shard provided they can make such a guarantee.
Also, you can always introduce redundancy.

EDIT: The Nix foundation would also keep a copy of the full cache in some cold storage.
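
To make the shard-plus-redundancy idea concrete, here is a minimal sketch of how shards might be assigned to selected volunteers. It assumes rendezvous (highest-random-weight) hashing, which nobody in this thread has proposed; the volunteer names, the shard count and the function names `shard_of` and `holders` are all hypothetical.

```python
import hashlib

def shard_of(store_hash: str, num_shards: int) -> int:
    """Map a store path hash to a shard index (hypothetical scheme)."""
    digest = hashlib.sha256(store_hash.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def holders(shard: int, volunteers: list[str], copies: int) -> list[str]:
    """Pick `copies` distinct volunteers for a shard via rendezvous hashing."""
    def weight(volunteer: str) -> bytes:
        # Every (volunteer, shard) pair gets a pseudo-random weight.
        return hashlib.sha256(f"{volunteer}:{shard}".encode()).digest()
    # The volunteers with the highest weights hold the shard.
    return sorted(volunteers, key=weight, reverse=True)[:copies]

volunteers = ["alice", "bob", "carol", "dave", "erin"]  # made-up names
shard = shard_of("example-store-path-hash", num_shards=1024)
print(shard, holders(shard, volunteers, copies=3))
```

With this kind of scheme, a volunteer leaving only affects the shards that volunteer was holding; every other assignment stays the same, which keeps re-replication traffic small.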

Nabile-Rahmani · June 24, 2023, 5:01pm

I would like to add (unless mistaken) that the nature of the binary cache is that derivation outputs are meant to be reproducible. So anyone in the swarm can reintroduce the data on demand. We don’t need fallback storage for that.

To be doubly clear, we are talking about build output artifacts which can always be recomputed, not source inputs which are considered valuable in cases where the origin URL is no longer reachable.

Was the issue that we can’t easily distinguish valuable inputs that came from the Internet and are still in cache.nixos.org?

rnhmjoj · June 24, 2023, 5:07pm

I would like to add (unless mistaken) that the nature of the binary cache is that derivation outputs are meant to be reproducible. So anyone in the swarm can reintroduce the data on demand. We don’t need fallback storage for that.

Ideally yes, but in practice a lot of derivations are not bit-for-bit reproducible. So while you may be able to reproduce functionally identical build artifacts, the result would likely be a different file.

To be doubly clear, we are talking about build output artifacts which can always be recomputed, not source inputs which are considered valuable in cases where the origin URL is no longer reachable.

We are talking about both: the cache also keeps the source archives. As it frequently happens that URLs break and sources become unavailable, this is actually the invaluable part of the cache.
For example, I just adopted an unmaintained piece of software that was recovered from the binary cache: the author’s website and repository were lost.

Nabile-Rahmani · June 24, 2023, 5:11pm

Right. On that note, I wonder if it isn’t too unreasonable to also take this valuable data and offer it to https://archive.org/ and/or https://www.softwareheritage.org/, on top of distributing it with the swarm, if possible.

Doesn’t Nix support attempting to fetch from known mirrors like IA? I think I remember packages having a mirror: scheme.

Actually, I think I remember a Nix blog post about a collaboration with Software Heritage, from which I first found out about them. It was about reaching 100% reproducibility.

EDIT: Long-term reproducibility with Nix and Software Heritage & Expanding coverage of the archive: welcome Nixpkgs! – Software Heritage

rnhmjoj · June 24, 2023, 5:19pm

Right. On that note, I wonder if it isn’t too unreasonable to also take this valuable data and offer it to https://archive.org/ and/or https://www.softwareheritage.org/, on top of distributing it with the swarm, if possible.

They would probably be interested, yes, but the data needs to be presented in an appropriate way.
I think it would be quite a lot of work to select what’s worth preserving and attach useful metadata to the raw cached paths.

That’s just a shortcut to using multiple URLs or domains, which are stored in pkgs/build-support/fetchurl/mirrors.nix with per-project definitions. I think using a global mirror is possible, but it would require mapping between different naming conventions (I mean like domain/project-name/package-version.tar.xz).

Nabile-Rahmani · June 24, 2023, 5:25pm

This could be done gradually by package maintainers wishing to help clean up the cache if it’s a pressing matter.

Perhaps documenting the process could be great for preservation efforts.

Wasn’t someone working on an analyser to list dead sources in the cache?

chkno · June 24, 2023, 5:43pm

If this storage network loses the last copy of an artifact, it has failed. Rebuilding derivations that are bit-for-bit reproducible so that the hydra.nixos.org signature still validates is a neat trick, and might be used in a sophisticated way to set the redundancy target of different artifacts (either as fun tuning and tinkering after everything else is working, or as a response to volunteer capacity being too low to support the full archive at full redundancy), but I think we should first attempt to find/assemble a mechanism that Just Works as a simple storage/serving service.

I was kind of hoping that we could make a thing out of:

1. A p2p daemon that volunteer cache providers run that just coordinates IPFS pins.
   - It would have some signal of trustworthiness of cache providers (maybe just participation longevity) that it uses for shard balancing, to avoid the scenario where N brand new transient nodes come online, are given responsibility for a new artifact, and all N of them go away, losing that artifact.
2. Cache providers then just run normal IPFS that does all the content acquisition, storage, and serving, with its pins managed by #1.
3. Nix is extended to fetch from multiple sources in parallel.
   - I.e., attempt to fetch a thing, and if it hasn’t made reasonable progress in 2 seconds (configurable), also begin fetching the same artifact from a second source, without giving up on the first. After 5 seconds, add a third source, etc. As soon as the fetch from any source finishes, the other concurrent requests are cancelled.
   - Bonus if this can be done at a fixed-size-block level rather than with entire NARs.
4. (And it sounds like maybe an IPFS lookup caching service if normal IPFS DHT lookups are too slow?)
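
The staggered parallel fetch described above could look roughly like the following. This is only a sketch in Python with asyncio, not Nix code: `staggered_fetch`, `fake_fetch` and the source names are hypothetical, a single uniform `stagger` interval stands in for the 2 s / 5 s schedule, and "no result yet" stands in for "no reasonable progress".

```python
import asyncio

async def staggered_fetch(sources, fetch, stagger=2.0):
    """Try sources[0]; whenever `stagger` seconds pass without a result
    (or a source fails), start the next source as well. Return the first
    successful result and cancel whatever is still running."""
    remaining = list(sources)
    pending = set()
    try:
        while remaining or pending:
            if remaining:
                pending.add(asyncio.create_task(fetch(remaining.pop(0))))
            # Only use a timeout while there is another source left to add.
            timeout = stagger if remaining else None
            done, pending = await asyncio.wait(
                pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:
                    return task.result()
        raise RuntimeError("all sources failed")
    finally:
        for task in pending:  # the losers of the race
            task.cancel()

async def demo():
    async def fake_fetch(source):  # stand-in for a real NAR download
        await asyncio.sleep({"slow-cache": 5.0, "fast-peer": 0.1}[source])
        return source
    return await staggered_fetch(["slow-cache", "fast-peer"], fake_fetch, stagger=0.2)

print(asyncio.run(demo()))  # prints "fast-peer"
```

In the demo the slow source is not abandoned when the second one starts; it is only cancelled once the faster source has delivered.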

I was also hoping the IPFS community would just have a component that does #1, either ready to go or in development, ~~but I don’t see one. I bet the IPFS community would love to have such a component. Maybe we could build it together.~~

Edit: There is an IPFS component for #1: ipfs-cluster! Specifically, the collaborative clusters feature.

So maybe the only new functionality we need to implement is IPFS fetching and parallel fetching in nix? See also Obsidian Systems is excited to bring IPFS support to Nix

ericgundrum · June 24, 2023, 6:37pm

Could Tahoe-LAFS be suitable? It is designed to handle sharding, distribution, redundancy and more. It probably does not scale to a worldwide network, though, because it would have to manage redundant shard distribution for both resiliency and performance. That problem might be solvable by using region-based mirrors of Tahoe filesystems, which would provide even more resilient redundancy.

zimbatm · June 24, 2023, 8:39pm

Tahoe-LAFS is designed with a relatively stable set of nodes in mind. We have another ongoing effort to self-host the cache on bare metal and we might use it there.

That’s a bit of the problem regarding availability on a full P2P network; if you can’t reason about the probability of the nodes going down, the only way to compensate is to increase the number of duplicates, which then increases the total cost of storage proportionally.
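
A back-of-the-envelope version of that trade-off, as a sketch: if each node holding a copy disappears with probability p, and failures are independent (a strong assumption that real volunteer networks may not satisfy), then all n copies are lost with probability p^n. The function name and the target loss probability below are made up for illustration.

```python
import math

def copies_needed(p_node_loss: float, target_loss: float) -> int:
    """Smallest n with p_node_loss ** n <= target_loss,
    assuming node failures are independent."""
    return math.ceil(math.log(target_loss) / math.log(p_node_loss))

for p in (0.2, 0.5, 0.9):
    n = copies_needed(p, 1e-6)
    print(f"p={p}: {n} copies, i.e. {n}x the storage")
```

For p = 0.5 this gives 20 copies for a one-in-a-million loss probability, and the number grows quickly as nodes become less reliable, which is the proportional storage cost mentioned above.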

Where P2P could truly shine is by becoming part of the distribution network. Even if the discovery only happens on the local network, it could help organizations and clusters both get the NAR files faster and save on Internet bandwidth. This is something Microsoft also uses to distribute Windows updates.

The scheme doesn’t need to be too fancy either; have hosts discover each other through rendez-vous on the local network, and then query all of the hosts with a set of requested hashes. It’s boring, and probably quite effective.
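
The query step of such a scheme can be sketched in a few lines. Discovery itself is left out here: `peers` is assumed to be whatever the rendezvous mechanism returned, modelled as a plain mapping from host address to the set of hashes it holds. All names and addresses are hypothetical.

```python
def plan_downloads(requested: set[str],
                   peers: dict[str, set[str]]) -> tuple[dict[str, str], set[str]]:
    """Ask every discovered peer which of the requested hashes it holds.
    Returns (hash -> peer to fetch from, hashes nobody on the LAN has)."""
    plan: dict[str, str] = {}
    for peer, inventory in peers.items():
        for h in requested & inventory:  # the peer's answer to our query
            plan.setdefault(h, peer)     # first peer that has it wins
    return plan, requested - set(plan)

peers = {"10.0.0.2": {"aaa", "bbb"}, "10.0.0.3": {"bbb", "ccc"}}
plan, missing = plan_downloads({"aaa", "bbb", "ddd"}, peers)
print(plan, missing)
```

Anything left in `missing` would still be fetched from the regular substituter, so the local network only ever acts as an accelerator.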

adam248 · June 25, 2023, 3:49am

I only just noticed this thread. Here are my two cents in reply to another post page.

adam248 · June 25, 2023, 3:56am

TLDR: I fully believe in a hybrid validator/peer-to-peer solution.
The cache.nixos.org is the validator which keeps copies of official build hashes.
The peer-to-peer system dynamically supplies the storage and bandwidth.

In the end we need a real-world experiment to start doing this and collect data on how well the system runs.

Seeders easily opt into the swarm thanks to a new services.nix-serve-p2p.enable: bool = false Nix option enabling a service to join the swarm. This could be added as a comment in the generated config to raise awareness and help it more easily gain traction.

I think first we should start with an experimental feature: services.experimental.nix-serve-p2p.enable: bool = false

Real-world testing can help us focus our efforts in the correct direction.

In the short-term cache.nixos.org will have to keep holding on to all its data until we can see how the peer-to-peer system really works in the real world. A slow transition is most likely the only way forward. But it is nice to see we have 12 months to breathe easy:
NixOS S3 Short Term Resolution!

I believe there is enough community support to start the experiment.
We can easily talk about it, but a lot of issues raised might not even be much of a problem when the rubber meets the road…

adam248 · June 25, 2023, 4:47am

I just had a thought regarding the storage guarantee.
Perhaps cache.nixos.org should monitor the health of a given package in the peer-to-peer system. A dynamic garbage collection model.

If there are fewer than X nodes holding a given package, then cache.nixos.org must retain a full copy.
But if the node count for a given package is really high, then cache.nixos.org can garbage-collect its own copy of that package, as it is confident that the package is fully available in the wild.
Then, if the node count starts to drop, it can re-acquire that package for archiving purposes.
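
As a sketch, the rule above amounts to a small state machine with two thresholds. The `low` and `high` parameters and the function name are assumptions for illustration (the post only names a single X); using two separate thresholds avoids flapping between "collect" and "re-fetch" when the node count hovers around one value.

```python
def should_hold_copy(holds_copy: bool, node_count: int, low: int, high: int) -> bool:
    """Decide whether the central cache should hold a copy after a health check."""
    if node_count < low:
        return True      # too few peers: keep, or re-acquire, a full copy
    if node_count >= high:
        return False     # widely replicated: safe to garbage-collect
    return holds_copy    # in between: leave things as they are

state = True
for count in (3, 40, 120, 60, 4):  # made-up node counts over time
    state = should_hold_copy(state, count, low=5, high=100)
    print(count, state)
```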

Yes, there is a small risk of losing a package for good, but that requires losing everything: the cache, the source URL going down, the original source code being lost, and no one having an old copy to reupload to a new URL. Such a failure cannot really be our fault. This is just life. Sometimes things get lost with no way of recovering them. In this modern age, insisting that everything “needs to be preserved for the historical record” can at times simply mean sacrificing the future for the past. Not a good way to live. Not a good way to run a cache (a temporary file storage).
We are not the archive of the world.

vcunat · June 25, 2023, 6:36am

Regarding IPFS, I’d link some info about the last experiment I’m aware of:
https://github.com/NixIPFS/nixipfs-scripts/issues/11#issuecomment-1520711673

rickynils · June 25, 2023, 9:25am

TLDR: I fully believe in a hybrid validator/peer-to-peer solution.
The cache.nixos.org is the validator which keeps copies of official build hashes.
The peer-to-peer system dynamically supplies the storage and bandwidth.

I like this model very much. It makes sense not to try to solve the problems around distributing storage and trust at the same time. If we keep the “centralized trust” model of cache.nixos.org, much of the infrastructure (i.e. Hydra) can remain basically the same. Once the p2p distribution of nar-files (based on their hashes) is working, nothing is stopping us from also working on distributing the trust in some way (like CA-derivations, trusted builders, Trustix, etc).

wmertens · June 26, 2023, 3:21pm

I’d like to offer up the idea of a Trust DB here. Right now the mapping from input hash to NAR file is implicit in the nixos cache.

However, by simply publishing a mapping of input hash → content hash, a user can choose to trust a certain publisher, and then it doesn’t matter what cache system you use, the hash can be verified locally.

This is orthogonal to any other solutions. Having the outputs be CA is nice but not required. You don’t need to know what the input hash means, what attribute makes it, etc.
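
A minimal sketch of the local check this would enable, assuming the trust DB is simply a mapping from input hash to the SHA-256 of the content. The DB format, the hash values and the `verify` function are hypothetical, and a real system would also have to authenticate the mapping itself (for example with the publisher's signature).

```python
import hashlib

def verify(trust_db: dict[str, str], input_hash: str, blob: bytes) -> bool:
    """Accept `blob` only if the trusted publisher mapped `input_hash`
    to exactly this content, wherever the blob was downloaded from."""
    expected = trust_db.get(input_hash)
    return expected is not None and hashlib.sha256(blob).hexdigest() == expected

content = b"pretend this is a NAR file"
trust_db = {"example-input-hash": hashlib.sha256(content).hexdigest()}

print(verify(trust_db, "example-input-hash", content))      # True
print(verify(trust_db, "example-input-hash", b"tampered"))  # False
```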

rickynils · June 26, 2023, 6:20pm

However, by simply publishing a mapping of input hash → content hash, a user can choose to trust a certain publisher, and then it doesn’t matter what cache system you use, the hash can be verified locally.

Is this not just the narinfo-files? When substituting, Nix already works in two phases. First it fetches the narinfo file for a store path from a substituter (cache.nixos.org). The narinfo contains the content hash of the nar and the url where the nar can be found. In the second phase, Nix fetches the nar-file itself. The url is always assumed to be a sub-path of the substituter’s domain, but you could imagine just querying some p2p network for the content hash in that second phase instead.
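
To illustrate the two phases, here is a small sketch that parses a narinfo file and picks out what each phase needs. The field names follow the narinfo format; every value in the sample is a placeholder, and looking up `NarHash` on a p2p network is the hypothetical part.

```python
def parse_narinfo(text: str) -> dict[str, str]:
    """Parse the 'Key: value' lines of a narinfo file.
    Repeated keys (e.g. several Sig lines) keep only the last value here."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields

sample = """\
StorePath: /nix/store/PLACEHOLDERHASH-hello-2.12
URL: nar/PLACEHOLDERFILEHASH.nar.xz
Compression: xz
NarHash: sha256:PLACEHOLDERNARHASH
NarSize: 12345
"""

info = parse_narinfo(sample)
print("today, phase 2 fetches:", info["URL"])
print("a p2p lookup could use:", info["NarHash"])
```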

NobbZ · June 27, 2023, 9:33am

Nix currently has 2 ways to address a single “thing”.

1. Input addressing. This is the default. Here Nix calculates the hash of a derivation and, prior to realizing it through a build, asks the substituters whether they know the product of the drv with that hash. The product again has a hash based on its inputs. The content hash in the NAR file is not just a content hash but a signed hash, to verify that no one tampered with the NAR during transfer.
2. Content addressing. Here Nix also first calculates the input-based hash for a derivation. Then it asks the substituters what the content address was. Then Nix remaps that IA drv to a CA drv, asks the substituters again whether they know about that CA-hashed drv, and substitutes that. Here the content hash is not signed, but reflects the bare content. Content-addressed drvs do not need to be signed, but the mapping from IA to CA needs to be trusted.

In general, from what I understood, Trustix wanted to provide the software and infrastructure for a decentralised “web of trust” for IA->CA mappings, but when I asked in the Matrix channel how to set things up, it was pointed out to me that the project is mostly dead.