Peer-to-peer binary cache RFC/working group/poll

rickynils · June 27, 2023, 10:21am

Well, the narinfo provides the content hash (called the “nar hash”). This is just the sha256sum of the nar file. The narinfo can also contain a signature, which is based on the store path, the nar hash, the referenced store paths and the private key of the substituter. So I’d argue that cache.nixos.org is already a trust database. Just fetch the narinfo for the store path you are interested in, verify its signature, and then you are free to fetch the nar file corresponding to the nar hash from whatever storage provider you want. Of course, you then also have to validate that what you receive matches the nar hash.
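A minimal Python sketch of that flow, under stated assumptions. The fingerprint layout (`1;<store path>;<nar hash>;<nar size>;<references>`) and the Nix base32 encoding are assumptions about what Nix signs and prints, not something confirmed in this thread. The Ed25519 signature check itself is left out because it needs a non-stdlib library, and all function names are hypothetical.

```python
import hashlib

# Nix's base32 alphabet (assumed): digits and lowercase letters without e, o, u, t.
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32(digest: bytes) -> str:
    """Encode a digest in the base32 variant Nix is assumed to use."""
    length = (len(digest) * 8 - 1) // 5 + 1
    out = []
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)
        c = digest[i] >> j
        if i + 1 < len(digest):
            c |= digest[i + 1] << (8 - j)
        out.append(NIX_BASE32[c & 0x1F])
    return "".join(out)


def parse_narinfo(text: str) -> dict:
    """Parse 'Key: value' lines; every Sig line is collected into a list."""
    info = {"Sig": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        if key == "Sig":
            info["Sig"].append(value.strip())
        else:
            info[key] = value.strip()
    return info


def fingerprint(info: dict, store_dir: str = "/nix/store") -> str:
    """Build the string the signature is assumed to cover."""
    refs = ",".join(f"{store_dir}/{r}" for r in info.get("References", "").split())
    return f"1;{info['StorePath']};{info['NarHash']};{info['NarSize']};{refs}"


def nar_matches(info: dict, nar_bytes: bytes) -> bool:
    """Check an uncompressed NAR, fetched from anywhere, against the narinfo."""
    actual = "sha256:" + nix_base32(hashlib.sha256(nar_bytes).digest())
    return actual == info["NarHash"] and len(nar_bytes) == int(info["NarSize"])
```

Once the signature over `fingerprint(info)` has been checked against a trusted public key, `nar_matches` is the only check needed on the bytes themselves, which is why the NAR can come from an untrusted provider.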

NobbZ · June 27, 2023, 10:33am

Yes, it definitely is a trust DB, regardless of whether CA substitution is considered. A definite problem in this scenario, though, is that this trust is coupled to substitution: there is currently no way to ask one authority for the mapping and another for the content.

The substitution process doesn’t allow separating these concerns, from my understanding.

In general the substitution process is rather limited from what I can tell, and really only considers substitutes. It does not ask configured remote builders if they have a “product” available.

Instead, if everything goes south, remote A will happily build from scratch what builder B already has, just because of a race in the queue…

Not quite related to the topic of the thread though… Or perhaps it is, and the substitution and remote and local build should be unified into a single abstraction rather than 5?

rickynils · June 27, 2023, 11:19am

It doesn’t explicitly allow separating these concerns today, but it is already making two separate requests, one for the narinfo and one for the nar. So adding support for additional “nar providers” wouldn’t be a big thing, conceptually. The signature verification and configuration of trusted keys would remain as it is today.

In general the substitution process is rather limited from what I can tell, and really only considers substitutes. It does not ask configured remote builders if they have a “product” available.

Yeah, this is not optimal, and somewhat confusing. However, the trust model Nix uses for remote builders is different from the one for substituters. When using a remote builder, it is generally the coordinating machine that handles signing of the resulting store paths. So while builds might be available on remote builders, they could be missing the required signatures. I would love to see some unification in that area…

Not quite related to the topic of the thread though… Or perhaps it is, and the substitution and remote and local build should be unified into a single abstraction rather than 5?

Yes! Internally in Nix, these things are actually already somewhat unified, with the Store abstraction. However, I want to see that abstraction also on the outside, so you could mix and match different “stores” used for building, fetching, storing. See this Nix issue: “Separate stores for evaluation, building and storing the result” (NixOS/nix issue #5025).

wmertens · June 28, 2023, 3:09pm

I must say I didn’t consider narinfo files, they indeed contain all the information for a trust DB.

Now I wonder what their overhead is, currently. Suppose we have an efficient way to distribute trust DB deltas, and you can join multiple providers, and each provider signs the deltas, how much local storage would you need to store these mappings?
And supposing we keep the trust DB mostly as an online thing: it looks like each narinfo is a separate HTTP request, so would it make sense to allow bulk requests?

michaelCTS · July 31, 2023, 12:08pm

Has a working group been established? From what I understand, John Ericson did a lot of important work with content-addressable derivations and presented it about 3 years ago.

I’m not sure what’s left to do, but from a novice perspective, it seems like the groundwork has been done to provide a service-agnostic backend to retrieve packages from peers, be it over Tahoe-LAFS, IPFS, BitTorrent, or something else.

vcunat · July 31, 2023, 6:47pm

I don’t think it has. In all channels (including this one) I’ve seen just what I’d call initial brainstorming.

adam248 · August 23, 2023, 1:24am

I believe we need official guidance from the Foundation to continue with this goal.
Otherwise we remain fragmented with different solutions.
A member of the Foundation should be appointed to oversee the direction of this working group.
Just need someone to read all the comments and review the current proposed solutions, then choose the next path forward.
Without an executive decision we are just going to continue pursuing different solutions.
But having multiple fragmented peer-to-peer solutions is sub-optimal for the community and the mission.

tomberek · October 1, 2023, 5:59am

A narinfo can refer to NARs via an absolute URL, thus allowing one to host the contents somewhere else. It is currently limited to a single URL, but the format allows for extension to support “URLs” or “Mirrors”.
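As a rough illustration of how a client could resolve such URLs, here is a small Python sketch. The `URL` field exists in narinfo files today; the `Mirrors` field (space-separated URLs) is purely hypothetical and only stands in for the kind of extension mentioned above. The function name and the example hosts are made up as well.

```python
from urllib.parse import urljoin


def nar_urls(narinfo_fields: dict, cache_base: str) -> list:
    """Return candidate download URLs for a NAR, primary URL first.

    'Mirrors' is a HYPOTHETICAL extension field, not part of the
    current narinfo format.
    """
    candidates = [narinfo_fields["URL"]]
    candidates += narinfo_fields.get("Mirrors", "").split()
    # A trailing slash makes relative URLs resolve below the cache root.
    base = cache_base.rstrip("/") + "/"
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # against the base URL of the cache that served the narinfo.
    return [urljoin(base, url) for url in candidates]
```

With `{"URL": "nar/abc.nar.xz"}` and the base `https://cache.example.org`, this yields `https://cache.example.org/nar/abc.nar.xz`, while an absolute `URL` pointing at another host is returned unchanged.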

ShalokShalom · November 4, 2023, 11:15am

I don’t think that decentralization can possibly be made meaningless by centralized alternatives.

Decentralization (via peer-to-peer) has unique properties that can help us mitigate the S3 issue that we face.

As for the solution, I think IPFS is a great fit for that.

jeff-hykin · November 22, 2023, 4:26pm

Just wanted to say I’m really glad to see the discussions here.

Beyond caching/speed, I really care about reliability (e.g. multiple people providing a resource instead of one centralized backer). I think a minimal-design win would be to have fetchurl (and other fetchers) take an IPFS hash as an argument to use as a backup if the main URL fails.

IMO the bigger problem, and the first step, should be to establish a unified standard for content-addressed storage. If two programs (say PNPM and Pip) need a file, there should be a unified cache they can check/add to. We shouldn’t have a Nix cache/store, an IPFS cache/store, and a pip cache, all of which contain the same massive shared-object file or machine learning model binary.
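To make the fallback idea concrete, here is a hedged Python sketch (not Nix code, and not how fetchurl works today). It tries each source in order and accepts the first one whose content matches the expected SHA-256. The function names and the idea of passing an IPFS gateway URL as the second source are assumptions for illustration.

```python
import hashlib
import urllib.request


def http_fetch(url: str, timeout: float = 10.0) -> bytes:
    """Default fetcher: plain HTTP(S) GET using the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_fallback(sources, expected_sha256: str, fetch=http_fetch) -> bytes:
    """Return the first download whose SHA-256 (hex) matches.

    'sources' is an ordered list of URLs, e.g. the main URL followed by a
    (hypothetical) IPFS gateway URL for the same content.
    """
    for source in sources:
        try:
            data = fetch(source)
        except OSError:
            # Covers connection errors and urllib's URLError/HTTPError.
            continue
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # Wrong content: treat it like a failed source and move on.
    raise RuntimeError("no source delivered content matching the expected hash")
```

Because the hash is checked for every source, the backup does not need to be trusted any more than the main URL.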

Nix and IPFS maintainers would almost certainly be the best people to design such a standard. They know the most about dealing with various file systems (uppercase/lowercase, max-name-length), timestamp issues, hashing methods, etc. The biggest design challenge, I think, would be the ref-counts for deletion: making sure a particular content-address is truly not used by anyone.

Once a standard is designed, then Nix and IPFS can move towards it.

That will take a very long time, but I think it would reduce the complexity and make decentralized sharing straightforward, compared to adding complexity with a zipping, torrenting system for cached outputs.

ehmry · November 22, 2023, 6:09pm

ERIS is a standard for content-addressed storage that came out
of feature-creep frustrations with the standards provided by
other content-addressed storage systems.

https://eris.codeberg.page/

IPFS certainly has some standards that can be used for binary caches but
they tend to make everything extensible, which can cause more trouble than
it saves.

adam248 · January 2, 2024, 3:58am

I just had another discussion regarding P2P cache network here:

From what I hear we just need to do a PR to a good namespace nix.experimental.p2p-cache.enable or something good sounding like that to be able to focus our testing at scale.

It seemed like attic (zhaofengli/attic on GitHub, a multi-tenant Nix binary cache) was mentioned a few times as well. Maybe a good first candidate to test with?

rigille · February 12, 2024, 3:47pm

I’m definitely interested in this. It looks like the best way to allow binary caches to scale with the number of users without increasing costs for the foundation as much.

ShalokShalom · March 15, 2024, 10:56pm

I do see the benefit: P2P seems to fit the nature of NixOS.

I love p2p principles, and in a lot of cases (like, A LOT OF CASES) I consider the lack of professional users and the presence of phones as a platform to be two of the main reasons why it does not work.

I think Matrix is the best example: the vast majority of users are registered on matrix.org.

It was originally designed to be peer to peer, and they realized at one point that a shortcut towards a decentralized approach is the preferred option for them.

Mostly due to the Python nature of the reference implementation, and the fact that most users do not run their own node - big surprise - it ended up being a centralized service, to a certain extent.

And the next argument is that p2p runs poorly on mobile devices. Internet connection, battery life and storage space are just three, but major, reasons not to go p2p with mobile devices.

NixOS, by contrast, is strong in the server market, which means we have a lot of people willing to stay online and share.

It’s very similar to a Torrent service, at that point.

And we are using desktops and stationary laptops as the primary personal computing devices.

I think hosting packages the peer to peer way is a very good idea.

jeff-hykin · March 27, 2024, 1:56am

Also, not even online. Four out of five times the thing I’m installing is already installed on another machine in the same room as me. I want my students to be able to grab stuff from my Desktop through a local gigabit connection.

ShalokShalom · March 29, 2024, 11:26pm

Tvix, a Rust implementation of Nix,
does offer a few features that could be helpful for this idea.

More info on the project:

sambacha · June 13, 2024, 4:06am

You may find this blog post helpful then: Nix caching on a LAN

shimun · October 12, 2024, 1:26pm

One problem I do see with the current way NARs are hashed is that there is no way to verify them until the download is complete. Git tree hashes would be better in that regard, as would switching to a Merkle-based hash algorithm like BLAKE3, which would allow the seeder to provide Merkle proofs for streaming verification.
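A minimal Python sketch of what streaming verification with Merkle proofs could look like. It uses SHA-256 from the standard library as a stand-in, not BLAKE3, and the tree layout (odd nodes promoted unchanged, one-byte prefixes to separate leaves from inner nodes) is an arbitrary choice for illustration rather than any existing Nix or BLAKE3 format. It assumes at least one chunk.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(chunks):
    """Return all tree levels, leaves first; the last level holds the root."""
    level = [_h(b"\x00" + c) for c in chunks]  # 0x00 marks a leaf
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(_h(b"\x01" + level[i] + level[i + 1]))  # 0x01 marks an inner node
            else:
                nxt.append(level[i])  # odd node is promoted unchanged
        level = nxt
        levels.append(level)
    return levels


def proof_for(levels, index):
    """Sibling hashes from leaf to root as (hash, sibling_is_left) pairs."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_chunk(chunk: bytes, proof, root: bytes) -> bool:
    """Check a single chunk against the trusted root without the other chunks."""
    node = _h(b"\x00" + chunk)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01" + pair)
    return node == root
```

A downloader that trusts only the root (for example, because it was signed) can call `verify_chunk` on each chunk as it arrives and reject a bad peer immediately, instead of waiting for the whole NAR.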