Obsidian Systems is excited to bring IPFS support to Nix

Firstly, I want to say I really appreciate the work already done to support Content Addressed derivations. I think this is a great thing to have, independent of any potential storage improvements.

I know these are the very early days of your work (and it's very impressive what you've achieved so far), but I think it will be a tough sell for many users to effectively double the storage space required for the Nix store (once in the Nix store, once in IPFS) as more packages become CA. Mounting the files from IPFS via FUSE is also not free, and can introduce a significant performance bottleneck. Perhaps IPFS import/export is a feature better suited to beefier buildfarm/cache(ix) infrastructure?

Substitution via traditional HTTPS (e.g. from a Cloudflare gateway) would allow the 90% of users who just want prebuilt dependencies, without thinking about IPFS, to still drive demand for IPFS-hosted build products. Having those derivations loaded into the Nix store as usual, taking no extra space, would provide a smooth transition path. Also, laptop users like me won't have to run the bandwidth- and CPU-hungry IPFS daemon constantly just to avoid warm-up time and high latency when requesting objects.
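Just to make the idea concrete, here is a rough sketch of what a gateway-backed lookup could look like on the client side; the gateway URL, the root CID, and the cache layout are all made up for illustration:

```python
# Purely hypothetical sketch: substituting through a public IPFS HTTP gateway
# instead of a local IPFS daemon. The gateway URL, the root CID under which a
# binary cache would be published, and the cache layout are all assumptions.
import urllib.request

GATEWAY = "https://cloudflare-ipfs.com"   # any public IPFS HTTP gateway
CACHE_ROOT_CID = "bafy..."                # placeholder: root of a published binary cache

def fetch_narinfo(store_path_hash: str) -> str:
    """Fetch the .narinfo for a store path hash over plain HTTPS."""
    url = f"{GATEWAY}/ipfs/{CACHE_ROOT_CID}/{store_path_hash}.narinfo"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

# The client would then download the NAR the narinfo points at the same way,
# unpack it into /nix/store as usual, and never run an IPFS daemon itself.
```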

How would one go about proposing/working on gateway substitution as an extension to Obsidian’s work?

Can you please expand on this a bit? Don't clients push .drv "buildplans" to remote builders? These buildplans actually ARE content-addressed: the name contains the hash of its contents, and the result of the build carries the hash of the buildplan. It's the remote builder that has the power to label any arbitrary data as being built from a buildplan hash, not the clients. AFAIK, the only arbitrary data that clients are allowed to push is fixed-output derivations, so they can't lie about what that derivation contains, as it's also content-addressed. Is there a sneaky loophole hiding here that I don't know about?

If I’ve got this wrong I’m very happy to be corrected!

6 Likes

Thanks!

Thanks for thinking about how to drive adoption. You make good points. My idea was to drive adoption with source code archival, where the space duplication is far less of a concern, but there's no technical reason we wouldn't pursue both tracks.

Propose to who?

On the technical side, the thing to do is look at all the read requests we do in https://github.com/obsidiansystems/nix/blob/7c027d20a204592d23dde2d95d58d83ee197c681/src/libstore/ipfs-binary-cache-store.cc and see if they have equivalents in the gateway interface. If they in fact do (contrary to what I was thinking), great; if they don't, you might need to also propose changes to the gateway.
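To give a feel for what that comparison involves, the read side of a binary cache store boils down to roughly "does this file exist?" and "give me this file"; a hypothetical probe of whether a gateway can answer those over plain HTTP might look like the following (gateway URL and CID are placeholders):

```python
# Quick, hypothetical probe of the gateway's read interface. The gateway URL
# and CID are placeholders; the real question is how store-path-keyed lookups
# in ipfs-binary-cache-store.cc get translated into CIDs for the gateway.
import urllib.error
import urllib.request

GATEWAY = "https://ipfs.io"
CID = "bafy..."  # placeholder content identifier

def file_exists(cid: str) -> bool:
    # Gateways answer HEAD requests, which would cover the "does this file
    # exist?" style of read a binary cache store needs.
    req = urllib.request.Request(f"{GATEWAY}/ipfs/{cid}", method="HEAD")
    try:
        with urllib.request.urlopen(req):
            return True
    except urllib.error.HTTPError:
        return False

def get_file(cid: str) -> bytes:
    # Plain GET covers the "give me this file" style of read.
    with urllib.request.urlopen(f"{GATEWAY}/ipfs/{cid}") as resp:
        return resp.read()

# Writes (publishing new content) have no gateway equivalent, so those would
# stay on the daemon/buildfarm side or need gateway changes.
```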

Sure

They used to copy the drv file and its closure, but more recently they send over just the derivation being built (parsed, to be used in memory only on the remote side) without its dependent drvs.

Yes

I'm a bit confused about what you mean, but anyway: for traditional "input-addressed" derivations, the output path computation is quite complicated and is done by hashDerivationModulo. The daemon has no hope of verifying this unless it has the entire derivation closure, so it blindly trusts the output path in the derivation that is sent over.
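For intuition, here is a heavily simplified sketch of what hashDerivationModulo does; the real implementation works on the ATerm serialisation of the derivation and handles more cases, so treat the data layout here as made up:

```python
# Heavily simplified sketch of the idea behind hashDerivationModulo, for
# intuition only. Real Nix hashes the ATerm serialisation of the derivation
# and handles more cases; the dict layout below is made up.
import hashlib

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

def hash_derivation_modulo(drv: dict, load_drv) -> str:
    """drv: a parsed derivation; load_drv: fetches a parsed input .drv by path."""
    if drv.get("fixed_output"):
        # Fixed-output derivations are identified purely by their declared
        # output hash, so different recipes for the same content coincide.
        out = drv["outputs"]["out"]
        return sha256_hex(f"fixed:out:{out['hashAlgo']}:{out['hash']}:{out['path']}")
    # Otherwise every input .drv path is replaced by *its* hash-modulo,
    # a recursive walk over the whole derivation closure...
    rewritten_inputs = {
        hash_derivation_modulo(load_drv(path), load_drv): wanted_outputs
        for path, wanted_outputs in drv["input_drvs"].items()
    }
    # ...and the rewritten derivation is hashed. Output paths are derived from
    # this, which is why a daemon without the closure cannot re-verify them.
    return sha256_hex(repr((sorted(rewritten_inputs.items()),
                            drv["builder"], drv["args"], sorted(drv["env"].items()))))
```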

I have already removed the trust requirement for (fixed and floating) content-addressed derivations, since the daemon ignores any output paths sent over as part of those derivations.

Actually I think it will currently accept any store path, not just content-addressed ones (usually built by fixed-output derivations). This is one reason why it only works for “trusted users” on the daemon side.

Thanks again for the detailed reply. I had to sit on this for a while and do some more reading and thinking.

It seemed like you had a roadmap of features to implement. I guess to whoever is managing that roadmap?

This is probably massively naive on my part but this seems like a security hole that would be possible to close?

  1. Send the full derivation closure (.drv or parsed, does it matter?) so that the builder can verify path names for itself.
  2. Disallow accepting arbitrary store paths from untrusted users; input-addressed paths would still be accepted, but only after verifying that the received path matches the recomputed hash (see the sketch after this list). Trusted users keep the current behaviour (to avoid breaking existing workflows?).
  3. The remote builder must build or substitute any missing store paths from its own trusted sources.
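To spell out what I have in mind for the check in step 2, a purely illustrative sketch (not actual daemon code, and the recomputation helper is hypothetical):

```python
# Illustrative sketch of the check in step 2 above, not actual Nix daemon
# code. `compute_output_path` stands in for whatever recomputes the expected
# path from the derivation closure (e.g. something hashDerivationModulo-based).
def accept_output_path(drv, claimed_out_path: str, compute_output_path,
                       user_is_trusted: bool) -> bool:
    if user_is_trusted:
        return True  # keep today's behaviour for trusted users
    expected = compute_output_path(drv)
    if claimed_out_path != expected:
        raise PermissionError(
            f"refusing output path {claimed_out_path}; expected {expected}")
    return True
```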

Is there any use-case/scenario that this breaks? I'm very interested in the security model of remote builders, so if you can recommend a part of the codebase I should read to get a better idea of how this all works, please drop me a link.

Well, take a look at the original grant: https://github.com/ipfs/devgrants/blob/5fcf2ddcb294b911feb216d9b01d990af1654a56/open-grants/open-proposal-nix-ipfs.md. The grant recipient writes the initial proposal with this sort of thing.

Yup! See distributed builds require a trusted remote user · Issue #2789 · NixOS/nix · GitHub and the PRs I’ve linked to it (which will show up at the bottom). If we merge all of them then it’s fixed.

(Well, we also need to modify the build hook protocol so floating CA derivations can actually be remotely built, but that’s a separate issue.)

Any news? IPFS is getting stronger by the year.

1 Like

There is in fact some news: we have scoped out NLnet; Peer-to-Peer Access to Our Software Heritage and it has been approved. Once that is done, I hope we'll have a better shot at merging the work we did last year, because the SWH → IPFS → Nix workflow will hopefully make it more apparent what the use-cases are.

13 Likes

Thanks man for your great work!

3 Likes

Well, Cloudflare providing a lot of edge caching is one use I'm looking forward to. (Cloudflare are fans of IPFS caching, as you can well imagine.)

Isn't Nix already using some CDN?

Replying to How to make Nixpkgs more eco-friendly / use less resources - #47 by nixinator here to not take the other thread more off-topic.


Glad you’ve taken a look!

IPFS doesn't rely on use of the DHT: you can always connect to a peer directly, and it asks its existing peers for content regardless of "who" the main DHT says you should ask.

For the SWH ↔ IPFS bridge, for example, we completely ignored the DHT to start, saying there will just be a dedicated bridge node with a well-known address, and one can connect to that.

(I believe one can also get peers from peers, which means that a bandwidth-saturated bridge node could in principle prioritize letting all its wannabe peers know about each other, so they can act together as a CDN.)

The bottom line is that the architecture is very modular, and one can experiment with many different strategies. I think our use-cases (distributing source code from the archive, distributing builds) are great ones to test various strategies with, too.
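As a tiny illustration of the "connect to a peer directly" point: with a stock go-ipfs/Kubo daemon you can dial a well-known node by its multiaddress and then fetch from it, with no DHT lookup involved (the multiaddress and CID below are placeholders):

```python
# Tiny illustration of the "connect to a peer directly" point, assuming a
# local go-ipfs/Kubo daemon is running. The multiaddress of the bridge node
# and the CID are placeholders.
import subprocess

BRIDGE_ADDR = "/dns4/bridge.example.org/tcp/4001/p2p/12D3KooW..."  # hypothetical bridge node
CID = "bafy..."                                                    # hypothetical content

# Dial the well-known node explicitly, with no DHT lookup involved...
subprocess.run(["ipfs", "swarm", "connect", BRIDGE_ADDR], check=True)
# ...then fetch the content, which Bitswap will request from connected peers.
subprocess.run(["ipfs", "get", CID], check=True)
```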

5 Likes

I absolutely agree with this: once the content address of the output NAR is known, the distribution of the content can happen over anything. :slight_smile:

Thanks for the updates on IPFS, seems like the project is progressing.

4 Likes

At long last, we have written an RFC to hopefully get this work merged upstream!

https://github.com/NixOS/rfcs/pull/133

13 Likes

To the moon :rocket:

I've read the RFC, good job! But I have an IPFS-specific question: is it possible to use the content raw from the store and plug it into IPFS (this is not recommended by IPFS, but since the store is read-only it's not so bad), instead of adding the files into IPFS and duplicating disk space usage?

1 Like

We have not tried to do that yet. It is certainly possible in principle yes, but in our implementation we have been more focused on integrating the concepts than tuning performance. The idea is to get a specification / interface we feel good about, and then after that we can finesse the implementation without needing to change the spec / interface.

Or, as they say, https://wiki.c2.com/?MakeItWorkMakeItRightMakeItFast
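That said, for anyone who wants to experiment with the no-duplication idea on their own, go-ipfs/Kubo has an experimental "filestore" mode that references files in place rather than copying them into its datastore; roughly, and with the usual caveats about experimental features:

```python
# Experimental and illustration only: go-ipfs/Kubo's "filestore" keeps
# references to files in place instead of copying the data into its own
# datastore, which is one way to avoid the duplication the question is about.
# The store path is a placeholder; the referenced files must not change
# (fine for the read-only /nix/store).
import subprocess

# One-time configuration: enable the experimental filestore, then restart the daemon.
subprocess.run(
    ["ipfs", "config", "--json", "Experimental.FilestoreEnabled", "true"],
    check=True,
)

# Add a store path without copying its data ("--nocopy" stores only references).
subprocess.run(
    ["ipfs", "add", "--recursive", "--nocopy", "/nix/store/xxxxxxxx-example-1.0"],
    check=True,
)
```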

6 Likes

Just wanted to say, this would be a killer feature IMHO. The amount of problems I wouldn't have at work if a newly deployed machine could, out of the box, just substitute from any nearby machine that has the data. We've been mucking around so much with internal substituters and their configuration, and also with sharing host stores with VMs. I quite literally can hardly wait.

I guess content addressing and coordination/planning are the biggest blockers? Or are you having trouble with funding as well?

5 Likes

The biggest blocker is code review. I also should finish off [RFC 0133] Git hashing and Git-hashing-based remote stores by Ericson2314 · Pull Request #133 · NixOS/rfcs · GitHub, but that's on me.

4 Likes

Is obsidian systems still working on this? I’m sitting here pulling at 200KB/s from cache.nixos.org and just wishing it were over IPFS as another computer in my house had to go through the same thing yesterday :cry:

GitHub - obsidiansystems/ipfs-nix-guide: IPFS × Nix Guide was last updated 3 years ago

6 Likes

Maybe try GitHub - cid-chan/peerix: Peer2Peer Nix-Binary-Cache

2 Likes

This works on the local network, and you need to exchange keys between peers.

2 Likes

Just chiming in that IPFS support for avoiding concerns about source bit-rot sounds amazing and would help me pitch Nix to peers.

1 Like