Obsidian Systems is excited to bring IPFS support to Nix

First of all, thanks to @parkan of Protocol Labs for soliciting the Nix community in https://github.com/NixOS/nix/issues/859#issuecomment-606936228. We at Obsidian Systems talked things over, submitted a proposal to the IPFS Grants Platform, and it has recently been accepted!

IPFS is a peer-to-peer network for storing and sharing data in a distributed filesystem. It has been remarked before that Nix and IPFS would be a natural fit for integration, and we are very excited to make this happen.

Here is where we think that IPFS and our plan can help make Nix even better:

  • Beating source bit-rot: sources can be stored and shared via IPFS so we don’t lose the data needed to reproduce builds.

  • Distributing builds via IPFS: builds can be distributed peer-to-peer, so a popular cache could have lower hosting costs as it grows.

  • Content-addressed derivations: RFC 62 is necessary for using IPFS to its full potential, so we’ll be working with existing community efforts towards content addressability.

  • Trustless remote building: Nix users shouldn’t need to be trusted to utilize remote builders. The internal changes to Nix which we need for IPFS support get us most of the way there, so we’ve gone ahead and included this feature as part of our plan.

Beyond all of these features, IPFS is a great like-minded technology which can help to make our storage and networking infrastructure as revolutionary as the ideas within Nix itself. This is just the beginning, and I’m sure many more great things are to follow.

You can read the details of our proposal at https://github.com/ipfs/devgrants/pull/43.

34 Likes

It’s worth noting here that IPFS very explicitly does not persist or “store” data; despite what the name and marketing suggest, it’s a distribution mechanism and not a storage system, essentially a more granular BitTorrent. It’s also subject to the same availability issues (i.e. dead torrents if no one seeds them).

That’s not to say that it can’t be a useful mechanism for P2P distribution, but please please please do not build anything that relies on it to not lose data, because that really is not what it was designed for, and it will result in a lot of lost data down the line if people have the wrong expectations.

7 Likes

e.g.
Something something dead torrents…
Something something crashed fediverse server storage… x)
Something something cdn but on other peoples computers.
(I’m not actually against IPFS or anything - I just agree with joepie that this specific point is not something that inherently necessarily increases reliability.)

This is less of a big deal for small data, but when you’re talking about, say, 150TB (more at this point) of old builds, it becomes an entirely different matter. One does not simply… etc.

*Edit: you did say storing sources though…
Edit2: I’m very interested in p2p (perhaps, anonymized) CDNs so if anyone has any recommended reading material, I would be happy if they would post a reply or message me.

As a data point, at some point I saw a network with its own cryptocoin supplying distributed storage - the name of which I don’t remember. It might be interesting to look at how that worked out. I doubt it’s more cost-effective than using some cloud-based solution, so this has to stand on its own regardless of storage considerations. *Edit: I.e. people would join the network, supply storage, and get coin for uptime or whatever.

TL;DR: I think I’m just saying that centralized storage at scale beats decentralized storage, unless there are other factors. You must still maintain a centralized, well-kept backing store (perhaps with, say, geographic replication), but you can distribute bandwidth costs.

2 Likes

Yes of course, merely running a p2p network node does not in and of itself make a system durable. I don’t mean to imply otherwise.

But a first step to durability is separating what is needed from where to get it. We already have “hashed mirrors”, but those take more manual effort and only support flat files. This puts us on the road to stores/substituters subsuming hashed mirrors, and supporting more forms of data.
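For reference, today’s flat-file mechanism is the `hashed-mirrors` setting in `nix.conf`: fixed-output sources are looked up by their content hash at each listed mirror before Nix falls back to the original URL. (The URL below is the historical default, shown purely for illustration.)

```
# nix.conf
# Try these mirrors first, keyed by the source's content hash,
# before fetching from the upstream URL.
hashed-mirrors = http://tarballs.nixos.org
```

A substituter-based design would generalize this lookup-by-hash behavior beyond flat tarballs.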

3 Likes

The persistence/permanence language is an unfortunate holdover from legacy comms around the IPFS stack – totally fair to call that out! The goal here is to bring in the full range of benefits of true content addressing to Nix (which has a long history of striving for these ends, cf. RFC 17/62) in a way that works with the growing ecosystem of IPFS compatible tooling. This doesn’t mean that Nix would be required to use IPFS – the self-certification, location-independence, trustlessness, and deduplication properties are entirely self-contained. The ability to use IPFS transports for sources, derivations, build artifacts, etc is largely a bonus on top.

That being said, while IPFS itself does not inherently provide any persistence guarantees, there’s a growing number of persistence services (“pinning” in IPFS parlance) that do. The upside here is that objects (derivations, sources, etc) stored in this way can be consumed without trusting the host, and can be retrieved without foreknowledge of their actual location. For example, multiple Nix machines on a local network can reuse objects without re-downloading them, e.g. in an offline situation, or in bandwidth-constrained environments.
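The trust-shifting property described above is easy to illustrate: when an object is named by the hash of its contents, the client can verify whatever bytes any peer hands back, so the host need not be trusted. A minimal sketch (using SHA-256 directly; real IPFS CIDs add multihash/multibase encoding on top):

```python
import hashlib

def address(data: bytes) -> str:
    # A content address is derived solely from the bytes, not from any location.
    return hashlib.sha256(data).hexdigest()

def fetch_verified(addr: str, peers) -> bytes:
    # Ask untrusted peers in turn; accept the first response whose
    # hash matches the requested address.
    for peer in peers:
        data = peer(addr)
        if data is not None and address(data) == addr:
            return data
    raise LookupError("no peer could supply " + addr)

# Two "peers": one lying, one honest. The client still gets correct data.
payload = b"some build artifact"
addr = address(payload)
lying_peer = lambda a: b"malicious bytes"
honest_peer = lambda a: payload
result = fetch_verified(addr, [lying_peer, honest_peer])
print(result == payload)  # True
```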

5 Likes

Privacy-preserving/anonymous p2p is a huge (and important!) challenge, in part because performance and privacy create sharp tradeoffs, in part because judging who a good vs bad actor is in a decentralized network is an epistemological challenge. The current goals of IPFS (and sister project Filecoin, which provides a crypto-incentivized long-term storage layer) aim to solve performance, reliability, and data locality concerns first, but the project is acutely aware of the need for anonymized/deniable access. Check out research work from IPFS collaborators hashmatter for more on the latter: https://github.com/hashmatter

3 Likes

you probably mean Storj or Burstcoin

Very excited about this, thanks for taking it on, and thanks for funding it! I understand the canonical IPFS client has dramatically increased its performance since IPFS was last evaluated in the Nix community, and I’m really looking forward to the day Nix becomes content-addressed to take full advantage of IPFS’ potential.

1 Like

I’m quite biased here as I run Cachix, but that also gives me quite a bit of time to talk to people and see their pain points.

Given the complexity IPFS brings to Nix (redesigning quite a bit of internals as well as APIs), I think we should make a case that it has to come with significant benefit to at least outweigh that.

Commenting on each of the things that would make Nix better:

Beating source bit-rot: sources can be stored and shared via IPFS so we don’t lose the data needed to reproduce builds.

This could be already done with current design.

Distributing builds via IPFS: builds can be distributed peer-to-peer, so a popular cache could have less hosting costs as it grows.

Storage costs of IPFS are much higher than typical centralized storage (at the current pricing of providers), so I find this unlikely; some numbers are needed to back this up.

Content-addressed derivations: RFC 62 is necessary for using IPFS to its full potential, so we’ll be working with existing community efforts towards content addressability.

That is cool! It seems more like a dependency of the IPFS work, though.

Trustless remote building: Nix users shouldn’t need to be trusted to utilize remote builders. The internal changes to Nix which we need for IPFS support get us most of the way there, so we’ve gone ahead and included this feature as part of our plan.

I’d love to hear more about this; my understanding is that this is only true if the derivation is binary-reproducible, and we still have a very long way to go there.

My main concern is really how performant it would be.

The direction I want to take with Cachix is to have a specified binary cache API that allows dynamic queries, so that one query can answer everything Nix needs to build further. That’s going to be hard to beat with P2P, and it’s a requirement for Nix to have good UX.
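For context on why round trips matter here: the binary cache API Nix speaks today is essentially one `.narinfo` lookup per store path it probes. A rough sketch of parsing one such record (the field names match the real format; the values below are made up):

```python
def parse_narinfo(text: str) -> dict:
    # .narinfo files are plain "Key: value" lines, one record per store path.
    info = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(": ")
        info[key] = value
    return info

# Hypothetical record, shaped like what a cache such as cache.nixos.org serves.
sample = """\
StorePath: /nix/store/abc123-hello-2.10
URL: nar/xyz.nar.xz
Compression: xz
NarHash: sha256:0000000000000000000000000000000000000000000000000000
NarSize: 62000
References: abc123-hello-2.10 def456-glibc-2.31
Sig: cache.example.org-1:deadbeef
"""
info = parse_narinfo(sample)
print(info["Compression"])  # xz
```

Because each path costs a separate fetch like this, a query API that resolves a whole closure at once is a meaningful UX win, and a P2P lookup per object has even more latency to hide.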

All in all, IPFS is a wild dream, but the actual benefit/complexity ratio is quite low in my opinion. P2P software comes at incredible complexity overhead.

That said, trying things out and seeing what can be done is great, and I do hope I’m wrong.

7 Likes

Not to be a downer, but I’m sure this requires a research grant unto itself and can’t just be tackled as a sub-feature of the IPFS work.

I suggest you assume “perfect trust” in your project to be more focused on the already very big challenge of a stable IPFS backend. The current “nar signed by hydra” scheme can be re-used without inventing a trust scheme (“trustless” is an unobtainable myth imo).

3 Likes

I recall having a conversation with one or some of the IPFS developers many years in the past, where I warned them ahead of time about how the “permanence” language would be interpreted. At the time, the concern was waved away.

At least the “permanent web” claim seems to have finally been removed from the website now, but it still makes misleading claims/implications about preservation, cost, and the current web being centralized (it isn’t).

Considering all this, and the significant amount of time I’ve had to spend trying to explain to people what IPFS actually does, I am more than a little skeptical about the way IPFS presents itself, and it will be hard to regain that trust. Just waving it away as “legacy comms” is, honestly, not going to be enough for that. Especially considering the remaining questionable claims.

This is, fundamentally, just a slightly different implementation of mirroring. Not that mirroring isn’t useful, but it’s not some sort of revolutionary new technique. Linux distributions have been doing this for many years, for example.

A “pinning service” is effectively just a hosting provider by a different name.


This all kind of underscores the problem I have with IPFS. The underlying technology isn’t half bad, but it gets marketed and hyped so much with so many misleading claims, new invented terms for old concepts, and so on, that it becomes difficult to genuinely recommend it to people - just because of how many swords there are in the presentation for people to fall into, and how much time I’ll have to spend talking people away from a cliff afterwards.

3 Likes

To quote Domen;

Given the complexity IPFS brings to Nix (redesigning quite a bit of internals as well as APIs), I think we should make a case that it has to come with significant benefit to at least outweigh that.

I read something somewhere about people looking into moving cache resolvers - or whatever it is, out of nix core. I think core work like this should be higher priority than it is and would enable a lot more experimentation with interesting backends. Nix is way overdue for some serious refactoring.

All criticisms aside, it will still be interesting to see what you can come up with prototype-wise! And then we can reevaluate. :slight_smile:

1 Like

I assume that “trustless” here means that the build server doesn’t have to trust the build clients? The clients still have to trust the build server, right?

Could you shine some light on this quote from your proposal:

What data are you talking about here?

Overall, I find the proposal very interesting. I’m not sure if it is a good idea or not to tie Nix so close to IPFS, but I welcome any experimentation with these things.

The “trust maps” for derivation -> outputs you talk about in your proposal seems similar to what I have implemented for nixbuild.net, where I have to be able to represent several differing builds for the same store path in a safe way. The issue of “trusting the client’s Nix cache” I’ve handled by simply signing all uploads with a user-specific Nix signing key, in addition to storing any signatures provided during the upload. All signatures are then used as a “trust map” when finding out if a path is valid or not for a user. A user can freely decide on which signing keys that should be trusted. Of course, I don’t have to maintain a proper local nix store at the same time, so things are probably a bit easier for me.
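The scheme described above can be sketched as a toy model (this is my reading of the idea, not nixbuild.net’s actual implementation): several signers may vouch for different outputs of the same derivation, and each user decides which keys to trust.

```python
from collections import defaultdict

# Toy "trust map": (derivation, output path) -> set of signing keys
# that have vouched for that mapping.
trust_map = defaultdict(set)

def record_build(drv: str, out: str, key: str):
    trust_map[(drv, out)].add(key)

def is_valid(drv: str, out: str, trusted_keys: set) -> bool:
    # A path is valid for a user iff some key they trust signed the mapping.
    return bool(trust_map[(drv, out)] & trusted_keys)

record_build("hello.drv", "/nix/store/aaa-hello", "hydra-key")
record_build("hello.drv", "/nix/store/bbb-hello", "alice-key")  # a differing build

print(is_valid("hello.drv", "/nix/store/aaa-hello", {"hydra-key"}))  # True
print(is_valid("hello.drv", "/nix/store/bbb-hello", {"hydra-key"}))  # False
```

Note how two differing builds of the same derivation coexist safely: which one a user accepts depends only on their trusted key set.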

Are you planning to publish some design documents somewhere?

The difference with P2P technology is that any downloader can easily become a mirror, and mirrors are found automatically. In the case of Nix, it would be very nice to share packages on the LAN.

I was already very excited about the idea of using APT with BitTorrent. Unfortunately that has never been adopted.

I think Nix + IPFS is a perfect fit to try the idea out in practice.

3 Likes

@rickynils

I assume that “trustless” here means that the build server doesn’t have to trust the build clients? The clients still have to trust the build server, right?

Exactly.

What data are you talking about here?

The output path, which is currently computed from hashModuloDrv on the client. The trick is to let the server do the final step of taking the input drv hashes and computing the output path.
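A greatly simplified toy model of that division of labor (the real hash-modulo computation in Nix also recursively rewrites fixed-output and input-derivation hashes; none of the names or formats below are the real ones):

```python
import hashlib

def drv_hash(drv_text: str) -> str:
    # Hash the derivation with its self-referential output placeholder
    # masked out - the rough idea behind Nix's hash-modulo scheme.
    masked = drv_text.replace("<OUTPUT>", "")
    return hashlib.sha256(masked.encode()).hexdigest()

def output_path(drv_text: str, name: str) -> str:
    # Final step: derive the store path from the masked hash.
    # In the trustless scheme, the *server* performs this step.
    return "/nix/store/" + drv_hash(drv_text)[:8] + "-" + name

drv = "builder=/bin/sh; out=<OUTPUT>; src=sha256:abcd"
print(output_path(drv, "hello"))
```

The point is just that the output path is a pure function of the (masked) derivation, so the server can compute it itself rather than trusting a path the client claims.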

The “trust maps” for derivation -> outputs you talk about in your proposal seems similar to what I have implemented for nixbuild.net

Yes, I think the intensional store will be a huge boon for nixbuild.net :). With the intensional store, not only can one choose whether to trust a single build, one can also audit what dependencies (especially non-referenced build-time-only ones) were used by the build.

Are you planning to publish some design documents somewhere?

Of course! We’ll be writing multiple RFCs as the details get fleshed out during our initial implementation work.

1 Like

Couldn’t we use zeroconf/avahi for that?

The most exciting part of this work for me is this PR: https://github.com/NixOS/nix/pull/3635

If Nix archives can be stored as Git object trees, it will inherently allow much more de-duplication of content. => cheaper storage, less bandwidth usage and easy hard-linking. The distribution mechanism is sort of orthogonal to this and it could be applied to zeroconf types of setup as well.
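To make the deduplication point concrete: Git addresses every file by the hash of its contents, so identical files appearing in many store paths collapse into a single stored object. A blob’s ID is just SHA-1 over a small header plus the bytes:

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    # Git object IDs for files: sha1("blob <size>\0" + contents).
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# Identical contents hash identically no matter where they appear,
# which is what makes store-wide deduplication fall out for free.
a = git_blob_hash(b"shared library bytes")
b = git_blob_hash(b"shared library bytes")
print(a == b)  # True

# The well-known hash of the empty blob:
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```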

10 Likes

I’m really really excited about this! Nice to see this being tackled!

I agree it might be necessary to refactor some parts in Nix first, and making “cache resolvers/remote store backends” more pluggable in general was something I thought about too.

I wouldn’t say this is a blocker before being able to do any work (at least not for a POC), but I’d consider IPFS, some avahi/zeroconf backend or other delivery mechanisms (maybe Cachix too) as an opportunity to shape the design of such an interface.

Looking forward to the developments and RFCs!

This already works today, by the way. I currently have both builders and a cache set up using zeroconf. However, the way Nix consults caches isn’t ideal; it seems to be optimised for “there is just 1 cache” and starts to get annoying when you have like 10.

2 Likes

Amazing! Can you share your configuration, so we can use it too?

2 Likes