Obsidian Systems is excited to bring IPFS support to Nix

Ericson2314 · May 27, 2020, 12:12am

Yes of course, merely running a p2p network node does not in an of itself make some system durable. I don’t mean to imply otherwise.

But a first step to durability is separating what is needed from where to get it. We already have “hashes mirrors” but it takes more manual effort and only supports flat files. This puts us on the road to stores/substituters subsuming hashed mirrors, and supporting more forms of data.

parkan · May 27, 2020, 3:18am

The persistence/permanence language is an unfortunate holdover from legacy comms around the IPFS stack – totally fair to call that out! The goal here is to bring in the full range of benefits of true content addressing to Nix (which has a long history of striving for these ends, cf. RFC 17/62) in a way that works with the growing ecosystem of IPFS compatible tooling. This doesn’t mean that Nix would be required to use IPFS – the self-certification, location-independence, trustlessness, and deduplication properties are entirely self-contained. The ability to use IPFS transports for sources, derivations, build artifacts, etc is largely a bonus on top.

That being said, while IPFS itself does not inherently provide any persistence guarantees, there’s a growing number of persistence services (“pinning” in IPFS parlance) that do. The upside here is that objects (derivations, sources, etc) stored in this way can be consumed without trusting the host, and can be retrieved without foreknowledge of their actual location. For example, multiple Nix machines on a local network can reuse objects without re-downloading them, e.g. in an offline situation, or in bandwidth-constrained environments.

parkan · May 27, 2020, 3:28am

Privacy-preserving/anonymous p2p is a huge (and important!) challenge, in part because performance and privacy create sharp tradeoffs, in part because judging who a good vs bad actor is in a decentralized network is an epistemological challenge. The current goals of IPFS (and sister project Filecoin, which provides a crypto-incentivized long-term storage layer) aim to solve performance, reliability, and data locality concerns first, but the project is acutely aware of the need for anonymized/deniable access. Check out research work from IPFS collaborators hashmatter for more on the latter: hashmatter · GitHub

davidak · May 27, 2020, 5:22am

you probably mean Storj or Burstcoin

hyperfekt · May 27, 2020, 9:17am

Very excited about this, thanks for taking it on, and thanks for funding it! I understand the canonical IPFS client has dramatically increased its performance since IPFS was last evaluated in the Nix community, and I’m really looking forward to the day Nix becomes content-addressed to take full advantage of IPFS’ potential.

domenkozar · May 27, 2020, 2:16pm

I’m quite biased here as I run Cachix, but that also gives me a quite a bit of time to talk to people and see their pain points.

Given the complexity IPFS brings to Nix (redesigning quite a bit of internals as well as APIs), I think we should make a case that it has to come with significant benefit to at least outweigh that.

Commenting on each of the thinks that would make Nix better:

Beating source bit-rot: sources can be stored and shared via IPFS so we don’t lose the data needed to reproduce builds.

This could be already done with current design.

Distributing builds via IPFS: builds can be distributed peer-to-peer, so a popular cache could have less hosting costs as it grows.

Storage costs of IPFS are much higher than a typical centralized storage (at the current pricing of providers), so I find this to be unlikely. So some numbers are needed to back this off.

Content-addressed derivations: RFC 62 is necessary for using IPFS to its full potential, so we’ll be working with existing community efforts towards content addressibility.

That is cool! Seems like more a dependency for IPFS.

Trustless remote building: Nix users shouldn’t need to be trusted to utilize remote builders. The internal changes to Nix which we need for IPFS support get us most of the way there, so we’ve gone ahead and included this feature as part of our plan.

I’d love to hear more about this, my understanding is that’s only true if derivation is binary reproducible, which we still have a very long way to go.

My main concern is really how performant it would be.

The direction I want to take with Cachix is to have specified binary cache API that allows dynamic queries so that one query can answer everything Nix needs to build further. That’s going to be hard to beat with P2P and it’s a requirement for Nix to have good UX.

All in all, IPFS is a wild dream, but the actual benefit/complexity ratio is quite low in my opinion. P2P software comes at incredible complexity overhead.

That is to say that trying and seeing what could be done is great and I do hope I’m wrong.

Profpatsch · May 27, 2020, 4:10pm

Not to be a downer, but I’m sure this requires a research grant all unto itself and can’t just be tackled as a sub-feature of IPFS.

I suggest you assume “perfect trust” in your project to be more focused on the already very big challenge of a stable IPFS backend. The current “nar signed by hydra” scheme can be re-used without inventing a trust scheme (“trustless” is an unobtainable myth imo).

joepie91 · May 27, 2020, 5:56pm

I recall having a conversation with one or some of the IPFS developers many years in the past, where I warned them ahead of time about how the “permanence” language would be interpreted. At the time, the concern was waved away.

At least the “permanent web” claim seems to have finally been removed from the website now, but it still makes misleading claims/implications about preservation, cost, and the current web being centralized (it isn’t).

Considering all this, and the significant amount of time I’ve had to spend trying to explain to people what IPFS actually does, I am more than a little skeptical about the way IPFS presents itself, and it will be hard to regain that trust. Just waving it away as “legacy comms” is, honestly, not going to be enough for that. Especially considering the remaining questionable claims.

This is, fundamentally, just a slightly different implementation of mirroring. Not that mirroring isn’t useful, but it’s not some sort of revolutionary new technique. Linux distributions have been doing this for many years, for example.

A “pinning service” is effectively just a hosting provider by a different name.

This all kind of underscores the problem I have with IPFS. The underlying technology isn’t half bad, but it gets marketed and hyped so much with so many misleading claims, new invented terms for old concepts, and so on, that it becomes difficult to genuinely recommend it to people - just because of how many swords there are in the presentation for people to fall into, and how much time I’ll have to spend talking people away from a cliff afterwards.

deliciouslytyped · May 27, 2020, 10:45pm

To quote Domen;

Given the complexity IPFS brings to Nix (redesigning quite a bit of internals as well as APIs), I think we should make a case that it has to come with significant benefit to at least outweigh that.

I read something somewhere about people looking into moving cache resolvers - or whatever it is, out of nix core. I think core work like this should be higher priority than it is and would enable a lot more experimentation with interesting backends. Nix is way overdue for some serious refactoring.

All criticisms aside, it will still be interesting to see what you can come up with prototypewise! And then we can reevaluate.

rickynils · May 28, 2020, 2:44pm

I assume that “trustless” here means that the build server doesn’t have to trust the build clients? The clients still has to trust the build server, right?

Could you shine some light on this quote from your proposal:

What data are you talking about here?

Overall, I find the proposal very interesting. I’m not sure if it is a good idea or not to tie Nix so close to IPFS, but I welcome any experimentation with these things.

The “trust maps” for derivation -> outputs you talk about in your proposal seems similar to what I have implemented for nixbuild.net, where I have to be able to represent several differing builds for the same store path in a safe way. The issue of “trusting the client’s Nix cache” I’ve handled by simply signing all uploads with a user-specific Nix signing key, in addition to storing any signatures provided during the upload. All signatures are then used as a “trust map” when finding out if a path is valid or not for a user. A user can freely decide on which signing keys that should be trusted. Of course, I don’t have to maintain a proper local nix store at the same time, so things are probably a bit easier for me.

Are you planning to publish some design documents somewhere?

davidak · May 28, 2020, 2:45pm

The difference using P2P technology is that any downloader can easily become a mirror and mirrors are found automatically. In the case of nix, it would be very nice to share packets in the LAN.

I was already very excited about the idea of using APT with Bittorrent. Unfortunately this has never been adopted.

I think Nix + IPFS is a perfect fit to try the idea out in practice.

Ericson2314 · May 28, 2020, 3:02pm

@rickynils

I assume that “trustless” here means that the build server doesn’t have to trust the build clients? The clients still has to trust the build server, right?

Exactly.

What data are you talking about here?

The output path, which is computed from hashModuloDrv on the client currently. The trick is to let the server do the final step of taking the imput drv hashes and computing the output path.

The “trust maps” for derivation -> outputs you talk about in your proposal seems similar to what I have implemented for nixbuild.net

Yes, I think the intentional store will be a huge boon for nixbuild.net :). With the intensional store, not only can one choose whether to trust a single build, they can also audit what dependencies (especially non-referenced build-time-only ones) were used by the build.

Are you planning to publish some design documents somewhere?

Of course! We’ll be writing multiple RFCs as the details get fleshed become clear from our initial implementation work.

bbigras · May 28, 2020, 6:03pm

Couldn’t we use zeroconf/avahi for that?

zimbatm · May 30, 2020, 9:12am

The most exciting part of this work for me is this PR: Git hashing as a new file ingestion method --- contains #3754 by Ericson2314 · Pull Request #3635 · NixOS/nix · GitHub

If Nix archives can be stored as Git object trees, it will inherently allow much more de-duplication of content. => cheaper storage, less bandwidth usage and easy hard-linking. The distribution mechanism is sort of orthogonal to this and it could be applied to zeroconf types of setup as well.

flokli · June 3, 2020, 9:16pm

I’m really really excited about this! Nice to see this being tackled!

I agree it might be necessary to refactor some parts in Nix first, and making “cache resolvers/remote store backends” more pluggable in general was something I thought about too.

I wouldn’t say this is a blocker before being able to do any work (at least not for a POC), but I’d consider IPFS, some avahi/zeroconf backend or other delivery mechanisms (maybe Cachix too) as an opportunity to shape the design of such an interface.

Looking forward for the developments and RFCs!

arianvp · June 5, 2020, 9:50am

This already works today by the way. I currently have both builders and cache set up using zeroconf. However the way nix consults caches isn’t ideal; it seems to be optimised for “there i just 1 cache” and starts to get annyoing when you have like 10

davidak · June 6, 2020, 1:57am

Amazing! Can you share your configuration, so we can use it too?

arianvp · June 8, 2020, 10:34pm

It is something like this. I’ll try to clean my actual impl up and publish it on github

{ name, nodes, lib, ... }:
{
  networking.hostName = name;
  # Using avahi for now for mDNS; however I want to switch to networkd for some
  # boxes this will expose each node as   ${networking.hostName}.local aka
  # ${name}.local
  services.avahi = {
    enable = true;
    nssmdns = true;
    ipv6 = true;
    publish = {
      enable = true;
      domain = true;
      addresses = true;
      userServices = true;
      workstation = true;
    };
  };

  nix = {
    distributedBuilds = true;
    # have a binary cache public key for each node with nix-store --generate-binary-cache-key
    binaryCachePublicKeys = lib.mapAttrsToList (builtins.readFile (./. + "${node}.pub")) nodes;
    binaryCaches = lib.mapAttrsToList (name: node: "http://${name}:${toString node.config.nix-serve.port}") nodes;
    buildMachines = lib.mapAttrsToList (
      name: node: {
        hostName = name;
        sshUser = "arian";
        sshKey = "/root/.ssh/id_ed25519";
        system = "x86_64-linux"; # TODO paarameterize
        supportedFeatures = node.config.nix.systemFeatures;
        maxJobs = node.config.nix.maxJobs;
      }
    ) nodes;
  };


  services.nix-serve.enable = true;
}

Ericson2314 · June 21, 2020, 9:36pm

Just a small status update. We’ve opened and gotten merged numerous small cleanups, and the main feature work is in WIP PRs:

Git hashing as a new file ingestion method --- contains #3754 by Ericson2314 · Pull Request #3635 · NixOS/nix · GitHub (Git hashes)
Trustful IPFS Store by Ericson2314 · Pull Request #3727 · NixOS/nix · GitHub (IPFS store)

For anyone that want’s to follow along or give it a spin.

rickynils · August 13, 2020, 11:28pm

I’ve written about how build results are reused between untrusted users in nixbuild.net, maybe it is of interest to some readers of this thread: Build Reuse in nixbuild.net