Obsidian Systems is excited to bring IPFS support to Nix

arianvp · June 8, 2020, 10:34pm

It is something like this. I’ll try to clean up my actual implementation and publish it on GitHub.

{ name, nodes, lib, ... }:
{
  networking.hostName = name;
  # Using avahi for mDNS for now; however, I want to switch to networkd for
  # some boxes. This will expose each node as ${networking.hostName}.local,
  # aka ${name}.local
  services.avahi = {
    enable = true;
    nssmdns = true;
    ipv6 = true;
    publish = {
      enable = true;
      domain = true;
      addresses = true;
      userServices = true;
      workstation = true;
    };
  };

  nix = {
    distributedBuilds = true;
    # Have a binary cache public key for each node, generated with
    # nix-store --generate-binary-cache-key and stored next to this file as <name>.pub
    binaryCachePublicKeys = lib.mapAttrsToList (name: _node: builtins.readFile (./. + "/${name}.pub")) nodes;
    binaryCaches = lib.mapAttrsToList (name: node: "http://${name}:${toString node.config.services.nix-serve.port}") nodes;
    buildMachines = lib.mapAttrsToList (
      name: node: {
        hostName = name;
        sshUser = "arian";
        sshKey = "/root/.ssh/id_ed25519";
        system = "x86_64-linux"; # TODO parameterize
        supportedFeatures = node.config.nix.systemFeatures;
        maxJobs = node.config.nix.maxJobs;
      }
    ) nodes;
  };


  services.nix-serve.enable = true;
}
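
The public keys listed in binaryCachePublicKeys only help if each node's nix-serve actually signs what it serves. A minimal sketch of the signing side, assuming the secret half of the key pair was written to /var/lib/nix-serve/cache-priv-key.pem (a hypothetical path, not taken from the post above):

```nix
{ ... }:
{
  services.nix-serve = {
    enable = true;
    # Hypothetical location of the secret key. The pair would be created with:
    #   nix-store --generate-binary-cache-key <key-name> <secret-key-file> <public-key-file>
    # The public half is what the other nodes read as <name>.pub.
    secretKeyFile = "/var/lib/nix-serve/cache-priv-key.pem";
  };
}
```

Without a secret key, clients that require signatures would presumably reject paths served by that node.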

Ericson2314 · June 21, 2020, 9:36pm

Just a small status update. We’ve opened and gotten merged numerous small cleanups, and the main feature work is in WIP PRs:

Git hashing as a new file ingestion method --- contains #3754 by Ericson2314 · Pull Request #3635 · NixOS/nix · GitHub (Git hashes)
Trustful IPFS Store by Ericson2314 · Pull Request #3727 · NixOS/nix · GitHub (IPFS store)

For anyone that wants to follow along or give it a spin.

rickynils · August 13, 2020, 11:28pm

I’ve written about how build results are reused between untrusted users in nixbuild.net, maybe it is of interest to some readers of this thread: Build Reuse in nixbuild.net

Ericson2314 · August 14, 2020, 2:36am

Read it; looks good! Yeah, CA derivations basically mean the same thing, but since the path is content-addressed now, the drv <-> output mapping must be in the extra table, instead of the input-addressed path <-> narhash (or other content address) mapping.

Your system should continue to work with this new stuff, since it already works with fixed-output derivations, and you still retain the advantage of delivering those benefits to the Nix derivations that exist today :).
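
To make the distinction concrete, here is a rough sketch of an input-addressed derivation next to a floating content-addressed one. It assumes a Nix with the experimental ca-derivations feature enabled and a Linux sandbox that provides /bin/sh; the derivation names are made up for illustration.

```nix
let
  # Shared build recipe: write a single file as the output.
  common = {
    system = builtins.currentSystem;
    builder = "/bin/sh";
    args = [ "-c" "echo hello > $out" ];
  };
in
{
  # Input-addressed: the output path is computed from the derivation itself,
  # so it is known before anything is built.
  inputAddressed = derivation (common // { name = "hello-ia"; });

  # Floating content-addressed: no outputHash is given, so the output path
  # depends on what the build produces and is only known afterwards. This is
  # why a separate drv <-> output mapping has to be recorded.
  floatingCA = derivation (common // {
    name = "hello-ca";
    __contentAddressed = true;
    outputHashMode = "recursive";
    outputHashAlgo = "sha256";
  });
}
```

A fixed-output derivation would look like the second one with an explicit outputHash added, which is the case described above as already working today.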

Ericson2314 · August 25, 2020, 6:46pm

Last week, we had the honor of being interviewed by @zimbatm for his great Nix Friday series of streams (discourse thread: https://discourse.nixos.org/t/nix-friday-streaming-about-nix-every-friday/4655). Thanks again, @zimbatm!

You can watch the recording after the fact at Twitch or https://www.youtube.com/watch?v=FievtzvDbs8.

EdF · August 26, 2020, 12:21am

As a data point, at some point I saw some network with their own cryptocoin supplying distributed storage, the name of which I don’t remember. It might be interesting to look at how that worked out. I doubt it’s more cost effective than, say, using some cloud-based solution, so this has to stand on its own regardless of, or despite, storage considerations. *Edit: I.e. people would join the network, supply storage, and get coin for uptime or whatever.

You may be thinking of Internxt (internxt.com).

colemickens · September 7, 2020, 9:55pm

@Ericson2314 do you have a derivation you test with?

I’m adding a new “host” to my nixcfg that is a test VM with an ipfs-enabled nix package. However, my first naive attempt failed: (collapsed now for brevity)

{
    nix = {
      package = pkgs.nix.overrideAttrs(old: {
        src = pkgs.fetchFromGitHub {
          owner = "obsidiansystems";
          repo = "nix";
          rev = "ipfs-develop";
          sha256 = "sha256-6jLx7Vtg3FujU3mR55B97X5GM90KZUtPVb2ggIKmEjg=";
        };
      });
    };
}

fails quickly with:

builder for '/nix/store/8zs09r0qy444w2y9d5w8dwnvqwj67c99-nix-2.3.7.drv' failed with exit code 2; last 10 log lines:
  unpacking source archive /nix/store/ijd7yg3g642wija13lsz00avwn552258-source
  source root is source
  patching sources
  configuring
  no configure script, doing nothing
  building
  build flags: -j8 -l8 SHELL=/nix/store/6737cq9nvp4k5r70qcgf61004r0l2g3v-bash-4.4-p23/bin/bash profiledir=\$\(out\)/etc/profile.d
    GEN    Makefile.config
  /nix/store/6737cq9nvp4k5r70qcgf61004r0l2g3v-bash-4.4-p23/bin/bash: ./config.status: No such file or directory
  make: *** No rule to make target 'config.h', needed by 'precompiled-headers.h.gch'.  Stop.

Thanks!

Ericson2314 · September 8, 2020, 1:19am

We have just been building via default.nix in the repo, which now uses the flake.nix. Maybe get what you need from that?
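
The log above says "no configure script, doing nothing", which suggests the plain src override fails because a git checkout, unlike a release tarball, ships no generated configure script. An untested sketch of the alternative, importing the repo's own default.nix instead of overriding src. It assumes that default.nix evaluates to the flake outputs and exposes a defaultPackage.<system> attribute, which the thread does not confirm:

```nix
{ pkgs, ... }:
let
  # Branch name taken from the attempt above. Pinning a `rev` as well
  # would make this reproducible.
  nixIpfsSrc = builtins.fetchGit {
    url = "https://github.com/obsidiansystems/nix";
    ref = "ipfs-develop";
  };
in
{
  # Assumption: importing the repo yields the flake outputs, so the package
  # is reachable as defaultPackage.<system>.
  nix.package = (import nixIpfsSrc).defaultPackage.${pkgs.stdenv.hostPlatform.system};
}
```

If the attribute path differs on that branch, evaluating `import nixIpfsSrc` in `nix repl` should show what is actually exposed.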

Ericson2314 · September 9, 2020, 12:02am

The blog post for milestone 1 is up!

With it is the tutorial and explanation of the branches containing the work.

twoolie · September 11, 2020, 7:25am

Hi @Ericson2314, I have a few questions about the implementation that you may be able to clear up.

Is the intent that a significant portion of Nix installations would also have an IPFS daemon running alongside to provide IPFS connectivity?
Is there any provision in the current roadmap to allow substitution via an IPFS gateway such as the Cloudflare gateway?
When running an IPFS daemon and pushing Nix store paths into IPFS, is the path added with --nocopy (filestore backed) or are the contents duplicated into IPFS chunk storage? Is this why the store path gets a Nix temporary GC root?
When substituting paths from IPFS, are the paths fetched to the IPFS filestore and symlinked/hardlinked into the Nix store? Would this create an IPFS pin to prevent cleanup from IPFS?
Regarding the following statement:

Trustless remote building: Nix users shouldn’t need to be trusted to utilize remote builders.

The way I currently understand building with nixpkgs, most derivations are built by a trusted source (Hydra) and only packages you’re customizing or developing yourself need to be built locally or on a remote builder. For these packages, you don’t know what the resulting CID will be until you’ve built it for the first time, and if you’ve built it to get the CID, you no longer need a remote builder to do the build for you.

Is my understanding here wrong, or am I missing some use-case that makes trust-less remote building more useful?

Ericson2314 · September 11, 2020, 4:31pm

Glad to answer any questions!

That would be nice! But we’re not trying to get ourselves in a network effects chicken-or-egg situation either. In particular, we ought to be able to share source code with upstream devs, other distros, etc., so we are just one part of the wider network.

That isn’t currently implemented, mainly due to time/budget constraints as the gateway and regular interface are different. But there’s no reason all the reading/downloading (not writing/uploading, of course) couldn’t also work.

Duplicated. I had not actually heard of --nocopy, but my inclination is that trying to share storage is hard (to wit, --nocopy is still unstable), and long term the better architecture is for IPFS to manage the storage and just mount things for Nix, which is roughly the opposite of --nocopy.

We don’t create any extra temporary or permanent GC roots on the Nix side (LocalStore). We do implement Store::addTempRoot for the IPFS store, so Nix temp roots become IPFS permanent roots. We hope either Nix gains a hook for cleaning up these temporary pins, or IPFS gains a notion of temporary pins, so that we don’t leak pins on the IPFS side.

It’s just copied, same as with the other direction.

Your understanding about knowing the CID and not needing a build coinciding is totally correct.

The current bug is probably simpler/stupider than what you are thinking. Right now, clients of remote builders push arbitrary non-content-addressed data to the remote builder. This is fine when the client is Hydra, but not fine if e.g. a company wants its employees to use a build farm without giving any rogue employee the ability to push malware to commonly-queried paths and infect everyone else.

twoolie · September 14, 2020, 2:12am

Firstly, I want to say I really appreciate the work already done to support Content Addressed derivations. I think this is a great thing to have, independent of any potential storage improvements.

I know that this is very early days of your work (and it’s very impressive what you’ve achieved so far) but I think it will be a tough sell for many users to effectively double the storage space required for the Nix store (once in the Nix store, once in IPFS) as more packages become CA. Mounting the files from IPFS via FUSE is also not free, and can introduce a significant performance bottleneck. Perhaps IPFS import/export is a feature more suitable for beefier buildfarm/cache(ix) infrastructure?

Substitution via traditional HTTPS (e.g. from the Cloudflare gateway) would allow the 90% of users who just want to get prebuilt dependencies without thinking about IPFS to still drive demand for IPFS-hosted build products. Having those derivations be loaded into the Nix store as usual, and take no extra space, would provide a smooth transition path. Also, laptop users like me won’t have to run the bandwidth/CPU-hungry IPFS daemon constantly to avoid warmup time and high latency when requesting objects.

How would one go about proposing/working on gateway substitution as an extension to Obsidian’s work?

Can you please expand on this a bit? Don’t clients push .drv “buildplans” to remote builders? These buildplans actually ARE content addressed. The name contains the hash of its content, and the result of the build carries the hash of the buildplan. It’s the remote builder that has the power to label any arbitrary data as being built from a buildplan hash, not the clients. AFAIK, the only arbitrary data that clients are allowed to push is fixed-output derivations, so they can’t lie about what that derivation contains as it’s also content addressed. Is there a sneaky loophole hiding here that I don’t know about?

If I’ve got this wrong I’m very happy to be corrected!

Ericson2314 · September 15, 2020, 8:38pm

Thanks!

Thanks for thinking about how to drive adoption. You make good points. My idea was to drive adoption with source code archival, where the space duplication is far less a concern, but there’s no technical reason we wouldn’t pursue both tracks.

Propose to who?

On the technical side, the thing to do is look at all the read requests we do in nix/src/libstore/ipfs-binary-cache-store.cc at 7c027d20a204592d23dde2d95d58d83ee197c681 · obsidiansystems/nix · GitHub and see if they have equivalents in the gateway interface. If they in fact do (contrary to what I was thinking), great; if they don’t, you might need to also propose changes to the gateway.

Sure

They used to copy the drv file and its closure, but more recently they send over just the derivation being built (parsed, to be used in memory only on the remote side) without its dependent drvs.

Yes

I’m a bit confused what you mean, but anyway, for traditional “input-addressed” derivations, the output path computation is quite complicated and is done by hashDerivationModulo. The daemon has no hope of verifying this unless it has the entire derivation closure, so it blindly trusts the output path in the derivation that is sent over.

I have already removed the trust requirement for (fixed and floating) content-addressed derivations, since the daemon ignores any output paths sent over as part of those derivations.

Actually I think it will currently accept any store path, not just content-addressed ones (usually built by fixed-output derivations). This is one reason why it only works for “trusted users” on the daemon side.
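
In NixOS terms, that trust requirement is what forces a builder to list the connecting SSH user as trusted. A minimal sketch of the builder-side setting, using the option name as it existed around the time of this thread (later NixOS releases moved it to nix.settings.trusted-users); "arian" is simply the sshUser from the buildMachines example earlier in the thread:

```nix
{ ... }:
{
  # Users listed here may push arbitrary store paths to the daemon, which is
  # exactly the power an untrusted client should not need.
  nix.trustedUsers = [ "root" "arian" ];
}
```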

twoolie · September 23, 2020, 5:24am

Thanks again for the detailed reply. I had to sit on this for a while and do some more reading and thinking

It seemed like you had a roadmap of features to implement. I guess whoever is managing that roadmap?

This is probably massively naive on my part but this seems like a security hole that would be possible to close?

Send the full derivation closure (.drv or parsed, does it matter?) so that the builder can verify path names for itself.
Disallow accepting arbitrary store paths (except input addressed, but input addressed must verify received path matches hash) except for trusted users (to avoid breaking existing workflows?).
Remote builder must build or substitute for any missing store paths from its own trusted sources.

Is there any use-case/scenario that this breaks? I’m very interested in the security model of remote builders, so if you can recommend a part of the codebase that I should read to get a better idea of how this all works please drop me a link.

Ericson2314 · September 25, 2020, 1:08am

Well take a look at the original grant devgrants/open-grants/open-proposal-nix-ipfs.md at 5fcf2ddcb294b911feb216d9b01d990af1654a56 · ipfs/devgrants · GitHub. The grant recipient writes the initial proposal with this sort of thing.

Yup! See distributed builds require a trusted remote user · Issue #2789 · NixOS/nix · GitHub and the PRs I’ve linked to it (which will show up at the bottom). If we merge all of them then it’s fixed.

(Well, we also need to modify the build hook protocol so floating CA derivations can actually be remotely built, but that’s a separate issue.)

gilescope · September 17, 2021, 9:12am

Any news? IPFS is getting stronger by the year.

Ericson2314 · September 30, 2021, 11:07pm

There is in fact some news: we have scoped out NLnet; Peer-to-Peer Access to Our Software Heritage and it has been approved. Once that is done, I hope we’ll have a better shot at merging the work we did last year, because the SWH --IPFS--> Nix workflow will hopefully make it more apparent what the use-cases are.

kamadorueda · October 1, 2021, 3:23am

Thanks man for your great work!

gilescope · October 6, 2021, 7:31pm

Well, Cloudflare providing a lot of edge caching is one use I’m looking forward to. (Cloudflare are fans of IPFS caching, as you can well imagine.)

bbigras · October 6, 2021, 9:01pm

Isn’t nix already using some cdn?