Pre-RFC: Generic content-addressed fetchers (IPFS, Radicle, etc.)

Motivation

Nix fetchers currently require location-based addressing (URLs with specific hosts/gateways), even when fetching from content-addressed systems like IPFS or Radicle. This undermines the decentralized nature of these protocols and creates fragility.

For example, to fetch from IPFS today, you must hardcode a gateway:

fetchurl {
  url = "https://ipfs-gateway.example.com/ipfs/QmXyz...";
  hash = "sha256-...";
}

This has several problems:

  • Centralization: The gateway becomes a single point of failure
  • Non-reproducibility: If the gateway goes down, the build breaks even though the content exists elsewhere
  • User lock-in: Different users may prefer different nodes/gateways (local daemon, LAN peers, public gateways)

Proposed Solution

Introduce content-addressed fetchers that separate the what (content identifier) from the how (transport/node configuration):

fetchIPFS {
  cid = "QmXyz...";
  # No hash needed - CID already encodes hash type and value
}

fetchRadicle {
  rid = "rad:z42hL2jL4XNk6K8oHQaSWfMgCL7ji";
  ref = "main";
  rev = "abc123...";
}

Users configure their preferred nodes/gateways in ~/.config/nix/nix.conf or via NixOS/Home-Manager modules:

ipfs-nodes = /ip4/127.0.0.1/tcp/5001 https://ipfs.io
radicle-nodes = rad://seed.radicle.xyz rad://127.0.0.1:8776

This keeps Nix expressions location-agnostic while allowing per-user/per-system transport policies.

Key Design Questions

  1. Generic framework: Should we design a pluggable fetcher system that can support any content-addressed protocol, or implement IPFS/Radicle separately?
  2. Hash redundancy: IPFS CIDs already contain hash information. Should fetchIPFS require an additional hash parameter for Nix’s own verification, or trust the CID?
  3. Offline/availability handling: What happens when no configured node has the content? What is the fallback behavior, what should the error messages say, and should the NixOS infrastructure provide a default gateway?
  4. Configuration scope: Should node preferences live in nix.conf, separate config files, or only in NixOS/Home-Manager modules?
  5. Security model: Do we trust the resolver/node, or only the content hash? How does this interact with Nix’s existing security assumptions?

Related Work

  • Issue #859 (Nix and IPFS)
  • Obsidian Systems IPFS work: Previous grant work on Nix × IPFS integration
  • RFC 0062 (Content-addressed derivations): Existing CA work, but focused on store paths rather than fetchers
  • Radicle mirror discussion (Discourse thread): Community interest in decentralized alternatives to GitHub

Next Steps

I’d like to gather feedback on:

  • Is this direction valuable to the community?
  • Which design approach makes most sense?
  • Are there other content-addressed systems we should consider (e.g., Git’s tree hashes)?
  • Who would be interested in collaborating on an RFC?

Consider pkgs.fetchgit, which has a similar hash redundancy ‘issue’. Do you understand why it is the way it is? Does the same reasoning apply to what you want to do?

If you drop the hash redundancy issue, then these seem like simple fetchers that could be added to Nixpkgs without an RFC, adding optional gateway configuration to the Nixpkgs inputs.


I don’t know about this issue. Can you point me to it?

Like with your proposed fetchIPFS, etc., the Git hash theoretically already encodes the contents of the files to be added to the store. So one might similarly ask why the hash is necessary here:

pkgs.fetchgit {
  url = "https://...";
  rev = "deadbeef...";
  hash = "sha256-ABC...";
};

But it is. Having the hash — making the fetcher a fixed-output derivation, in other words — simplifies a lot of concerns about ensuring that the impure things the fetcher does in order to make a Git connection and prepare the checkout as desired and so on have no sneaky effects on the output.
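
To make the fixed-output point concrete, here is a minimal sketch of the kind of derivation that sits underneath a fetcher. It is an illustration, not how fetchgit itself is implemented: the derivation name and URL are placeholders, and lib.fakeSha256 stands in for the real hash.

```nix
# Sketch only: a hand-written fixed-output derivation (FOD).
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "example-download"
  {
    nativeBuildInputs = [ pkgs.curl ];
    # CA bundle so curl can verify TLS certificates during the build.
    SSL_CERT_FILE = "${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt";

    # These three attributes make the derivation fixed-output: the build
    # may do impure things such as network access, and Nix rejects the
    # result unless it hashes to outputHash.
    outputHashMode = "flat";
    outputHashAlgo = "sha256";
    # Placeholder value; replace with the real hash of the file.
    outputHash = pkgs.lib.fakeSha256;
  }
  ''
    # Placeholder URL, for illustration only.
    curl --fail --location "https://example.org/file.tar.gz" --output "$out"
  ''
```

Whatever the build script does to obtain the file, only the declared hash determines whether the output is accepted, which is why the impure steps cannot leak into the result.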


To be clear, if you would find a fetchIPFS that takes both cid and hash useful, I’m encouraging you to submit the PR adding it to Nixpkgs right away, and let the discussion take place around the concrete code. The ‘generic framework’ doesn’t need to be debated; the concept of FODs is already that framework, I think.
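
As a starting point for such a PR, a fetcher along these lines could be a thin wrapper around fetchurl. This is a rough sketch under assumptions: the name fetchIPFS and its arguments follow the proposal above, the gateways argument and its default are hypothetical, and it relies on gateways serving content under /ipfs/<cid> as in the gateway URL at the top of the thread.

```nix
# Hypothetical fetchIPFS sketch, written in callPackage style.
{ fetchurl }:

{ cid
, hash
  # Assumed default; any list of HTTP gateways would do.
, gateways ? [ "https://ipfs.io" ]
}:

fetchurl {
  # fetchurl tries each URL in order until one download succeeds.
  urls = map (gateway: "${gateway}/ipfs/${cid}") gateways;
  # The Nix hash makes this an ordinary fixed-output derivation.
  inherit hash;
}
```

With pkgs.callPackage applied to this file, a call would pass the cid and hash exactly as in the proposal, for example fetchIPFS { cid = "QmXyz..."; hash = "sha256-..."; }.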


I believe the hash is still necessary here, as Git still uses SHA-1, which is known to have collision vulnerabilities.


In my mind a true CA fetcher only needs a set of digests. This helps with:

  • Abstracting over CAS implementations, which allows for systems with incompatible digest algorithms.
  • Migrating away from vulnerable digest algorithms.
  • Dealing with collision vulnerabilities. You could require 2+ digest algorithms if none is considered safe for the next 5-10 years.

However, it’s often nice to use a mutable reference to an artifact, like ref in fetchRadicle or fetchGit. This interacts with flakes, which separate a somewhat “immutable intention” from the precise versioning required for reproducible systems. Flakes are still experimental, but it’d be nice to understand where things should go once the mutable references live in the fetchers and the sets of hashes live in lockfiles.


A good end state would be a generic fetcher that takes a set of mutable references (URLs, Git repo + branch, etc.), with the lockfiles tracking a set of hashes.

What gets tricky with multiple (and thus alternative) mutable references is that when updating you’ll need to rank them and trust one, or try getting everything and verify all hashes just to observe how propagation delays bite you with “false” hash differences.

I think we need a few more discussions, as we need to address gateway configuration for both cases (Radicle and IPFS).
In my opinion, this should work analogously to how we handle GitHub tokens, for example. However, this configuration might not be solvable through a pull request to Nixpkgs and may require changes to Nix itself. We also need to decide what happens when no gateways are configured, especially while there are no official NixOS-operated nodes for IPFS or Radicle yet. (One for Radicle already seems to be in planning.)

Just add a list of gateways for each to Nixpkgs config. Nixpkgs config is already configurable by file or on a per-import basis. The fetcher can access that list and, if unset, use some sensible default.
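
A minimal sketch of that idea follows. The config key ipfsGateways is hypothetical and does not exist in Nixpkgs today, and the fallback gateway is only an assumed default; the sketch also assumes the Nixpkgs config accepts the extra key.

```nix
# Sketch: read the gateway list from the Nixpkgs config, with a fallback.
# `ipfsGateways` is a hypothetical config key, not an existing option.
{ config, fetchurl }:

let
  # Use the user's list if set, otherwise an assumed default.
  gateways = config.ipfsGateways or [ "https://ipfs.io" ];
in
{ cid, hash }:

fetchurl {
  # fetchurl tries each URL in order until one download succeeds.
  urls = map (gateway: "${gateway}/ipfs/${cid}") gateways;
  inherit hash;
}
```

A user would then set the list per import, for instance import <nixpkgs> { config.ipfsGateways = [ "http://127.0.0.1:8080" ]; } (the address is an example value). Because the result is a fixed-output derivation, its store path depends on the name and hash rather than on the URLs, so changing the gateway list should not change the output path.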


That’s not a practical problem. As I understand it, SHA-1 still seems quite safe against second-preimage attacks (they are probably far from becoming computable).

Overall I don’t get what the discussion is about. Nixpkgs does have fetchipfs already!

I thought there might be some issue with the fetchipfs implementation, but the wording suggests you weren’t even aware of this fetcher’s existence.


One of the core questions here is whether we can implement a fetcher which is more general (i.e. captures both IPFS and Radicle) and whether we should.


Last time I checked, it did not really work, and it expects an IPFS node on the build system, hardcoded to its default port.

As I already said, my main motivation for the discussion is finding a structure that gives the user the ability to configure the gateways for content-addressed protocols like IPFS and Radicle.

I suspect that it wouldn’t be efficient to start some IPFS client just for a single download.


I strongly believe we should go in a direction of not relying on centralized corporate infrastructure.

And the trend to distributed systems is very obvious.


(This is what tor.proxyHook does… is IPFS more expensive than Tor? I honestly have no idea.)


This sounds similar to what https://ipld.io/ is trying to do


Clearly I’m doing something right if people are sending me the Nick Land interview in the wild.


I think that’s missing the point. You might want to use a gateway or have your IPFS client on another port.


The port is configurable in fetchipfs, but yes. (You can most likely override such things with a relatively simple overlay.)
