Oh, you might be right. But there is a reason it's not used. One of the reasons is the bootstrapping problem.
The growing use of Radicle in the Nix Community opens up an opportunity to find a holistic solution for content addressing.
You implied I didn’t know about the fetchipfs function, but I had known about it since Nix and IPFS (Issue #859 in NixOS/nix on GitHub), and I believed it to be not ideal for multiple reasons, even though I have since forgotten the details.
Thanks for the mention. I’ve been dogfooding Nix+git-hashing+IPFS for about a year now, and have a few thoughts.
The existing fetchipfs in Nixpkgs is based around UnixFS tarballs. That has several consequences:
It requires two separate hashes: one for the IPFS content-address (the root of the Merkle tree), and another for the tarball it describes.
Tarballs are only supported by Kubo RPC, which is a private, locked-down API (usually on port 5001) that is generally not exposed to remote hosts. Hence in practice we have to run a local Kubo node (or use an SSH tunnel).
Instead of the Kubo RPC API, we can use the Trustless Gateway API (usually running on port 8080; note that fetchipfs specifies a default port of 8080, but doesn’t actually use it):
It’s not private: many sites expose this API to the public Internet, including well-known ones like ipfs.io. Hence we don’t need to run a local IPFS node if we don’t want to.
If we have Nixpkgs available, we can use the urls argument of fetchurl to try various gateways to find one that works.
When I don’t have Nixpkgs available (e.g. when bootstrapping), I’ve been using an IPFS_GATEWAY env var to allow overriding the http://localhost:8080 default. Hacky, but does the job for now.
It only requires a single hash: the “trustless” idea is that anything we download can be verified against the hash we used as a content address.
The easiest thing to do is to turn a single file into a single “raw” block (not UnixFS!): in that case the address of the block is just the hash of the file contents, which is easily verified as a fixed-output derivation.
For example, this fetchRawIPFS function takes a SHA256 as its only argument. I use it to fetch more complicated Nix files, like fetchGitIPFS below.
There’s a caveat: IPFS doesn’t like sharing files larger than 1MB, to avoid DoS attacks. UnixFS avoids this by chunking files into several blocks, but that makes fetching and verifying harder.
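As a rough illustration of the single-hash idea, here is a Python sketch (not the author’s Nix code) of how a fetchRawIPFS-style helper could go from a bare SHA256 to gateway URLs and back to a verification. It assumes a CIDv1 using the raw codec (0x55) with a sha2-256 multihash (0x12), encoded as base32 with the “b” multibase prefix, and the Trustless Gateway’s ?format=raw query parameter. The names raw_cid_from_sha256, gateway_urls, verify_raw_block and the fallbacks parameter are hypothetical; IPFS_GATEWAY mirrors the env var override described above.

```python
import base64
import hashlib
import os

DEFAULT_GATEWAY = "http://localhost:8080"  # used when IPFS_GATEWAY is unset


def raw_cid_from_sha256(hex_digest):
    """Base32 CIDv1 for a single raw block, given the SHA256 of its bytes."""
    digest = bytes.fromhex(hex_digest)
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA256 digest")
    # CID version 1, raw codec 0x55, sha2-256 multihash 0x12, digest length 32.
    # Every code is below 0x80, so each varint is a single byte.
    cid_bytes = bytes([0x01, 0x55, 0x12, len(digest)]) + digest
    # Multibase prefix "b": lowercase RFC 4648 base32 without padding.
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


def gateway_urls(cid, fallbacks=()):
    """Candidate Trustless Gateway URLs for a raw block, in the order to try them."""
    gateways = [os.environ.get("IPFS_GATEWAY", DEFAULT_GATEWAY), *fallbacks]
    urls = []
    for gw in gateways:
        url = f"{gw.rstrip('/')}/ipfs/{cid}?format=raw"
        if url not in urls:  # don't try the same gateway twice
            urls.append(url)
    return urls


def verify_raw_block(hex_digest, data):
    """Whichever gateway answered, the block is valid iff its bytes hash back."""
    return hashlib.sha256(data).hexdigest() == hex_digest.lower()
```

Since the CID is derived from the SHA256, the same digest can double as the fixed-output hash, and a URL list like this is the sort of thing one might hand to fetchurl’s urls argument.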
For structures which involve more than a single block, like directories or chunked UnixFS files, we can ask the Trustless Gateway to use the CAR format which is essentially a bunch of blocks concatenated together, plus some metadata.
CAR files generated by a gateway should include the full Merkle tree, and hence all the chunks, all the subdirectories, etc., although a CAR might be incomplete, e.g. if some blocks aren’t available.
CAR contents are verifiable in principle, but a CAR file itself will have a different hash from the content-address we asked for. Hence a basic fixed-output derivation would need a separate hash to verify such a CAR.
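To make the “verifiable in principle” point concrete, here is a minimal Python sketch that reads a CARv1 stream and checks every block against its own CID. It assumes the CARv1 layout (a varint-prefixed DAG-CBOR header holding the roots, then sections made of a varint length, a binary CID and the block bytes) and only knows the sha1 (0x11) and sha2-256 (0x12) multihashes. It skips the header instead of decoding it and does not walk links, so it cannot tell whether the CAR is complete, and the CAR file as a whole still hashes differently from the CID. All function names are hypothetical.

```python
import hashlib

# Multihash function codes this sketch knows how to verify.
HASHERS = {0x11: hashlib.sha1, 0x12: hashlib.sha256}


def read_varint(buf, pos):
    """Decode an unsigned LEB128 varint; return (value, new_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def read_cid(buf, pos):
    """Parse a binary CID; return (codec, mh_code, digest, cid_bytes, new_pos)."""
    start = pos
    if buf[pos] == 0x12 and buf[pos + 1] == 0x20:
        codec = 0x70  # CIDv0: implicit dag-pb, bare sha2-256 multihash
    else:
        version, pos = read_varint(buf, pos)
        if version != 1:
            raise ValueError(f"unsupported CID version {version}")
        codec, pos = read_varint(buf, pos)
    mh_code, pos = read_varint(buf, pos)
    mh_len, pos = read_varint(buf, pos)
    digest = bytes(buf[pos:pos + mh_len])
    pos += mh_len
    return codec, mh_code, digest, bytes(buf[start:pos]), pos


def iter_car_blocks(car):
    """Yield (cid_bytes, codec, data) for each block of a CARv1, verifying it."""
    header_len, pos = read_varint(car, 0)
    pos += header_len  # skip the DAG-CBOR header (roots + version) undecoded
    while pos < len(car):
        section_len, pos = read_varint(car, pos)
        end = pos + section_len  # the section length covers CID + block data
        codec, mh_code, digest, cid_bytes, pos = read_cid(car, pos)
        data = car[pos:end]
        hasher = HASHERS.get(mh_code)
        if hasher is None:
            raise ValueError(f"unsupported multihash 0x{mh_code:x}")
        if hasher(data).digest() != digest:
            raise ValueError("block does not match its CID")
        yield cid_bytes, codec, data
        pos = end
```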
The reason I got excited by Nix’s git-hashing feature is that IPFS supports using git objects (instead of UnixFS) and uses their existing hashes (whether SHA1 or SHA256, though the latter is still quite rare). That doesn’t avoid the 1MB size limit (since git stores the entire contents of a file as one “blob” of arbitrary size), but it does let us verify a directory (extracted from a CAR) using the same hash as its content address (e.g. see fetchGitIPFS). Even nicer, that hash is also the same one used for version control, and already has an ecosystem of tooling and APIs.
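For readers unfamiliar with how git object IDs line up with IPFS addresses, this Python sketch (illustrative only, not fetchGitIPFS) hashes blobs and trees the way git does with SHA-1, and wraps an object ID in a CIDv1 using the git-raw codec (0x78) and the sha1 multihash (0x11). The helper names are hypothetical, and SHA-256 repositories, which would need a different multihash code, are not covered.

```python
import base64
import hashlib

GIT_RAW = 0x78  # multicodec code for git-raw
SHA1 = 0x11     # multihash code for sha1


def git_object(kind, body):
    """Serialize a loose git object: '<kind> <size>', a NUL byte, then the body."""
    return f"{kind} {len(body)}".encode() + b"\0" + body


def git_hash(kind, body):
    """The SHA-1 object ID git assigns to this object."""
    return hashlib.sha1(git_object(kind, body)).hexdigest()


def tree_body(entries):
    """entries: (mode, name, hex_oid) tuples; mode like '100644' or '40000'."""
    def sort_key(entry):
        mode, name, _ = entry
        # git sorts directories as if their names ended in '/'
        return name + "/" if mode == "40000" else name

    out = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return out


def git_raw_cid(hex_oid):
    """Base32 CIDv1 addressing a git object by its existing SHA-1."""
    digest = bytes.fromhex(hex_oid)
    cid_bytes = bytes([0x01, GIT_RAW, SHA1, len(digest)]) + digest
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
```

Because the CID embeds the git object ID unchanged, a tree fetched this way can be checked against the hash git itself reports.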
I’ve switched my personal Git repos to using git-hashing + IPFS, and it mostly works. The main issues I’ve faced are:
Always sending entire trees. I do this when pushing git trees into IPFS, and CAR files also tend to contain all referenced blocks.
There are smarter ways to do this, more like git’s delta transfers. IPFS has an approach called GraphSync, but it seems to be abandoned (e.g. the Go library has been dropped as a dependency from Kubo).
Bootstrapping. I’m currently using an unholy mixture of builtin:unpack-channel, builtins.fetchurl, etc. to get an initial Nixpkgs, after which I can use pkgs.fetchurl. (I used to use fetchTree, but that currently has an annoying bug.)
IPFS-specific annoyances, like Kubo trying to announce provider records for every individual block rather than just the roots, etc.
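For the “always sending entire trees” issue, one of the smarter approaches is git-style negotiation: the receiver says which heads it already has, and the sender skips everything reachable from them. The Python sketch below shows only that set arithmetic on a toy DAG, assuming the receiver stores complete sub-DAGs. It is neither GraphSync nor git’s actual protocol, and it does no delta compression. The names links, reachable and blocks_to_send are hypothetical.

```python
def reachable(roots, links):
    """All CIDs reachable from `roots` through `links` (roots included)."""
    seen, stack = set(), list(roots)
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(links.get(cid, ()))
    return seen


def blocks_to_send(root, links, receiver_heads):
    """CIDs under `root` that the receiver lacks, given the heads it already has."""
    # ASSUMPTION: a receiver holding a head also holds everything below it.
    return reachable([root], links) - reachable(receiver_heads, links)
```

With a new commit that only changes one tree, this sends two blocks instead of the whole reachable DAG; the shared blobs and subtrees are pruned because they hang off the receiver’s existing head.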
I thought about it, and the Nixpkgs configuration might be a good place for it, provided this does not cause a bootstrapping problem: for example, when a configuration is built for the first time and points to a gateway that the same configuration sets up.
Pointing at its own gateway like that would be necessary if no trustworthy gateways are available before the first build.
Thanks a lot for your feedback. I finally have the time to continue studying your implementation and doing some more experiments.
Your implementation can now perhaps be simplified somewhat, since go-car is now available in Nixpkgs.
My personal experiments for the moment will be around wrapping go-car functionalities as nix functions.
Your motivations seem to be around making git objects available in trustless ways. Have you already considered Radicle as that is its main use case?
Oo, nice! It could also be simplified if go-car added support for git objects but I didn’t try fixing that (yet) as I don’t really know Go, so I worked around it with Bash (eww…). UPDATE: I’ve cobbled together a Pull Request for that, so we’ll see!
Yeah I’ve been keeping an eye on Radicle for a while, and their delta-transfer approach is definitely more efficient than fetching git DAGs from IPFS.
However, Radicle itself doesn’t really appeal to me, since I specifically don’t want any of the “identity” or “project” stuff it seems to be based around.
Looks like my PR might not get merged, since the go-ipld-git package which provides the git-raw codec for IPFS/IPLD projects that use Go is basically unmaintained.
Rather than maintain my own fork of go-car (or step up to maintain go-ipld-git), I decided to replace go-car in fetchGitIPFS with a Perl script to extract git-raw blocks from a CAR into a Git packfile. That should hopefully require little maintenance, since it’s very minimal, Perl5 is very stable, it doesn’t use anything other than the stdlib, and the formats seem stable enough (Git’s is very stable; CAR seems to only change slowly, and to mostly be backwards-compatible).
Unfortunately I’m not a Perl programmer, so I vibe-coded it.
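The packfile half of that approach is small enough to sketch. The Python below is not a port of the Perl script; it writes an undeltified version-2 Git packfile: the “PACK” signature, version and object count, then for each object a type-and-size header followed by the zlib-compressed body, and finally a SHA-1 trailer over everything before it. It assumes each git-raw block’s data is the uncompressed loose-object form (type, size, NUL, body), which is an assumption here, and it leaves out extracting the blocks from the CAR. The function names are hypothetical.

```python
import hashlib
import struct
import zlib

# Pack object type numbers from git's pack format.
PACK_TYPES = {b"commit": 1, b"tree": 2, b"blob": 3, b"tag": 4}


def pack_entry_header(obj_type, size):
    """Type-and-size header: 3 type bits + low 4 size bits, then 7 bits per byte."""
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # MSB set: more size bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def build_packfile(raw_objects):
    """raw_objects: loose-format git objects, e.g. data of git-raw blocks."""
    out = bytearray(b"PACK" + struct.pack(">II", 2, len(raw_objects)))
    for raw in raw_objects:
        header, _, body = raw.partition(b"\0")
        kind, size = header.split(b" ")
        if int(size.decode()) != len(body):
            raise ValueError("size in object header does not match body")
        out += pack_entry_header(PACK_TYPES[kind], len(body))
        out += zlib.compress(body)  # each entry is a zlib stream of the body only
    out += hashlib.sha1(out).digest()  # trailer: SHA-1 of everything before it
    return bytes(out)
```

In principle the result can then be handed to git index-pack to build its index. Deltas are deliberately not used, trading pack size for a format that is hard to get wrong.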