Read it; looks good! Yeah, CA derivations basically mean the same thing, but since the path is content-addressed now, the drv <-> output mapping must live in the extra table, instead of the input-addressed path <-> NAR hash (or other content address) mapping.
Your system should continue to work with this new stuff, since it already works with fixed-output derivations, and you still retain the advantage of delivering those benefits to the Nix derivations that exist today :).
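For illustration, here is a minimal sketch (Python with SQLite) of the kind of extra drv -> output mapping table this implies. The table and column names are hypothetical, not Nix's actual schema; the point is only that for CA derivations the output path can't be computed from the drv alone, so the realisation has to be recorded explicitly:

```python
import sqlite3

# Hypothetical schema for the drv -> output mapping that content-addressed
# derivations need: the output path is derived from the output's *content*,
# so the link from derivation to realised output must be stored explicitly.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Realisations (
        drvPath    TEXT NOT NULL,  -- path of the .drv file
        outputName TEXT NOT NULL,  -- e.g. 'out', 'dev'
        outputPath TEXT NOT NULL,  -- content-addressed store path
        PRIMARY KEY (drvPath, outputName)
    )
""")

# Record that building this drv's 'out' output produced this CA path.
conn.execute(
    "INSERT INTO Realisations VALUES (?, ?, ?)",
    ("/nix/store/aaaa-hello.drv", "out", "/nix/store/bbbb-hello"),
)

# Lookup: which output path did this derivation produce?
row = conn.execute(
    "SELECT outputPath FROM Realisations WHERE drvPath = ? AND outputName = ?",
    ("/nix/store/aaaa-hello.drv", "out"),
).fetchone()
print(row[0])  # /nix/store/bbbb-hello
```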
As a data point, at some point I saw a network with its own cryptocoin supplying distributed storage, the name of which I don’t remember. It might be interesting to look at how that worked out. I doubt it’s more cost-effective than, say, some cloud-based solution, so this has to stand on its own regardless of (or despite) storage considerations. *Edit: i.e., people would join the network, supply storage, and get coin for uptime or whatever.
builder for '/nix/store/8zs09r0qy444w2y9d5w8dwnvqwj67c99-nix-2.3.7.drv' failed with exit code 2; last 10 log lines:
unpacking source archive /nix/store/ijd7yg3g642wija13lsz00avwn552258-source
source root is source
no configure script, doing nothing
build flags: -j8 -l8 SHELL=/nix/store/6737cq9nvp4k5r70qcgf61004r0l2g3v-bash-4.4-p23/bin/bash profiledir=\$\(out\)/etc/profile.d
/nix/store/6737cq9nvp4k5r70qcgf61004r0l2g3v-bash-4.4-p23/bin/bash: ./config.status: No such file or directory
make: *** No rule to make target 'config.h', needed by 'precompiled-headers.h.gch'. Stop.
Hi @Ericson2314, I have a few questions about the implementation that you may be able to clear up.
Is the intent that a significant portion of Nix installations would also have an IPFS daemon running alongside to provide IPFS connectivity?
Is there any provision in the current roadmap to allow substitution via an IPFS gateway such as the cloudflare gateway?
When running an IPFS daemon and pushing nix store paths into IPFS, is the path added with --nocopy (filestore backed) or are the contents duplicated into ipfs chunk storage? Is this why the store path gets a nix temporary GC root?
When substituting paths from IPFS, are the paths fetched to the IPFS filestore and symlinked/hardlinked into the Nix store? Would this create an IPFS pin to prevent cleanup from IPFS?
Regarding the following statement:
Trustless remote building: Nix users shouldn’t need to be trusted to utilize remote builders.
The way I currently understand building with nixpkgs, most derivations are built by a trusted source (hydra), and only packages you’re customizing or developing yourself need to be built locally or on a remote builder. For these packages, you don’t know what the resulting CID will be until you’ve built it for the first time, and if you’ve built it to get the CID, you no longer need a remote builder to do the build for you.
Is my understanding here wrong, or am I missing some use-case that makes trust-less remote building more useful?
That would be nice! But we’re not trying to get ourselves in a network effects chicken-or-egg situation either. In particular, we ought to be able to share source code with upstream devs, other distros, etc., so we are just one part of the wider network.
That isn’t currently implemented, mainly due to time/budget constraints, as the gateway and regular interface are different. But there’s no reason all the reading/downloading (not writing/uploading, of course) couldn’t also work that way.
Duplicated. I had not actually heard of --nocopy, but my inclination is that trying to share storage is hard (to wit, --nocopy is still unstable), and that long term the better architecture is for IPFS to manage the storage and just mount things for Nix, which is roughly the opposite of --nocopy.
We don’t create any extra temporary or permanent GC roots on the Nix side (LocalStore). We do implement Store::addTempRoot for the IPFS store, so Nix temp roots become IPFS permanent pins. We hope either Nix gains a hook for cleaning up these temporary pins, or IPFS gains a notion of temporary pins, so that we don’t leak pins on the IPFS side.
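A toy sketch of the pin lifecycle described above, with illustrative names (not the actual Nix or IPFS APIs), showing how a temp root turns into a permanent pin that leaks without a cleanup hook:

```python
# Hypothetical sketch: Nix temporary GC roots become permanent IPFS pins,
# and without a cleanup hook the pins are never removed ("leak").
# Class and method names are illustrative, not real Nix/IPFS APIs.
class IpfsStore:
    def __init__(self):
        self.pins = set()  # stands in for the IPFS pin set

    def add_temp_root(self, cid: str):
        # Nix calls this for a temp root; on the IPFS side the only
        # available primitive is a *permanent* pin.
        self.pins.add(cid)

    def release_temp_root(self, cid: str):
        # The hook Nix currently lacks: without it, this never runs.
        self.pins.discard(cid)

store = IpfsStore()
store.add_temp_root("QmExampleCID")
# Build finishes; Nix drops its temp root, but nothing tells IPFS:
leaked = "QmExampleCID" in store.pins
print(leaked)  # True: the pin outlives the temp root
```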
It’s just copied, same as with the other direction.
Your understanding is correct: knowing the CID and no longer needing the build do coincide.
The current bug is probably simpler/stupider than what you are thinking. Right now, clients of remote builders push arbitrary non-content-addressed data to the remote builder. This is fine when the client is hydra, but not fine if e.g. a company wants its employees to use a build farm without giving any rogue employee the ability to push malware to commonly-queried paths and infect everyone else.
Firstly, I want to say I really appreciate the work already done to support Content Addressed derivations. I think this is a great thing to have, independent of any potential storage improvements.
I know that these are very early days for your work (and it’s very impressive what you’ve achieved so far), but I think it will be a tough sell for many users to effectively double the storage space required for the Nix store (once in the Nix store, once in IPFS) as more packages become CA. Mounting the files from IPFS via FUSE is also not free, and can introduce a significant performance bottleneck. Perhaps IPFS import/export is a feature more suitable for beefier buildfarm/cache(ix) infrastructure?
Substitution via traditional HTTPS (e.g. from the cloudflare gateway) would allow the 90% of users who just want to get prebuilt dependencies without thinking about IPFS to still drive demand for IPFS-hosted build products. Having those derivations be loaded into the Nix store as usual, taking no extra space, would provide a smooth transition path. Also, laptop users like me won’t have to run the bandwidth- and CPU-hungry IPFS daemon constantly to avoid warmup time and high latency when requesting objects.
How would one go about proposing/working on gateway substitution as an extension to Obsidian’s work?
Can you please expand on this a bit? Don’t clients push .drv “buildplans” to remote builders? These buildplans actually ARE content addressed. The name contains the hash of its content, and the result of the build carries the hash of the buildplan. It’s the remote builder that has the power to label any arbitrary data as being built from a buildplan hash, not the clients. AFAIK, the only arbitrary data that clients are allowed to push is fixed-output derivations, so they can’t lie about what such a derivation contains, as it’s also content addressed. Is there a sneaky loophole hiding here that I don’t know about?
If I’ve got this wrong I’m very happy to be corrected!
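To illustrate the point about the .drv name containing a hash of its content, here is a deliberately simplified sketch; Nix’s real store-path computation hashes a longer fingerprint and uses a truncated base32 encoding, so this only shows the shape of the idea, not the actual algorithm:

```python
import hashlib

# Simplified, hypothetical illustration: a .drv's store path name embeds a
# hash of its contents, so a "buildplan" that doesn't match its own name is
# detectable. NOT Nix's real path computation (which hashes a longer
# fingerprint and encodes it as truncated base32).
def fake_drv_path(contents: bytes, name: str) -> str:
    digest = hashlib.sha256(contents).hexdigest()[:32]  # illustrative truncation
    return f"/nix/store/{digest}-{name}.drv"

drv = b'Derive([...example derivation contents...])'
path = fake_drv_path(drv, "hello")

# Tampering with the contents changes the name, so a mismatch is detectable:
assert fake_drv_path(drv + b"tampered", "hello") != path
```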
Thanks for thinking about how to drive adoption. You make good points. My idea was to drive adoption with source code archival, where the space duplication is far less a concern, but there’s no technical reason we wouldn’t pursue both tracks.
They used to copy the drv file and its closure, but more recently they send over just the derivation being built (parsed, to be used in memory only on the remote side) without its dependent drvs.
I’m a bit confused about what you mean, but anyway, for traditional “input-addressed” derivations, the output path computation is quite complicated and is computed by hashDerivationModulo. The daemon has no hope of verifying this unless it has the entire derivation closure, so it blindly trusts the output path in the derivation that is sent over.
I have already removed the trust requirement for (fixed and floating) content-addressed derivations, since the daemon ignores any output paths sent over as part of those derivations.
Actually I think it will currently accept any store path, not just content-addressed ones (usually built by fixed-output derivations). This is one reason why it only works for “trusted users” on the daemon side.
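A hedged sketch of the recursive idea behind hashDerivationModulo; the data shapes and hashing details here are illustrative only (the real algorithm also special-cases fixed-output derivations), but it shows why verifying an output path needs the entire derivation closure:

```python
import hashlib

# Illustrative sketch, not Nix's actual algorithm: to hash a derivation,
# each reference to an input derivation is replaced by *that* derivation's
# own recursive hash-modulo. Verifying therefore requires the whole
# derivation closure, which is why the daemon blindly trusts the sent path.
def hash_modulo(drv, all_drvs, cache=None):
    cache = {} if cache is None else cache
    if drv["path"] in cache:
        return cache[drv["path"]]
    # Rewrite each input drv reference to its own recursive hash.
    rewritten_inputs = sorted(
        hash_modulo(all_drvs[p], all_drvs, cache) for p in drv["inputs"]
    )
    fingerprint = drv["text"] + "|" + ",".join(rewritten_inputs)
    h = hashlib.sha256(fingerprint.encode()).hexdigest()
    cache[drv["path"]] = h
    return h

drvs = {
    "a.drv": {"path": "a.drv", "text": "build a", "inputs": []},
    "b.drv": {"path": "b.drv", "text": "build b", "inputs": ["a.drv"]},
}
h1 = hash_modulo(drvs["b.drv"], drvs)

# Changing a *dependency* changes the dependent's hash too:
drvs2 = {
    "a.drv": {"path": "a.drv", "text": "build a v2", "inputs": []},
    "b.drv": {"path": "b.drv", "text": "build b", "inputs": ["a.drv"]},
}
h2 = hash_modulo(drvs2["b.drv"], drvs2)
assert h1 != h2
```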
Thanks again for the detailed reply. I had to sit on this for a while and do some more reading and thinking.
It seemed like you had a roadmap of features to implement, so I guess talk to whoever is managing that roadmap?
This is probably massively naive on my part but this seems like a security hole that would be possible to close?
Send the full derivation closure (.drv or parsed, does it matter?) so that the builder can verify path names for itself.
Disallow accepting arbitrary store paths (input-addressed ones excepted, but the builder must verify that the received path matches its hash), except for trusted users (to avoid breaking existing workflows?).
The remote builder must build or substitute any missing store paths from its own trusted sources.
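The verification step proposed above could look roughly like this; the path scheme is hypothetical, not Nix’s actual store-path algorithm, and only illustrates that the builder recomputes the address from the received contents instead of trusting the client’s label:

```python
import hashlib

# Hypothetical sketch of the proposed check: when a client pushes a
# content-addressed path, the builder recomputes the address from the
# received contents and rejects it on mismatch, so no trust in the
# client is needed. Not Nix's real store-path algorithm.
def content_addressed_path(contents: bytes, name: str) -> str:
    digest = hashlib.sha256(contents).hexdigest()[:32]
    return f"/nix/store/{digest}-{name}"

def accept_path(claimed_path: str, contents: bytes, name: str) -> bool:
    # The builder trusts the *contents*, never the claimed label.
    return claimed_path == content_addressed_path(contents, name)

data = b"some source tarball"
good = content_addressed_path(data, "src")
assert accept_path(good, data, "src")            # honest client: accepted
assert not accept_path(good, b"malware", "src")  # mismatched contents: rejected
```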
Is there any use-case/scenario that this breaks? I’m very interested in the security model of remote builders, so if you can recommend a part of the codebase that I should read to get a better idea of how this all works please drop me a link.