It also has the problem that most users won’t be upgrading between the same versions that Hydra is. The diff between builds in Hydra is between very nearby commits, whereas users are likely to be upgrading between commits that are weeks apart. We’d have to keep diffs between HEAD and more than one previous commit, which would only exacerbate the storage problem.
EDIT: Plus the binary cache protocol would then get more complicated. User machines would have to ask the cache “I need this path; which paths do you have diffs from for it?”, the cache would answer “I have diffs from these paths”, and the client would reply “OK, I have that one, please send me the diff”.
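(To make the extra round trip concrete: today’s lookup is a single request per store path, keyed by the path’s hash part; anything diff-aware would need at least one more negotiation step. The `.narinfo` hash and the diff endpoint below are placeholders, not an existing API.)

```sh
# Today: one request per store path.
curl https://cache.nixos.org/nix-cache-info        # real endpoint: cache metadata
curl https://cache.nixos.org/<hash-part>.narinfo   # placeholder hash; points at one full NAR
# A diff-based cache would need an extra negotiation round per path,
# e.g. (purely hypothetical endpoint):
# curl https://cache.example.org/<hash-part>.diffinfo  # "which base paths do you have deltas from?"
```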
We probably need to increase the substitution granularity, though. I wrote an article evaluating the efficiency of some potential solutions to this problem earlier this year: Nix Substitution: the Way Forward
The TL;DR being: adding an output-addressed chunk/file store would greatly reduce the amount of data we’d have to download in most use cases.
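To make that concrete, here is a rough, hand-wavy way to see how much of a package’s content survives an upgrade at file granularity; the store paths are placeholders, and files containing self-references will of course differ:

```sh
# Placeholder store paths: substitute two real versions of the same package.
old=/nix/store/aaaa...-hello-2.12
new=/nix/store/bbbb...-hello-2.12.1
# Hash every file's content and count how many hashes the two versions share;
# a content-addressed file/chunk store would only need to fetch the rest.
find "$old" -type f -exec nix hash file {} \; | sort -u > old.hashes
find "$new" -type f -exec nix hash file {} \; | sort -u > new.hashes
comm -12 old.hashes new.hashes | wc -l
```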
I’ve written Spongix (GitHub: input-output-hk/spongix), a proxy for Nix caching which splits NARs into chunks using desync (GitHub: folbricht/desync, an alternative casync implementation) and saves roughly 80% of space (it always depends on the contents of the cache, of course).
The remaining issue is that Nix won’t download or upload those chunks directly, so the transfer is still as inefficient as always.
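For illustration, this is roughly the kind of chunking desync does under the hood (store path and file names are placeholders):

```sh
# Serialize a store path to a NAR (placeholder path), then chunk it into a
# local content-addressed chunk store; identical chunks across NARs are
# stored only once, which is where the space savings come from.
nix-store --dump /nix/store/aaaa...-firefox-101.0 > firefox.nar
desync make -s ./chunk-store firefox-101.0.caibx firefox.nar
# Reassembling the NAR from chunks:
desync extract -s ./chunk-store firefox-101.0.caibx firefox-restored.nar
```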
Implementing chunking in Nix may work for uploading, but downloading would need a separate local chunk store, and the end result would be about 20% additional disk overhead, because /nix/store itself still needs to be uncompressed.
One alternative for that would be FUSE, but that’s going to be slow and probably buggy, as well as restricted to a few platforms; in theory, though, you’d trade time for space, if that’s critical.
The Solaris IPS packaging system did away with transferring package archives entirely and instead transfers individual files, using a content-addressing scheme. This was in large part because even when a package update changes something, many files are unaffected and can be reused between versions.
While you might lose a little efficiency from not compressing a whole archive at once (many doc files compressed together, etc.), the real gain seems to come instead from avoiding repeated downloads of unchanged files.
Because of reproducibility, it’s hard to skip rebuilds. Guix has a “graft” system for package updates with really minor changes, to avoid recompiling the whole dependency graph, but I suppose it kills reproducibility?
There already is an equivalent; in typical Nix fashion it’s called replaceRuntimeDependencies. It works by going through every file in the system and replacing the store path of the original package with that of the replacement.
It won’t work for arbitrary package updates, though: it assumes the replacement store path has the same length as the original. So you could do this for security patches and minor bug fixes.
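For reference, a minimal sketch of what that looks like in a NixOS configuration (the package and the patch are just examples):

```nix
# Example only: patch openssl in place for everything that links against it,
# without rebuilding the whole dependency graph.
{ pkgs, ... }:
{
  system.replaceRuntimeDependencies = [
    {
      original = pkgs.openssl;
      replacement = pkgs.openssl.overrideAttrs (old: {
        patches = (old.patches or [ ]) ++ [ ./cve-fix.patch ];
      });
    }
  ];
}
```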
If the goal is to increase NixOS sustainability, I’m not sure using IPFS will be an improvement: the IPFS node software is a big resource hog. I admit it’s been a few years since I last tried it, but an idle node would use a few percent of CPU and several Mb/s of bandwidth with just a handful of pinned files.
IPFS is all nice (at least in theory) and it gives file-level granularity, but we could probably get away with just distributing NARs using plain old torrents, much more efficiently.
IPFS draws CPU when it’s actively contributing to the P2P network’s routing; this is not mandatory and not really useful if you only use it locally to access IPFS content. It has also gotten better with respect to resource usage.
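If resource usage is the main worry, the node can at least be told not to route for the rest of the network; roughly (go-ipfs/Kubo commands):

```sh
# Act only as a DHT client instead of a full routing node, and apply the
# built-in low-power profile to reduce background CPU/bandwidth usage.
ipfs config Routing.Type dhtclient
ipfs config profile apply lowpower
ipfs daemon
```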
I’m not sure exactly, and honestly my experience wasn’t too bad with it, but I’ll posit some combination of:
- insufficient download parallelism and/or streaming with earlier HTTP
- IOPS amplification and latency cascades with small files and duplicated metadata updates
- likely running on early ZFS, which had some pretty strong transaction commit latency
- likely running on spinning media with short concurrency queues
- conservative sync writes in the package manager
- the need to keep two copies of each file (or a hardlink in some cases?): one in the store and one in the system
- different expectations; it may have seemed slow compared to (say) apt (on ext4), but it was already faster and more convenient than the previous Solaris pkg system, so there was plenty of room to start with a conservative implementation and optimise further later
I wrote Spongix after evaluating nix-casync. Compared to it, Spongix offers:
- garbage collection based on LRU/max cache size, and integrity checking
- metrics
- proxying and caching multiple upstream caches behind itself
- uploading chunks to S3-compatible stores
- signing narinfos if no signature is present (for automated build farms on top of Cicero)
It probably has a few more features that I forgot about, but we’ve been running it in production at IOG for a few months now and it performs much better than our previous Hydra->S3 setup.
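From the client side it just looks like an ordinary binary cache; on recent NixOS, something like this (the URL and key are placeholders for your own deployment):

```nix
# Placeholder URL and key: point machines at the caching proxy instead of
# (or in front of) cache.nixos.org.
{
  nix.settings = {
    substituters = [ "https://cache.example.internal" ];
    trusted-public-keys = [ "cache.example.internal:0000000000000000000000000000000000000000000=" ];
  };
}
```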
CA derivations are strictly better, but it is really hard to get shit merged in this community, so Hydra still doesn’t have support for them upstream, and we cannot begin testing things in Nixpkgs.
I remain incredibly frustrated that we have this thing 90% done, and there is no will to unblock willing maintainers who are ready to do the work of getting the feature out into the real world.
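For anyone who hasn’t tried them, here is a minimal sketch of a (floating) content-addressed derivation; it needs the experimental ca-derivations feature enabled:

```nix
# Requires: experimental-features = nix-command ca-derivations
with import <nixpkgs> { };
runCommand "hello-ca"
  {
    # Mark the output as content-addressed: its store path is derived from
    # the output's contents rather than from the inputs.
    __contentAddressed = true;
    outputHashMode = "recursive";
    outputHashAlgo = "sha256";
  } ''
    mkdir -p $out
    echo "hello from a CA derivation" > $out/greeting
  ''
```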
Part of the problem is that the roadmap targets 3.0 for flakes and CA for 4.0.
We are not at 3.0 yet, and flakes are far from being ready, despite what Eelco says…
And the last time I tried CA, despite having set up the CA-enabled cache, it bootstrapped some compilers, which ultimately caused a world rebuild. Also, Cachix does not yet support CA.
And what annoys me about CA: its design still relies on a single authority per cache to translate an input-addressed (IA) path to the content-addressed (CA) one as known by that particular cache.
What we need instead is a distributed “trust” network that can point from an IA path not only to the CA one, but also to a “store”/“mirror”/“cache”/IPFS node or whatever from which to download it.
3.0 is already a big, giant mess that we cannot review; we need to minimize the scope. Flakes are a huge ball of unaudited complexity that we are in no way ready to stabilize in one go.
I am not saying Flakes should be 4.0 and CA 3.0, but we should focus on layering so we can stabilize e.g. part of the new CLI (stuff like nix show-derivation) without worrying about Flakes.
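For what it’s worth, that kind of command already works independently of flakes, e.g.:

```sh
# Inspect a derivation as JSON via the new CLI without using a flake reference.
nix show-derivation -f '<nixpkgs>' hello
```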