This looks really exciting!
I was just looking into content-addressed binary caches again, because the Minio S3 cache that my CI pushes to (yes, I’ll have to migrate it to Garage at some point) already eats up over 1.5 TB of disk space. I didn’t find anything new though, so I gave up and resigned myself to just browsing this Discourse for a bit. And then your post comes up, lol.
Anyway, the blog post compares Gachix to established solutions like nix-serve and harmonia, which are, as you point out, not very efficient because they basically just serve local files (OK, they pack them up in NARs, but still).
But are you aware of the other existing binary cache implementations that have a content-addressed storage backend? These are:
Attic, a multi-tenant binary cache:
It uses S3 and (optionally?) PostgreSQL for storage and can deduplicate via its content-addressing. It doesn’t content-address individual store files, though, but instead splits NARs into chunks with a content-defined chunking algorithm (FastCDC, IIRC) and deduplicates those chunks: FAQs - Attic.
Unfortunately, I never got it to work reliably for me.
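For anyone wondering what chunk-level deduplication buys you, here is a rough sketch of content-defined chunking. It is not Attic's code and not FastCDC itself, just a toy gear-style rolling hash; `GEAR`, `chunk`, `chunk_ids` and all the size parameters are made up for illustration. The point is that cut points depend on the bytes themselves, so a small edit only changes the chunks around it.

```python
import hashlib
import random

# Fixed pseudo-random table for the toy rolling hash (seeded so runs are repeatable).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes, mask: int = 0x3FF, min_size: int = 64, max_size: int = 4096) -> list:
    """Split data at content-defined boundaries (toy algorithm, illustrative parameters)."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # The low bits of h only depend on the most recent bytes,
        # so boundaries follow the content, not absolute offsets.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        at_boundary = size >= min_size and (h & mask) == 0
        if at_boundary or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunk_ids(data: bytes) -> list:
    """Content address of every chunk; a store would only keep unseen ids."""
    return [hashlib.sha256(c).hexdigest() for c in chunk(data)]


if __name__ == "__main__":
    rng = random.Random(1)
    original = bytes(rng.getrandbits(8) for _ in range(100_000))
    # Insert a few bytes near the start, as a rebuilt NAR might differ slightly.
    edited = original[:5000] + b"patched" + original[5000:]
    before, after = set(chunk_ids(original)), set(chunk_ids(edited))
    print(f"{len(before & after)} of {len(after)} chunks are already stored")
```

With fixed-size blocks, the same insertion would shift every later block and nothing after it would deduplicate.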
Snix store, part of the Snix reimplementation of Nix in Rust (fka/forked from Tvix):
It provides content-addressing with per-file granularity.
The general architecture of Snix is very modular: all components communicate using gRPC APIs for which both Rust types and Protobuf files exist.
For the store part, there’s a completely Nix-agnostic snix-castore serving Blobs (files) and Directories (=Git trees), while the actual snix-store provides only a PathInfo service on top of that, which translates Nix store paths to their content addresses.
What’s interesting about the last part is that users only need to trust the PathInfo service, because once they’ve got the content hash of a store path, they can securely substitute it from anyone that has it.
I guess the same would also apply to Gachix. While I’m not sure SHA-1 hashes should be trusted for that use case, using SHA-256 should theoretically be possible (IIRC).
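To make the trust argument concrete, here is a minimal sketch of the idea, not Snix's actual API. `content_address`, `substitute`, the dict-based "peers" and the store path `/nix/store/example-hello` are all hypothetical, and SHA-256 stands in for BLAKE3 because the latter isn't in Python's standard library. Only the `path_info` mapping has to come from a trusted source; whatever a peer returns is checked against the hash.

```python
import hashlib


def content_address(blob: bytes) -> str:
    # SHA-256 as a stand-in; Snix uses BLAKE3, which is not in Python's stdlib.
    return hashlib.sha256(blob).hexdigest()


def substitute(store_path: str, path_info: dict, peers: list) -> bytes:
    """Fetch a store path's content from any peer, trusting only path_info."""
    expected = path_info[store_path]  # the one piece of trusted data
    for peer in peers:
        blob = peer.get(expected)  # peers are modelled as plain dicts here
        # Accept the data only if it hashes to the trusted content address.
        if blob is not None and content_address(blob) == expected:
            return blob
    raise LookupError(f"no peer served valid content for {store_path}")


if __name__ == "__main__":
    good = b"contents of the store path"
    digest = content_address(good)
    path_info = {"/nix/store/example-hello": digest}  # hypothetical store path
    lying_peer = {digest: b"tampered"}
    honest_peer = {digest: good}
    result = substitute("/nix/store/example-hello", path_info, [lying_peer, honest_peer])
    print(result == good)
```

The lying peer is simply skipped, which is why the strength of the hash function matters so much for this use case.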
On the topic of hash functions: despite the obvious similarities to Git (Merkle DAG, file storage, etc.), the Snix developers decided not to use it as their CA store, mainly because of its hash function: snix/web/content/docs/components/castore/why-not-git.md at canon - snix/snix - Snix Project.
One advantage of their chosen hash function, Blake3, is that it supports verified streaming, which is helpful for scenarios where the store is mounted into a system (Virtio and FUSE are supported), but the files should only be fetched once they’re accessed.
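As a rough illustration of why a tree-structured hash makes this possible: the sketch below is not BLAKE3 or its streaming format, just a plain binary Merkle tree over SHA-256 with names I made up (`build_levels`, `proof_for`, `verify_chunk`). Given only the root hash, a client can check a single chunk using a handful of sibling hashes, without downloading the rest of the file.

```python
import hashlib


def _h(tag: bytes, data: bytes) -> bytes:
    # Distinct tags keep leaf hashes and inner-node hashes from colliding.
    return hashlib.sha256(tag + data).digest()


def build_levels(chunks: list) -> list:
    """All levels of the tree, leaves first; levels[-1][0] is the root."""
    level = [_h(b"\x00", c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(_h(b"\x01", level[i] + level[i + 1]))
            else:
                nxt.append(level[i])  # odd node is promoted unchanged
        level = nxt
        levels.append(level)
    return levels


def proof_for(levels: list, index: int) -> list:
    """Sibling hashes needed to recompute the root from chunk `index`."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((sibling < index, level[sibling]))
        index //= 2
    return proof


def verify_chunk(root: bytes, chunk: bytes, proof: list) -> bool:
    """Check one chunk against the trusted root using only its proof."""
    node = _h(b"\x00", chunk)
    for sibling_is_left, sibling in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01", pair)
    return node == root


if __name__ == "__main__":
    chunks = [bytes([i]) * 1024 for i in range(5)]  # stand-in for file chunks
    levels = build_levels(chunks)
    root = levels[-1][0]
    # A lazily mounted store could fetch and verify just chunk 3 on first access.
    print(verify_chunk(root, chunks[3], proof_for(levels, 3)))
```

With a flat hash over the whole file, the client would have to fetch everything before it could verify anything.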