Following up on my post Content-addressed Nix − call for testers - #206 by gador
is there a binary cache available that supports CA derivations?
The original posting mentions cache.ngi0.nixos.org if that’s what you mean.
If you mean one to self host, I’m not sure.
Yes, I meant self-hosting… how does your cache instance get served? It should obviously be able to serve the realisations endpoint…
I am not aware of any cache that’s being (self)hosted. The ngi0 cache was down last I checked. I’d like to host a cache but I’m not promising anything, and it will at least be half a year before I can start with that, I suspect.
I have a local cache running on self-hosted S3 (MinIO), which is fed by Hydra with the current PR https://github.com/NixOS/hydra/pull/875
This supplies the realisations endpoint and seems to work pretty well for my ca-derivations.
Is my understanding correct that a public ca-derivation cache will not necessarily need a (trusted) signingKey, because the store paths are actually verifiable?
If I supplied an S3 cache for e.g. nixos-small, could other users benefit from it without inherently trusting me?
Issues · NixOS/nix · GitHub
Note that I’ve started to write some issues for what is needed before CA derivations can be rolled out to hydra.nixos.org and then stabilized.
My basic litmus test is that I don’t want junk to be put in the cache. If the “client” code doesn’t work ideally for whatever reason, that can be fixed relatively painlessly, but if junk gets uploaded, it will haunt us for much longer.
The good thing is that there are only a few things left before I am pretty happy with the cache format, and with the precision of the build-trace map claims being made therein.
Content-addressed Nix doesn’t remove the need for trust: you still need to trust whoever is telling you ‘this output corresponds to this build specification’.
In traditional Nix, you could calculate the store path from the build specification yourself (no need for trust), and then you had to trust the cache that the output it sends you actually corresponds to that specification (by trusting/checking the signature).
With Content-addressed Nix, you no longer have to trust the cache that the output it sends you corresponds to the path you asked for: as it’s content-addressed, you can check it yourself. However, you can no longer calculate that store path yourself: you use the realisations endpoint for that. So instead of trusting the cache ‘itself’, you now have to trust the realisations endpoint (by trusting/checking the signatures on its responses).
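The shift in where trust sits can be sketched in a few lines. This is a deliberately simplified illustration (real Nix hashes a NAR serialization and derives store paths with its own algorithm, and the path format below is made up), but the trust argument is the same: the CA client can check the cache’s answer locally, while the claim “this is the right output for that derivation” still needs a signed realisation.

```python
import hashlib

def ca_path_fragment(content: bytes) -> str:
    """Derive a verifiable 'store path' fragment from the content itself.
    (Hypothetical scheme: real Nix store paths are computed differently.)"""
    return hashlib.sha256(content).hexdigest()[:32]

def verify_ca_substitution(requested_fragment: str, received: bytes) -> bool:
    # With CA, the client can verify the substituted content by itself:
    return ca_path_fragment(received) == requested_fragment

content = b"some build output"
frag = ca_path_fragment(content)

assert verify_ca_substitution(frag, content)          # genuine output passes
assert not verify_ca_substitution(frag, b"tampered")  # tampered output fails
# What still requires trust: the realisations endpoint's claim that *this*
# fragment is the correct output for a given derivation (checked via the
# signatures on its responses).
```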
In theory supplying your S3 cache might be useful for others, but we’d still need a way to share trusted(/signed?) realisations.
(this is mentioned in the RFC in rfcs/rfcs/0062-content-addressed-paths.md at 25c3f524631000b851375e7b96223a56e71cc0e2 · NixOS/rfcs · GitHub)
Is it possible to have an endpoint on Hydra that exposes a content-addressed store backed by an input-addressed store? Assuming this is possible, using content-addressed stores wouldn’t require building the whole world for anyone testing out the feature.
It depends - just finding an input-addressed package by its content hash is fairly trivial (though there is a lot of existing data to hash to build the index), but to make it a proper content-addressed store, all references would have to be content-addressed too.
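The “trivial” lookup part could be sketched as an index from content hash to input-addressed store path. Everything here is hypothetical (the store paths are made up, and real Nix would hash a NAR serialization rather than raw bytes), but it shows why the index itself is cheap once the hashing is done:

```python
import hashlib

# Hypothetical store contents: input-addressed path -> package bytes.
store = {
    "/nix/store/aaaaaaaaaa-hello-2.12": b"hello binary bytes",
    "/nix/store/bbbbbbbbbb-cowsay-3.04": b"cowsay binary bytes",
}

# Build the index once: content hash -> input-addressed store path.
# (The expensive part is hashing all the existing data, not the lookup.)
index = {hashlib.sha256(data).hexdigest(): path for path, data in store.items()}

# A client asking by content hash can now be answered directly:
wanted = hashlib.sha256(b"hello binary bytes").hexdigest()
print(index[wanted])  # -> /nix/store/aaaaaaaaaa-hello-2.12
```

The hard part, as noted above, is that the references *inside* those packages would still be input-addressed.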
I flesh out a lot of the details in my draft intensional store RFC, like taking the opportunity to put the store under /var/lib/nix and how to do it, but I lost momentum on it.
In any case, you could find and rewrite all references in the built files (masking self references while calculating the content hash) and that way end up with fully content-addressed packages.
This command may be of interest to you, though I personally don’t have experience with it: nix store make-content-addressed - Nix 2.29.1 Reference Manual
Well, kind of. It’s complicated. I have some info I can add to this based on my project laut, which is a signature format I would like to upstream.
It should be possible to do something like this with laut, though the relevant parts for that are not implemented yet. You can find the proposed design for that aspect here: make traces support IA derivations · Issue #5 · mschwaig/laut · GitHub
The difference between my approach and that existing command is that, since my approach records information at build time, the signatures carry more information about dependencies, which gives substitution better properties in terms of trust. These are the same properties you also get from implementing CA itself in the specific way that is now planned upstream.
The main problem with both make-content-addressed and using laut to feed CA clients from an IA cache, as the linked comment suggests, is that while we like to pretend store paths are just opaque references to some store content, in practice lots of things start breaking once you lean on that wrong assumption and begin rewriting the bits and bytes that make up those store paths on the consumer side.
In theory I think we could gradually change our package builds to where we start eliminating all of the ways in which they break when they are rewritten, so that assumption actually becomes true. I think this would be doable, gradually, but it would probably be a large effort.
I have already been thinking a bit about how this could be done. The main points that I think could help us actually go through a transition like that, and which went wrong in the first push for CA whenever we relied on make-content-addressed, are that we would have to track which packages can safely be rewritten, e.g. via a meta attribute of the package in nixpkgs, and gradually expand that set. To me those are the lessons we can take away from that first push for CA.
The alternative to such a gradual switch with stricter treatment of store paths and compatibility tooling is to switch to CA “cold turkey” in one go, building all the cache contents from scratch without any such rewriting.
PS: If you like neither rewriting nor CA, you can also use laut simply to get the stricter verification of dependencies and trust relationships, but on top of an input-addressed store. I am not sure whether any Nix implementation would want to implement that. I’m rooting for us finding some way to make CA work.
What I am thinking of wouldn’t touch any of the existing data in the Nix store. With nix store make-content-addressed, the store, if I understand correctly, is rewritten. With an interface, I would only want to make clients see the store as content-addressed. That:
A. Wouldn’t be invasive
B. Could allow for swapping out algorithms in the case of bugs.
C. Would allow people to use CA derivations with a lot less friction.
The only issues are the implementation difficulty and, as stated, the reference rewriting. I’m not quite sure how to get around that second problem without it being too onerous for the infrastructure. Maybe it’s possible to cache the diffs, save those, and apply them on request?
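One way to think about such a cached diff: since store paths have a fixed length, the “diff” could be just the byte offsets where the old path occurs, and applying it splices the new path in at those offsets. This is a hypothetical sketch with made-up paths, ignoring real-world complications like paths embedded in compressed or hashed data:

```python
OLD = b"/nix/store/iiiiiiiiii-pkg"  # input-addressed path (hypothetical)
NEW = b"/nix/store/cccccccccc-pkg"  # content-addressed path (same length)

def make_diff(data: bytes, old: bytes) -> list:
    """Record every offset where the old store path occurs."""
    offsets, i = [], data.find(old)
    while i != -1:
        offsets.append(i)
        i = data.find(old, i + 1)
    return offsets

def apply_diff(data: bytes, offsets: list, new: bytes) -> bytes:
    """Splice the new path in at the recorded offsets."""
    out = bytearray(data)
    for off in offsets:
        out[off:off + len(new)] = new
    return bytes(out)

blob = b"run /nix/store/iiiiiiiiii-pkg/bin/foo and /nix/store/iiiiiiiiii-pkg/lib"
diff = make_diff(blob, OLD)          # cheap to store: a handful of integers
rewritten = apply_diff(blob, diff, NEW)

assert OLD not in rewritten and rewritten.count(NEW) == 2
```

Applying such a diff per request would be mostly memory copies, so the per-request cost might be tolerable, though serving many large paths would still add up.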
That seems quite difficult, and maybe more processor intensive per request than is a good idea.
I think that if make-content-addressed doesn’t rewrite the store (I don’t know whether it does or not), then just running that would also solve the same problem, albeit with more storage.
Laut is interesting! In my Intensional Store RFC there is the concept of a Trust DB, a per-user mapping of output hashes to a content-addressed result.
Because of the self-validation of CA, there is only one point in the system where you have to give trust regarding builds, and that’s when mapping your desired derivation to a build result. The Trust DB is this mapping.
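A minimal sketch of that Trust DB idea, with made-up identifiers (the real mapping would of course carry signatures from the trusted parties, not a bare dict):

```python
from typing import Optional

# Hypothetical per-user Trust DB: derivation output id -> content-addressed
# result the user trusts, as asserted by parties they chose to trust.
trust_db = {
    "sha256:drvhash-hello!out": "sha256:cahash-hello-output",
}

def resolve(output_id: str) -> Optional[str]:
    """Map a desired derivation output to a trusted content-addressed result.
    This lookup is the single point where trust about builds enters."""
    return trust_db.get(output_id)

assert resolve("sha256:drvhash-hello!out") == "sha256:cahash-hello-output"
assert resolve("sha256:drvhash-unknown!out") is None  # no trusted result: build it yourself
```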
The idea is that users can choose for themselves who to trust regarding build results, although in practice it would mostly be the sysadmin who manages Trust DB providers for the system.
Laut could be plugged into that.
I went down the deduplication optimization rabbit hole in Store Nix store in git · GitHub. The idea is to best-effort split references from files before storing them in git objects, plus the references. That way, git does the heavy dedupe lifting, and we can put a FUSE layer in front to present the original files.
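The splitting idea can be sketched roughly like this (a hypothetical toy: the store-path regex is simplified and the “object store” is just a dict, standing in for git’s content-addressed objects). Files are cut at store-path occurrences, the in-between chunks are stored by content hash so identical chunks dedupe, and the reference list is kept separately so the original file can be rebuilt:

```python
import hashlib
import re

# Simplified store-path pattern (real Nix hashes are 32 base-32 characters).
STORE_PATH = re.compile(rb"/nix/store/[a-z0-9]{10}-[\w.+-]+")

def split_refs(data: bytes):
    """Cut the file at store-path occurrences; return (chunks, refs)."""
    chunks, refs, last = [], [], 0
    for m in STORE_PATH.finditer(data):
        chunks.append(data[last:m.start()])
        refs.append(m.group())
        last = m.end()
    chunks.append(data[last:])
    return chunks, refs

def rebuild(chunks, refs) -> bytes:
    """Re-interleave chunks and references to recover the original file."""
    return b"".join(c + r for c, r in zip(chunks, refs + [b""]))

blob = b"exec /nix/store/aaaaaaaaaa-bash/bin/sh then /nix/store/bbbbbbbbbb-coreutils/bin/ls"
chunks, refs = split_refs(blob)

# Chunks stored by content hash, as git objects would be: identical chunks
# across packages that differ only in their references now dedupe.
objects = {hashlib.sha256(c).hexdigest(): c for c in chunks}

assert rebuild(chunks, refs) == blob
```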
After testing CA derivations for a while, it was very exciting when I noticed that a build actually managed to short-circuit. Seeing “6 derivations will be built”, followed by only 1 derivation being built along with 5 more resolved derivations, made it all worth it.
For those interested in following the progress of ca-derivations stabilisation like me: