Aside from @edolstra’s PhD thesis, is there a simple breakdown of the intensional model?
I’m trying to understand it and explain it to others, and the ways in which it works are subtle.
The thesis is right that content-addressing is easy to discuss but difficult to nail down.
Here are some rough points I gathered:
- We like the intensional model because it allows us to safely share a Nix cache.
- We like the intensional model because it may allow us to cut work off early if the output is the same.
At face value, it is HASH(d), where d is the derivation output, except that this does not account for self-references.
So instead it is really HASH(d’), where d’ is the derivation output with the self-referenced hashes cleared out (plus an extra point about having to store the offsets of the cleared hashes, to account for equivalent code).
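To make the HASH(d’) idea concrete, here is a minimal Python sketch of hashing "modulo self-references". It is an illustration only: the function name `hash_modulo`, the use of SHA-256, and the `offsets:content` encoding are my assumptions, not the exact scheme from the thesis or from Nix.

```python
import hashlib


def hash_modulo(content: bytes, self_hash: bytes) -> str:
    """Hash `content` with every occurrence of `self_hash` zeroed out.

    Hypothetical helper: the encoding below is an assumption for illustration.
    """
    if not self_hash:
        raise ValueError("self_hash must be non-empty")

    # Record where the self-references were (non-overlapping, left to right).
    offsets = []
    start = 0
    while True:
        i = content.find(self_hash, start)
        if i == -1:
            break
        offsets.append(i)
        start = i + len(self_hash)

    # Clear the self-references so the hash no longer depends on them.
    zeroed = content.replace(self_hash, b"\0" * len(self_hash))

    # Hash the offsets together with the zeroed content, so that content
    # which merely contains zeros in the same place hashes differently.
    header = "|".join(str(o) for o in offsets).encode() + b":"
    return hashlib.sha256(header + zeroed).hexdigest()
```

With this, two outputs that differ only in their own (placeholder) hash get the same result, while an output that literally contains zero bytes at that position does not collide with them, which is one reading of why the offsets need to be part of the hash.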
A drv is now augmented to store a set of outputs (equivalences) for that derivation.
The Nix tool has to make sure that a single /nix/store does not contain more than one entry from an equivalence class; otherwise you run the risk of including both in the same closure (e.g. glibc).
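The closure-level risk can be sketched as a simple check. This is only an illustration under my own assumptions: `equivalence_conflicts` and the `eq_class_of` mapping are hypothetical names, and real Nix tracks this information in its own database rather than in a Python dict.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def equivalence_conflicts(
    closure: Iterable[str], eq_class_of: Dict[str, str]
) -> Dict[str, List[str]]:
    """Return the equivalence classes that appear more than once in a closure.

    `eq_class_of` maps each store path to its equivalence class (hypothetical).
    """
    members = defaultdict(set)
    for path in closure:
        members[eq_class_of[path]].add(path)
    # A class with two or more members means the closure mixes two builds
    # of "the same" component, e.g. two different glibc outputs.
    return {cls: sorted(paths) for cls, paths in members.items() if len(paths) > 1}
```

An empty result means the closure contains at most one member of each class.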
Questions:
- Rather than just reading the output paths of dependencies and checking /nix/store when building a derivation, does a table need to be checked to see all the equivalences for a particular entry?
- If that entry does not exist, is there no way to cut off early? How, in practice, do we avoid many rebuilds when moving to the intensional model? The goal seems to be to avoid rebuilding, but my reading of the thesis is that many users will rebuild most of their software, since they won’t get the same output due to the non-determinism of most builds.
There’s a wiki entry: Ca-derivations - NixOS Wiki
The blog post linked from there is really informative. Also the other one on self-references.
On your first question: yes, that seems to be correct. I’m not sure I understand your second question, but I wanted to point you to those resources in case you hadn’t found them.
Thanks for the links; the wiki itself was light on details and more on how to use it.
For the second question, the part I’m not 100% clear on is how Nix (specifically Nixpkgs) delivers binary caches if you can’t know the complete set of possible HASH(d’) values, since most packages are not binary reproducible.
I don’t see how we get around having to rebuild the world. The binary cache server can’t contain every possible store entry.
Ah, right. As the beginning of section 6.4 puts it,
The main difference with the extensional model is that output paths are no longer known a priori. But because of this, we cannot prevent re-building a derivation by checking whether its output path is already valid. The same applies to checking for substitutes, which are also keyed on output paths.
Also, due to impurity, a single derivation can now result in several distinct components residing at different store paths, if the derivation is built multiple times (e.g., by different users). That is, a derivation actually defines an equivalence class of store paths within the Nix store, the members of such classes all having been produced by the same derivation.
It goes on to explain how the equivalence relation is defined, which I find pretty tough to comprehend, honestly. The “virtual” renamed outputs field eqClass is what we previously had as the store path, and still gets used for substitution and can enable the early cut-off you asked about.
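As I understand it, the lookup could be sketched roughly like this. It is a toy model under my own assumptions: `realise`, the `table` dict, and the `is_valid` / `build` callbacks are hypothetical stand-ins, not Nix’s actual interfaces.

```python
from typing import Callable, Dict, Set


def realise(
    eq_class: str,
    table: Dict[str, Set[str]],
    is_valid: Callable[[str], bool],
    build: Callable[[], str],
) -> str:
    """Return an output path for `eq_class`, building only if nothing is known.

    `table` maps an equivalence class (known up front, like the old output
    path) to the content-addressed paths already seen for it (hypothetical).
    """
    # Early cut-off: any valid member of the class is good enough.
    for path in sorted(table.get(eq_class, ())):
        if is_valid(path):
            return path

    # Nothing usable is known, so build and remember the resulting path.
    path = build()
    table.setdefault(eq_class, set()).add(path)
    return path
```

The point is that the key is known before building even though the final content-addressed path is not, so a binary cache only needs one entry per equivalence class rather than every possible output.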
Personally, I’m in more of a DIY headspace, so I don’t have “trust issues,” so to speak, or any need to use this feature. But it still is theoretically interesting to me.