Just for a fun exercise, I wanted to compute the hash in /nix/store/<hash>-<name>.drv for a given derivation in bash/python. However, for some reason this seems non-trivial?
I have looked at NixOS - Nix Pills but was not able to recreate the results. I have also looked at the C++ source code https://github.com/NixOS/nix/blob/f800d450b78091835ab7ca67847d76e75d877a24/src/libstore/derivations.cc#L350. The two seem to conflict on the proper way to compute the hash.
For simplicity, how would this be done for the hello package at nixpkgs commit 1b33a0aedfa4ff65ff9241487b95267de78bf23d? To ensure that we are all talking about the same derivation:
$ nix-instantiate -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/1b33a0aedfa4ff65ff9241487b95267de78bf23d.tar.gz '<nixpkgs>' -A hello
How do I compute
Bonus points if you can explain how the output path is computed … I wasn’t able to reproduce what was shown in the pill.
$ nix-build https://github.com/NixOS/nixpkgs/archive/1b33a0aedfa4ff65ff9241487b95267de78bf23d.tar.gz -A hello
Not to plug my blog but I just wrote an article that reproduces those Nix pill examples in full detail and shows some more complicated examples too. See here. The Nix source code is hard to follow but Section 5 of Dolstra’s thesis is readable and mostly still correct.
It would be hard to fully reproduce those hello paths because the algorithms involve a very significant amount of recursion. You would have to compute the output and derivation paths of all of its dependencies…
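To give a feel for that recursion, here is a toy Python sketch. Everything in it is made up for illustration: the Drv class, the descriptor function, and the repr-based serialization are stand-ins (Nix hashes the ATerm text of the .drv file instead), so the hashes it prints will not match real store paths. It only shows the shape of the problem: a derivation's hash depends on the hashes of all of its input derivations, while a fixed-output derivation contributes only its declared output hash.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Drv:
    """Hypothetical, heavily simplified derivation (real .drv files have more fields)."""
    name: str
    builder: str
    input_drvs: list = field(default_factory=list)  # paths of the dependency .drv files
    fixed_hash: Optional[str] = None                # declared output hash, if fixed-output


def descriptor(path: str, graph: dict, cache: dict) -> str:
    """Hash a derivation after replacing each input .drv path by that input's own hash."""
    if path in cache:  # each derivation is hashed once, however often it is referenced
        return cache[path]
    drv = graph[path]
    if drv.fixed_hash is not None:
        # Fixed-output: only the declared output hash matters, not how it is built.
        text = f"fixed:{drv.fixed_hash}"
    else:
        deps = sorted(descriptor(p, graph, cache) for p in drv.input_drvs)
        # Toy serialization (assumption); not Nix's real ATerm format.
        text = repr((drv.name, drv.builder, deps))
    cache[path] = hashlib.sha256(text.encode()).hexdigest()
    return cache[path]


# A made-up three-node dependency graph: app -> lib -> src, and app -> src.
graph = {
    "src.drv": Drv("src", "fetch-from-mirror-a", fixed_hash="sha256:abc"),
    "lib.drv": Drv("lib", "build-lib", ["src.drv"]),
    "app.drv": Drv("app", "build-app", ["lib.drv", "src.drv"]),
}
cache = {}
print(descriptor("app.drv", graph, cache))
print(len(cache), "derivations had to be hashed")
```

For hello the graph is far larger than three nodes, which is why doing this by hand is impractical. The cache matters because shared dependencies would otherwise be rehashed once per path through the graph.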
Here’s the summary of how you instantiate a non-fixed-output derivation foo:
- The output path is computed first. Start with the derivation (which is a data structure) after instantiating its dependencies recursively and substituting the output paths of the dependencies back into the derivation where needed. The output path in the derivation is currently empty, but otherwise you now have exactly the .drv file that goes into the store. For each derivation path in the inputDrvs field, substitute the “output descriptor with output path” (see below) computed from that dependency's .drv file, and hash the resulting data structure. This gives you the “output descriptor without output path” of foo. If bar depends on foo, bar will calculate the “output descriptor” for foo similarly to how we computed it, except bar will have the foo output path filled in inside foo's derivation. That’s why I called it “output descriptor with output path.” This is because bar's output path should factor in the output path of foo, but foo can’t use its own output path as a factor when computing its own output path… This is hard to explain.
- Append some metadata to this “output descriptor without output path,” hash the result, and truncate the hash. This is the output path of foo.
- Fill in the output path in the derivation. Now you have the .drv file. Hash this result to obtain the “derivation descriptor.”
- Append some metadata to the “derivation descriptor,” hash the result, and truncate the hash. This is the hash component of the store path of the .drv file.
If foo is a fixed-output derivation, the only difference is that its “output descriptor” (with and without output path) is computed from the known output hash instead of the derivation data structure. This means the output path only depends on the final output. Likewise for the output path of anything dependent on it.
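The “append some metadata, hash, truncate” steps can be sketched in Python. This is a sketch based on my understanding of the scheme, not a copy of the Nix source: the function names are my own, and the fingerprint strings (the output:, text: and fixed:out: prefixes, plus the flat versus recursive distinction) are what I believe current Nix uses, so check the results against nix-instantiate before relying on them. The descriptor hashes passed in are assumed to have been computed already by the recursive procedure described above.

```python
import hashlib

NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"  # Nix's alphabet: no e, o, t, u


def compress_hash(digest: bytes, size: int = 20) -> bytes:
    """Truncate a digest to `size` bytes by XOR-folding the overflow back in."""
    out = bytearray(size)
    for i, byte in enumerate(digest):
        out[i % size] ^= byte
    return bytes(out)


def nix_base32(data: bytes) -> str:
    """Nix-style base32: read the bytes as a little-endian integer and
    emit 5-bit groups, most significant group first."""
    n = int.from_bytes(data, "little")
    length = (len(data) * 8 - 1) // 5 + 1
    return "".join(NIX_BASE32[(n >> (5 * k)) & 0x1F] for k in reversed(range(length)))


def make_store_path(kind: str, inner_hex: str, name: str, store: str = "/nix/store") -> str:
    """Append metadata to an inner hash, hash again, truncate, and encode."""
    fingerprint = f"{kind}:sha256:{inner_hex}:{store}:{name}"
    digest = hashlib.sha256(fingerprint.encode()).digest()
    return f"{store}/{nix_base32(compress_hash(digest))}-{name}"


def output_path(descriptor_hex: str, name: str, output: str = "out") -> str:
    """Output path from the 'output descriptor without output path'."""
    full_name = name if output == "out" else f"{name}-{output}"
    return make_store_path(f"output:{output}", descriptor_hex, full_name)


def drv_path(drv_text: bytes, name: str, references: list) -> str:
    """Store path of the .drv file; `references` are the store paths it mentions
    (its input sources and input .drv files)."""
    kind = ":".join(["text", *sorted(references)])
    return make_store_path(kind, hashlib.sha256(drv_text).hexdigest(), f"{name}.drv")


def fixed_output_path(hash_hex: str, name: str, recursive: bool = False) -> str:
    """Output path of a fixed-output derivation with a known sha256 output hash."""
    if recursive:
        return make_store_path("source", hash_hex, name)
    inner = hashlib.sha256(f"fixed:out:sha256:{hash_hex}:".encode()).hexdigest()
    return make_store_path("output:out", inner, name)


# Placeholder inputs, only to show the call shape; not real hello values.
print(output_path("00" * 32, "example-1.0"))
print(drv_path(b"Derive(...)", "example-1.0", []))
```

Note that fixed_output_path never looks at the derivation itself, which matches the point that a fixed-output path only depends on the final output.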
Sorry if this is hard to follow. Like I said, I worked out some examples in extreme detail here.
@dunnl your blog link is no longer active. Do you have the content available somewhere else?
Here’s an archived version of it: https://archive.ph/kBCtX
Thanks @aos! I did check the Wayback Machine, but it wasn’t there.
The blog post is fantastic.