Just as a fun exercise, I wanted to compute the hash in /nix/store/<hash>-<name>.drv for a given derivation in bash/python. However, for some reason this seems non-trivial?
For simplicity, how would this be done for the hello package at nixpkgs commit 1b33a0aedfa4ff65ff9241487b95267de78bf23d? Pinning the commit ensures that we are all talking about the same derivation.
$ nix-instantiate -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/1b33a0aedfa4ff65ff9241487b95267de78bf23d.tar.gz '<nixpkgs>' -A hello
/nix/store/gmxsxf5gbipxwhl8s35jzqz2xgvkfpz2-hello-2.10.drv
How do I compute gmxsxf5gbipxwhl8s35jzqz2xgvkfpz2?
Bonus points if you can explain how the output path is computed … I wasn’t able to reproduce what was shown in the pill.
$ nix-build https://github.com/NixOS/nixpkgs/archive/1b33a0aedfa4ff65ff9241487b95267de78bf23d.tar.gz -A hello
/nix/store/234v87nsmj70i1592h713i6xidfkqyjw-hello-2.10
Not to plug my blog, but I just wrote an article that reproduces those Nix pill examples in full detail and shows some more complicated examples too. See here. The Nix source code is hard to follow, but Section 5 of Dolstra’s thesis is readable and mostly still correct.
It would be hard to fully reproduce those hello paths because the algorithms involve a very significant amount of recursion. You would have to compute the output and derivation paths of all of its dependencies…
Here’s the summary of how you instantiate a non-fixed-output derivation foo:
The output path is computed first. Start with the derivation (which is a data structure) after its dependencies have been instantiated recursively and their output paths substituted back into the derivation where needed. At this point foo's own output path is still empty, but otherwise you have exactly the .drv file that goes into the store. Replace each derivation path in the inputDrvs field with that dependency's “output descriptor with output path” (see the note that follows), computed from its .drv file, and hash the resulting data structure. This gives you the “output descriptor without output path” of foo. Note: If bar depends on foo, bar will calculate the “output descriptor” for foo the same way we just did, except that by then the foo output path is filled in inside foo's derivation. That’s why I called it “output descriptor with output path.” The asymmetry exists because bar's output path should factor in the output path of foo, but foo can’t use its own output path as a factor when computing that very path… This is hard to explain.
Append some metadata to this “output descriptor without output path,” hash the result, and truncate the hash. This gives the hash component of the output path of foo.
Fill in the output path in the derivation. Now you have the .drv file. Hash this result to obtain the “derivation descriptor.”
Append some metadata to the “derivation descriptor”, hash the result, truncate the hash. This is the hash component of the store path of foo.drv.
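The two "append some metadata, hash, truncate" steps above can be sketched in Python roughly as follows. Treat this as a minimal sketch based on my understanding of how Nix builds store paths, not a verified reimplementation. The function names are made up for illustration, the store directory is assumed to be /nix/store, only the "out" output is handled, and the descriptors themselves (the hard, recursive part) are taken as inputs.

```python
import hashlib

# Nix's base32 alphabet (assumption from the Nix sources: no e, o, t, u).
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def compress_hash(digest: bytes, size: int = 20) -> bytes:
    """Truncate a digest by XOR-folding it down to `size` bytes."""
    out = bytearray(size)
    for i, byte in enumerate(digest):
        out[i % size] ^= byte
    return bytes(out)


def nix_base32(data: bytes) -> str:
    """Encode bytes as 5-bit groups, emitting the highest group first."""
    length = (len(data) * 8 - 1) // 5 + 1
    chars = []
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)
        value = data[i] >> j
        if i + 1 < len(data):
            value |= data[i + 1] << (8 - j)
        chars.append(NIX_BASE32[value & 0x1F])
    return "".join(chars)


def make_store_path(path_type: str, descriptor_hex: str, name: str,
                    store_dir: str = "/nix/store") -> str:
    """Append the metadata, hash the result, truncate, and encode."""
    fingerprint = f"{path_type}:sha256:{descriptor_hex}:{store_dir}:{name}"
    digest = hashlib.sha256(fingerprint.encode()).digest()
    return f"{store_dir}/{nix_base32(compress_hash(digest))}-{name}"


def output_path(output_descriptor_hex: str, name: str) -> str:
    """Output path from the "output descriptor without output path"."""
    return make_store_path("output:out", output_descriptor_hex, name)


def drv_path(drv_text: str, name: str, references=()) -> str:
    """Store path of foo.drv from the completed .drv file contents.

    `references` are the store paths the .drv file refers to
    (input sources and input .drv paths); they become part of the type.
    """
    path_type = ":".join(["text", *sorted(references)])
    derivation_descriptor = hashlib.sha256(drv_text.encode()).hexdigest()
    return make_store_path(path_type, derivation_descriptor, name + ".drv")
```

Reproducing the hello paths from the question would still require the exact .drv contents and the output descriptor, both of which depend on every dependency, so this sketch alone will not print them.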
If foo is a fixed-output derivation, the only difference is that its “output descriptor” (with and without output path) is computed from the known output hash instead of the derivation data structure. This means the output path depends only on the final output, not on how it is built. The same goes for foo's contribution to the output path of anything that depends on it.
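To make the fixed-output case concrete, here is a small Python sketch of the descriptor. The "fixed:out:..." string layout is my understanding of what Nix hashes and should be read as an assumption. The function name is hypothetical, and the recursive-sha256 case (which, as far as I know, uses a different path type) is not covered.

```python
import hashlib


def fixed_output_descriptor(hash_algo: str, output_hash_hex: str,
                            out_path: str = "") -> str:
    """Descriptor of a fixed-output derivation.

    With out_path == "" this is the "output descriptor without output path";
    with the output path filled in, it is the variant that dependents use.
    Nothing about the builder, arguments, or inputs enters the hash.
    """
    fingerprint = f"fixed:out:{hash_algo}:{output_hash_hex}:{out_path}"
    return hashlib.sha256(fingerprint.encode()).hexdigest()


# Hypothetical output hash, used purely for illustration.
example_hash = "ab" * 32
without_path = fixed_output_descriptor("sha256", example_hash)
with_path = fixed_output_descriptor("sha256", example_hash, "/nix/store/x-foo")
```

Because the only inputs are the hash algorithm, the declared output hash, and (for dependents) the output path, changing the way foo is fetched or built leaves both descriptors unchanged.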
Sorry if this is hard to follow. Like I said, I worked out some examples in extreme detail here.