Why don't nix hashes use base-16?

If the package hash is created from the derivation inputs, then what is the hash for the derivation created from?

Also, how are the inputs hashed?

I won’t pretend to be read up on this, but Eelco Dolstra’s thesis addresses it in section 5.2.2, “Store paths” (bottom of the linked page; numbered page 92, PDF page 100).

All derivations have their store path computed during evaluation, even Fixed Output Derivations (FODs). This also helps to capture whether the builder (script) used to fetch the resources has changed.

$ nix show-derivation $(nix-instantiate -A hello.src) | head -10
warning: you did not specify '--add-root'; the result might be removed by the garbage collector
{
  "/nix/store/gbk50vz006igkvmp1cs5g1bspp36qf94-hello-2.10.tar.gz.drv": {
    "outputs": {
      "out": {
        "path": "/nix/store/3x7dwzq014bblazs7kq20p9hyzz0qh8g-hello-2.10.tar.gz",
        "hashAlgo": "sha256",
        "hash": "31e066137a962676e89f69d1b65382de95a7ef7d914b8cb956f41ea72e0f516b"
      }
    },
    "inputSrcs": 

Also, I believe the hash is computed over the NAR (Nix archive) serialization. However, I may be wrong on this.

The store path is essentially part of the Merkle tree hash which comprises a derivation and all of its dependencies.

The hashes for output paths are based on the derivations, not on the contents of the output path [1]. This is why Nix can compute the output paths without building the derivation.

However, you cannot directly compute the hash. Consider the hello derivation: its drv has to contain the output path (e.g. to specify the install path). So, this is circular: in order to compute the output path, we need to hash the derivation, but the derivation typically contains the hash of the output path.

Nix works around this circularity by using empty string stubs for the output paths when hashing the derivation. Then after the output path is computed, the stubs are replaced by the actual output paths. You can try this yourself (see below).

However, to simplify things, it’s easier to use a derivation which does not have dependencies on other derivations (those complicate things a bit):

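# Create a builder that only creates the output path. printf is used because
# not every shell's echo expands \n; the builder also has to be executable.
$ printf '#!/bin/sh\necho "" > $out\n' > trivial-builder.sh
$ chmod +x trivial-builder.sh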

# Create a simple derivation and find its path
$ nix-instantiate -E 'derivation { system = "x86_64-linux"; builder = ./trivial-builder.sh; name = "test"; }'
/nix/store/b2gfmr42im72mhn3f3kn6kjmcqbmrz54-test.drv

# Output path for the derivation
$ nix-build -E 'derivation { system = "x86_64-linux"; builder = ./trivial-builder.sh; name = "test"; }' --no-out-link
/nix/store/al2akmg39ablfv4gn849q7shh0gscns2-test

# Remove the output path in the derivation, this emulates how Nix initially sees
# the derivation before the output path is known... and compute the hash
$ sed -s 's,/nix/store/al2akmg39ablfv4gn849q7shh0gscns2-test,,g' /nix/store/b2gfmr42im72mhn3f3kn6kjmcqbmrz54-test.drv | sha256sum -
e93875b671546b2067fb430b4cc41a4d466ab9cf012f83e9157b4aadb274398f  -

# Nix does not directly use the derivation hash, but adds some metadata to
# indicate the type of path, the store directory, and the name. This avoids
# generating the same store paths for different outputs or if a different store
# directory is used. The derivation content hash and the metadata are then
# hashed again.
$ nix-hash --type sha256 --truncate --flat --base32 <(echo -n "output:out:sha256:e93875b671546b2067fb430b4cc41a4d466ab9cf012f83e9157b4aadb274398f:/nix/store:test")
al2akmg39ablfv4gn849q7shh0gscns2
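
The last step above (hash the fingerprint string, truncate, print in base-32) can also be reproduced outside of Nix. Here is a minimal Python sketch, assuming that `--truncate` XOR-folds the 32-byte SHA-256 digest down to 20 bytes and that `--base32` uses Nix's own 32-character alphabet (without e, o, t and u), which is how I understand the Nix source. The function names are made up for this example.

```python
import hashlib

# Nix's base-32 alphabet: digits plus lowercase letters without e, o, t, u.
NIX_BASE32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"


def compress_hash(digest: bytes, new_size: int = 20) -> bytes:
    """XOR-fold a digest down to new_size bytes (assumed --truncate behaviour)."""
    folded = bytearray(new_size)
    for i, byte in enumerate(digest):
        folded[i % new_size] ^= byte
    return bytes(folded)


def nix_base32(data: bytes) -> str:
    """Encode bytes in Nix's base-32 variant; the last character holds the lowest bits."""
    length = (len(data) * 8 - 1) // 5 + 1
    chars = []
    for n in range(length - 1, -1, -1):
        byte_index, bit_offset = divmod(n * 5, 8)
        chunk = data[byte_index] >> bit_offset
        if byte_index + 1 < len(data):
            chunk |= data[byte_index + 1] << (8 - bit_offset)
        chars.append(NIX_BASE32_ALPHABET[chunk & 0x1F])
    return "".join(chars)


def store_path_hash(fingerprint: str) -> str:
    """SHA-256 the fingerprint string, fold to 160 bits, print in Nix base-32."""
    digest = hashlib.sha256(fingerprint.encode()).digest()
    return nix_base32(compress_hash(digest))


if __name__ == "__main__":
    # Fingerprint string copied from the nix-hash command above.
    fingerprint = (
        "output:out:sha256:"
        "e93875b671546b2067fb430b4cc41a4d466ab9cf012f83e9157b4aadb274398f:"
        "/nix/store:test"
    )
    print(store_path_hash(fingerprint))
```

If the sketch matches what `nix-hash` does, running it prints the same 32-character hash as the command above.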

Note that the final hash is the hash used in the store path. The format for the ‘metadata’ can be found in the Nix source.

[1] The exceptions here are fixed-output derivations, for which the hash of the contents of the output path is used. Also, content-addressed derivations (which are currently being worked on) use a hash of the output (with self-references replaced).

Edit: forgot to answer this question (what the hash for the derivation itself is created from):

From the drv file, plus a specification of the store paths it references:

# First we hash the drv. This is the final drv including the output
# path(s).
$ sha256sum /nix/store/b2gfmr42im72mhn3f3kn6kjmcqbmrz54-test.drv
7dd34812cf6f05e836d8b029639416ba53e9e88cc89a7b9267b449c0a2767f6c  /nix/store/b2gfmr42im72mhn3f3kn6kjmcqbmrz54-test.drv

# drv files are added to the store as text files. So, we use that type in the
# metadata, along with store paths it references (the builder script).
$ nix-hash --type sha256 --truncate --flat --base32 <(echo -n "text:/nix/store/rrcv9www9dg9s7lwypbnvff0b3hr1634-trivial-builder.sh:sha256:7dd34812cf6f05e836d8b029639416ba53e9e88cc89a7b9267b449c0a2767f6c:/nix/store:test.drv")
b2gfmr42im72mhn3f3kn6kjmcqbmrz54

And that’s the hash that is part of the store path. It’s interesting that the referenced store paths have to be included (since they are also in the drv itself). I guess I’d have to reread the relevant parts of Eelco’s thesis to understand.
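
To make the role of the references concrete, here is a small Python sketch of how the fingerprint string for a text file such as a drv is assembled. It assumes that the references are sorted and joined with `:` right after the `text` type. That matches the single-reference case above, but I have not checked the ordering for multiple references against the Nix source. `text_fingerprint` is a name made up for this example.

```python
def text_fingerprint(references, content_sha256_hex, name, store_dir="/nix/store"):
    """Build the string that is hashed to get the store path of a text file.

    references: store paths the file refers to (assumed to be sorted by Nix).
    content_sha256_hex: base-16 SHA-256 of the file contents.
    """
    parts = ["text", *sorted(references), "sha256", content_sha256_hex, store_dir, name]
    return ":".join(parts)


if __name__ == "__main__":
    print(text_fingerprint(
        ["/nix/store/rrcv9www9dg9s7lwypbnvff0b3hr1634-trivial-builder.sh"],
        "7dd34812cf6f05e836d8b029639416ba53e9e88cc89a7b9267b449c0a2767f6c",
        "test.drv",
    ))
```

The resulting string is the one passed to `nix-hash` above. Because the references are part of it, two text files with identical contents but different reference sets would get different store paths.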

The builder does not influence the hash of the output path of FODs. FODs are only dependent on the contents of the output path:

# FOD to retrieve a GPG public key.
$ nix-instantiate -E 'with import <nixpkgs> {}; fetchurl { url = "https://danieldk.eu/danieldk.asc"; sha256 = "0frh2calgh590sca7ypdxh7mdcfvq09vabp6b9k4viyx8gb1qnjp"; }'
/nix/store/gmibvgq5qh12rr5xf49icbl4ai5ayd9j-danieldk.asc.drv

# Note the output path.
$ nix-build -E 'with import <nixpkgs> {}; fetchurl { url = "https://danieldk.eu/danieldk.asc"; sha256 = "0frh2calgh590sca7ypdxh7mdcfvq09vabp6b9k4viyx8gb1qnjp"; }'
/nix/store/0bzx6r0d0dkh5xa2d5sfpgqhq35g8cn2-danieldk.asc

# Get the content hash
$ sha256sum /nix/store/0bzx6r0d0dkh5xa2d5sfpgqhq35g8cn2-danieldk.asc
575a1cd643ddc74d665ae62eb513c0dbb1560fecedfaa39806a9c0471513303b  /nix/store/0bzx6r0d0dkh5xa2d5sfpgqhq35g8cn2-danieldk.asc

# Hash the hash plus metadata ;)
$ echo -n "fixed:out:sha256:575a1cd643ddc74d665ae62eb513c0dbb1560fecedfaa39806a9c0471513303b:"  | sha256sum -
5902100ebdadc348adc7760acbf8dc39f59db5d12123be04d23e3cf53d8ca084  -

# Compute hash for the output path
$ nix-hash --type sha256 --truncate --base32 --flat <(echo -n "output:out:sha256:5902100ebdadc348adc7760acbf8dc39f59db5d12123be04d23e3cf53d8ca084:/nix/store:danieldk.asc")
0bzx6r0d0dkh5xa2d5sfpgqhq35g8cn2

So for a FOD, the hash of the output path is computed purely from the declared content hash and the name; the builder and the rest of the derivation do not enter into it.
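
The two hashing stages above can be put into one function. This is a Python sketch for the flat SHA-256 case shown here only, assuming the same XOR-fold truncation and Nix base-32 alphabet that `nix-hash --truncate --base32` uses. The names are made up for this example, and recursive (NAR) hashes are not covered.

```python
import hashlib

NIX_BASE32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"


def compress_hash(digest: bytes, new_size: int = 20) -> bytes:
    """XOR-fold a digest down to new_size bytes."""
    folded = bytearray(new_size)
    for i, byte in enumerate(digest):
        folded[i % new_size] ^= byte
    return bytes(folded)


def nix_base32(data: bytes) -> str:
    """Encode bytes in Nix's base-32 variant."""
    length = (len(data) * 8 - 1) // 5 + 1
    chars = []
    for n in range(length - 1, -1, -1):
        byte_index, bit_offset = divmod(n * 5, 8)
        chunk = data[byte_index] >> bit_offset
        if byte_index + 1 < len(data):
            chunk |= data[byte_index + 1] << (8 - bit_offset)
        chars.append(NIX_BASE32_ALPHABET[chunk & 0x1F])
    return "".join(chars)


def fixed_output_path(content_sha256_hex: str, name: str, store_dir: str = "/nix/store") -> str:
    """Output path of a flat sha256 FOD, from its content hash and name only."""
    # Stage 1: hash the content hash together with the "fixed:out" metadata.
    inner = hashlib.sha256(f"fixed:out:sha256:{content_sha256_hex}:".encode()).hexdigest()
    # Stage 2: hash the usual output fingerprint, then truncate and encode.
    fingerprint = f"output:out:sha256:{inner}:{store_dir}:{name}"
    digest = hashlib.sha256(fingerprint.encode()).digest()
    return f"{store_dir}/{nix_base32(compress_hash(digest))}-{name}"


if __name__ == "__main__":
    print(fixed_output_path(
        "575a1cd643ddc74d665ae62eb513c0dbb1560fecedfaa39806a9c0471513303b",
        "danieldk.asc",
    ))
```

Note that the function has no parameter for the builder or the URL: only the content hash, the name and the store directory can change the result.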

Some fetchers, such as fetchCargoTarball, abuse FODs though. FOD builds have network access, and fetchCargoTarball uses that to vendor dependencies with cargo. This is problematic, because the output becomes dependent on how Cargo (and its dependencies) behave. It is not a proper FOD, because the definition of a FOD is that its output is independent of anything else.

Wow, thanks for this detailed explanation, @danieldk. I got a question about this on a Nix Fundamentals video I did some time ago and didn’t know how to answer. This is great.

I guess the path can’t be included in the non-fixed output derivation because $out has to be already set at build time in order to be able to do things like absolute symlinks or absolute paths within the files in the build result as well. Makes sense. Much appreciated!

Your video is one of the best!!! It’s in my top 10, even my top 5 :-).

Thanks man, I was hoping to cover all the basics so Nix doesn’t look like a scary magical black box.

I have some issues with it; the audio and video aren’t the best at some points. Maybe I’ll re-do it eventually with better mic and video quality.

Just curious: what was the reason for making the SRI format the default in the next Nix?
[+] more succinct (questionable, assuming a constant sha256- prefix and = suffix)
[+] compatible with HTML’s integrity attribute (but what for?)
[-] never published on websites next to tarballs; those hashes are normally base-16
[-] contains + and /, and thus does not form a single word in the eyes of text editors and terminals. It will be difficult to copy a valid hash from the Nix output to the clipboard. For base-32 and base-16 it is just a double-click on the word in the terminal; for SRI… one has to possess a sniper gaming mouse.
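
For comparing the two representations, here is a small Python sketch that converts a base-16 digest into SRI form and back. It only assumes that an SRI string is the algorithm name, a dash, and the standard base64 encoding of the raw digest. The function names are made up for this example.

```python
import base64


def hex_to_sri(hex_digest: str, algo: str = "sha256") -> str:
    """Turn a base-16 digest into an SRI string such as 'sha256-...='."""
    raw = bytes.fromhex(hex_digest)
    return f"{algo}-{base64.b64encode(raw).decode('ascii')}"


def sri_to_hex(sri: str) -> tuple[str, str]:
    """Split an SRI string into (algorithm, base-16 digest)."""
    algo, _, encoded = sri.partition("-")
    return algo, base64.b64decode(encoded).hex()


if __name__ == "__main__":
    # Content hash of hello-2.10.tar.gz as shown earlier in the thread.
    digest = "31e066137a962676e89f69d1b65382de95a7ef7d914b8cb956f41ea72e0f516b"
    sri = hex_to_sri(digest)
    print(sri)
    print(sri_to_hex(sri))
```

A SHA-256 digest takes 64 characters in base-16 and 44 in base64 (51 with the `sha256-` prefix), and the base64 part may contain `+` and `/`, which is what breaks double-click selection.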

I’ve been caught out a few times with hash representation formats and hash types…

Maybe one day Nix will support IPFS multihashes… which are a rather nice kind of self-describing hash: the type and the length are encoded along with the digest, so you can derive the hash type just from the data… :-))))))))))))
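
To illustrate what "self-describing" means here, below is a simplified Python sketch of the multihash layout: a code for the hash function, the digest length, then the digest itself. It assumes the multicodec code 0x12 for sha2-256 and only handles codes and lengths below 128, so that each fits in a single varint byte. It is not a full implementation, and the function names are made up.

```python
import hashlib

SHA2_256_CODE = 0x12  # multicodec code for sha2-256


def encode_multihash(digest: bytes, code: int = SHA2_256_CODE) -> bytes:
    """Prefix a digest with its hash function code and its length."""
    if code >= 0x80 or len(digest) >= 0x80:
        raise ValueError("this sketch only supports single-byte varints")
    return bytes([code, len(digest)]) + digest


def decode_multihash(data: bytes) -> tuple[int, bytes]:
    """Recover (hash function code, digest) from a multihash."""
    code, length = data[0], data[1]
    digest = data[2:2 + length]
    if len(digest) != length:
        raise ValueError("truncated multihash")
    return code, digest


if __name__ == "__main__":
    multihash = encode_multihash(hashlib.sha256(b"hello").digest())
    print(multihash.hex())
    print(decode_multihash(multihash)[0] == SHA2_256_CODE)
```

The first two bytes (`12 20` for SHA-256) are enough to tell a reader which algorithm produced the digest and how long it is.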

For me, using konsole, it is just double clicking… Though indeed, “navigating” through the token in my editor is a pain, but it would be for any hash that uses uppercase letters, as I use camelcase word boundaries…

@Ericson2314 promised to add Git tree hashes and some other types which I forgot.
So the transition from sha256 = "xxxxxxxxxx"; to hash = "sha256:xxxxxxxxxx"; and accepting many hash types look meaningful.

:) It’s implemented; I just need to convince people to merge it (and probably fix some conflicts in the meantime).

I have no idea. I have just observed this trend when first reviewing PRs, then later when I made the jump to nixUnstable for flake support.

I view FODs as more “defined as their own content”; I would say this is okay as long as it’s repeatable. With a Cargo.lock, you have a good chance of being able to pull down the same content. You’re also not dependent on how cargo resolves dependencies, as resolution was done during the lock creation.

I have also run into the issue of needing special ssh access to some resources, and really the only way to do this is with FODs which can use your user credentials.

Also, one of the issues with naersk (a buildRustPackage alternative) is that it doesn’t fully emulate “cargo behavior” because it can’t recursively pull git+ssh sources as noted in this issue. However, I guess you could argue that fetchCargoTarball would also be subject to change if the git+ssh URL isn’t stable.

I understand your point of view, but all hashes have been broken in the past because of changes in cargo vendor, and that should not happen for a fixed-output derivation. Fetching of individual crate archives could be FODs, because they never change and we even have usable hashes in the lock files (you could fetchurl them). Setting up a vendored output path, especially when done by a program like cargo vendor (which can change) should IMO be a non-FOD derivation.

It would be possible to build crates without violating FODs. AFAIK fromTOML was added for such cases. We could use fromTOML to read a lock file, fetchurl or fetchcrate the dependencies (with the provided hashes) and then set up the vendoring using a regular derivation. This also avoids the issue where all cargoSha256 need to be fixed everywhere if some aspect of the vendoring changes. Unfortunately, for nixpkgs it would also mean that we would have to add the Cargo.lock files for individual packages (since we can’t use IFD).

I think buildRustPackage, as it is, is the best we have now. I prefer the philosophy of buildRustCrate, but since it’s our own reimplementation of Cargo in Nix + shell, it does not cover all edge cases. Also, it is not clear if adding large amounts of automatically generated Nix expressions to nixpkgs is the way to go. buildRustPackage abuses the notion of FODs, but I don’t think we have anything better if we don’t want to add lock files to nixpkgs.

[+] makes it easier to switch to other default hash algorithms in the future.

You’re right. But it’s rare for it to cause issues.

The same could be said about fetchpatch, where patchutils has to be pinned because it affects how the patches are normalized. So, use of fetchpatch is not a “true FOD” in this regard either.

I’m also not a big fan of making FODs super granular as in the case of node packages. Doing PRs which include node changes is painful, as you have to make changes to the list, and then wait a very long time for the node-packages.nix to be generated. If another PR which affects the same file gets merged, then your PR will get merge conflicts, and then you have to regenerate the file for the “chance” for it to be merged. Not to mention that these “pinned lists” take up a significant amount of space within the repository:

$ find ~/.nix-defexpr/channels/nixos/ -type f -exec du -h {} \; | sort -rh | head -10
11M	/home/jon//.nix-defexpr/channels/nixos/pkgs/development/haskell-modules/hackage-packages.nix
3.8M	/home/jon//.nix-defexpr/channels/nixos/programs.sqlite
3.8M	/home/jon//.nix-defexpr/channels/nixos/pkgs/development/node-packages/node-packages.nix
2.6M	/home/jon//.nix-defexpr/channels/nixos/pkgs/development/r-modules/cran-packages.nix
2.6M	/home/jon//.nix-defexpr/channels/nixos/pkgs/applications/editors/emacs-modes/recipes-archive-melpa.json
1.9M	/home/jon//.nix-defexpr/channels/nixos/pkgs/tools/typesetting/tex/texlive/pkgs.nix
980K	/home/jon//.nix-defexpr/channels/nixos/pkgs/top-level/all-packages.nix
816K	/home/jon//.nix-defexpr/channels/nixos/pkgs/top-level/perl-packages.nix
556K	/home/jon//.nix-defexpr/channels/nixos/pkgs/development/compilers/elm/packages/node-packages.nix
528K	/home/jon//.nix-defexpr/channels/nixos/pkgs/applications/version-management/gitlab/yarnPkgs.nix

Which is part of the significant bloat in nixpkgs:

[10:59:21] jon@jon-desktop /home/jon/projects/nixpkgs (master)
$ du -hcd0 ./* | tail -1
214M	total
[10:59:25] jon@jon-desktop /home/jon/projects/nixpkgs (master)
$ git co HEAD~50000 # Around Nov 2018
Note: switching to 'HEAD~50000'.
...
[11:02:07] jon@jon-desktop /home/jon/projects/nixpkgs ((058a96dc0e4...))
$ du -hcd0 ./* | tail -1
151M	total

I guess you could argue that the lines-of-code per package is low in these generated files. However, it does come at a significant space cost.

Maybe in the future, we could have package ecosystems just defined through flakes in nixpkgs, and curation of those ecosystems can be done in other repositories. It would be harder to test “how does this python code change affect the documentation generation of this haskell package”, but it would have some other benefits. For example, vim and emacs packages could be automatically updated, tested, and merged. Then updating the packages on nixpkgs would just be updating the flake.lock.

{
  inputs.vimPlugins.url = "github:nixos/vim-plugins";
  inputs.vimPlugins.nixpkgs.follows = self; # not sure if this would work

  ...

However, I do really like that nixpkgs is a monolith. Being able to query how your changes affect every possible package is a huge strength from a testing perspective.

In case anyone wants to try, I have made a PR which implements this based on @edolstra’s import-cargo. While it may not be fit for nixpkgs, since it requires that Cargo.lock is available, you can use it to apply buildRustPackage in your own Rust projects without needing to specify cargoSha256.

Example from the unit tests:

{ rustPlatform }:

rustPlatform.buildRustPackage {
  pname = "basic";
  version = "0.1.0";

  src = ./.;

  cargoLock = {
    lockFile = ./Cargo.lock;
  };

  doInstallCheck = true;

  installCheckPhase = ''
    $out/bin/basic
  '';
}

Linking this issue, which seems relevant.
