Why don't nix hashes use base-16?

Surprisingly enough, I couldn’t find an answer to this question from googling…

Is there a reason that nix hash attributes don’t support base 16? Almost everything else seems to default to base 16, making it more difficult to compare a nix hash to anything else and forcing you to use nix-hash (rather than shasum or something). It seems like this would help make things more user-friendly, and be a bit more secure, as you could compare directly if someone publishes a sha256 for a download.

It also seems like supporting base-16 would be backwards compatible, since the length could be used to determine the base of the hash (given that things are labeled sha256). In fact, it looks like nix did previously support base-16, but I assume it got removed somewhere along the way.
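
To make the length argument concrete, here is a minimal Python sketch. It only handles sha256, and `guess_sha256_encoding` is a name made up for illustration, not anything Nix provides. A 256-bit digest is 64 characters in base-16, 52 in Nix's base-32, and 44 in base-64, so the length alone separates the encodings.

```python
# Lengths of a 256-bit digest in each textual encoding.
SHA256_LENGTHS = {64: "base16", 52: "base32", 44: "base64"}


def guess_sha256_encoding(text: str) -> str:
    """Guess how a sha256 hash string is encoded (illustrative helper, not a Nix API)."""
    if text.startswith("sha256-"):
        return "sri"  # SRI strings carry their own algorithm prefix
    return SHA256_LENGTHS.get(len(text), "unknown")


# Hex digest, as a download page or `sha256sum` would print it.
print(guess_sha256_encoding("31e066137a962676e89f69d1b65382de95a7ef7d914b8cb956f41ea72e0f516b"))
# Base-32 digest, as commonly seen in nixpkgs expressions.
print(guess_sha256_encoding("0frh2calgh590sca7ypdxh7mdcfvq09vabp6b9k4viyx8gb1qnjp"))
```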

Nix is able to consume base-16 hashes fine, along with base-32, base-64, and SRI sha256 hashes.

Supporting it is indeed backwards compatible.

And no, support was not removed: only what Nix prints out has changed, and it still accepts all encodings. For 2.3.x, it prints a base-32 encoded sha256. For unstable (2.4pre), it prints a base-64 encoded sha256 SRI hash.

For example, the python updater script will just use the base-16 sha256 from pypi to determine the sdist or wheel hash when pulling from pypi.org, since this is exposed in the api.

$ nix hash  --help
Usage: nix hash COMMAND FLAGS... ARGS...

Common flags:

Available commands:
  file       print cryptographic hash of a regular file
  path       print cryptographic hash of the NAR serialisation of a path
  to-base16  convert a hash to base-16 representation
  to-base32  convert a hash to base-32 representation
  to-base64  convert a hash to base-64 representation
  to-sri     convert a hash to SRI representation
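
For readers without Nix at hand, the base-16 to base-32 conversion can be sketched in a few lines of Python. This is a minimal sketch, assuming the alphabet and bit order of Nix's own base-32 printer (it is not RFC 4648 base32); `nix_base32` is a name chosen here for illustration.

```python
# Nix's base-32 alphabet: digits and lowercase letters without e, o, t, u.
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32(digest: bytes) -> str:
    """Encode raw digest bytes the way Nix prints base-32 hashes."""
    length = (len(digest) * 8 - 1) // 5 + 1  # 52 characters for sha256
    # Nix reads the digest as a little-endian number and emits the most
    # significant 5-bit group first.
    value = int.from_bytes(digest, "little")
    return "".join(NIX_BASE32[(value >> (5 * n)) & 0x1F] for n in reversed(range(length)))


# sha256sum output of a file whose fetchurl hash also appears in this thread.
hex_digest = "575a1cd643ddc74d665ae62eb513c0dbb1560fecedfaa39806a9c0471513303b"
print(nix_base32(bytes.fromhex(hex_digest)))
# 0frh2calgh590sca7ypdxh7mdcfvq09vabp6b9k4viyx8gb1qnjp
```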

For context as to why this is, I’ll try to mention the white paper. Disclaimer, I’m going off of memory, so this should be roughly accurate but might not get all the details correct.

The reason for the different encodings is that they were concerned about path length when using a hash as part of a Nix store object name. So base 32 was used to make the hash more succinct while still producing valid paths. Circa 2003-2005, Nix originally used md5, which had 32 characters (this is also why nix-hash defaults to md5), and that set the initial store path template (e.g. /nix/store/<hash>-<drv-name>). md5 was phased out over collision concerns. sha1 was considered, but there were collision concerns there as well. sha256 was chosen as it was (and still is) considered a “secure” cryptographic hash. To retain backwards compatibility, the 52-character base-32 encoded hash is shortened to 32 characters for the store path; as far as I can tell from the Nix source, this is done by XOR-folding the digest down to 160 bits rather than by cutting off characters.
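
The character counts can be checked with a little arithmetic. This is a minimal Python sketch, assuming (from my reading of the Nix source) that the store path hash is the sha256 digest XOR-folded down to 20 bytes; `base32_len` and `fold` are illustrative names.

```python
import hashlib


def base32_len(n_bytes: int) -> int:
    """Number of base-32 characters needed for n_bytes of digest."""
    return (n_bytes * 8 - 1) // 5 + 1


def fold(digest: bytes, size: int = 20) -> bytes:
    """XOR-fold a digest down to `size` bytes (assumed store path compression)."""
    out = bytearray(size)
    for i, byte in enumerate(digest):
        out[i % size] ^= byte
    return bytes(out)


MD5_BYTES = 16
digest = hashlib.sha256(b"example").digest()

print(MD5_BYTES * 2)                  # 32 hex characters for md5
print(base32_len(len(digest)))        # 52 characters for a full sha256
print(base32_len(len(fold(digest))))  # 32 characters after folding to 160 bits
```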

@jonringer How are hashes in /nix/store paths actually derived? I can’t find the details. My understanding was that the hash is the result of hashing the output of the build itself, but now I’m not sure:

 > nix-build --no-out-link -A pkgs.hello
/nix/store/nm7d7i9jlqf25nwvan7ghlv3jafnbryj-hello-2.10

 > nix-hash --truncate --type sha256 /nix/store/nm7d7i9jlqf25nwvan7ghlv3jafnbryj-hello-2.10
7dd09f4a6691919e14f186bf159049d56bd15261

How is the hash in the store path generated?

I think you are missing the --base32 flag.

I tried that too, doesn’t look right:

 > nix-hash --truncate --base32 --type sha256 /nix/store/r9jscyj753pawf5qfpjjq9avyj0qf6qb-status-go-develop-fc56ce6-android.drv 
lx0iakanrydyv90xs5xi6n4pava8lv5i

It’s still missing something.

That’s probably because that’s the hash of the content, not of the inputs.

Yeah, and that doesn’t work either.

 > nix-build --no-out-link /nix/store/r9jscyj753pawf5qfpjjq9avyj0qf6qb-status-go-develop-fc56ce6-android.drv                   
/nix/store/p2nqk2h2s33p9g6ca6f2f5awqa0r4wa5-status-go-develop-fc56ce6-android

 > nix-hash --truncate --base32 --type sha256 /nix/store/r9jscyj753pawf5qfpjjq9avyj0qf6qb-status-go-develop-fc56ce6-android.drv
lx0iakanrydyv90xs5xi6n4pava8lv5i

 > nix-hash --truncate --base32 --type sha256 /nix/store/p2nqk2h2s33p9g6ca6f2f5awqa0r4wa5-status-go-develop-fc56ce6-android    
v85ypwk5zkfsxvyihbf5mr73pvp7l12c

I’m trying to look at nix-build source code to find how the hash is derived but there’s a lot of stuff there.

That’s again content hash.

You can ignore the content. Just look at the hashes of the inputs: not hashes of their content, but hashes that are in turn built from the hashes of their own inputs, and so on.

If the package hash is created from the derivation inputs, then what is the hash for the derivation created from?

Also, how are the inputs hashed?

I won’t pretend to be read up on this, but the thesis addresses it in 5.2.2. Store paths (bottom of linked page; numbered page 92 and PDF page 100).

All derivations have their store path computed during evaluation, even Fixed Output Derivations (FODs). This also helps to capture if the builder (script) used to fetch the resources changed as well.

$ nix show-derivation $(nix-instantiate -A hello.src) | head -10
warning: you did not specify '--add-root'; the result might be removed by the garbage collector
{
  "/nix/store/gbk50vz006igkvmp1cs5g1bspp36qf94-hello-2.10.tar.gz.drv": {
    "outputs": {
      "out": {
        "path": "/nix/store/3x7dwzq014bblazs7kq20p9hyzz0qh8g-hello-2.10.tar.gz",
        "hashAlgo": "sha256",
        "hash": "31e066137a962676e89f69d1b65382de95a7ef7d914b8cb956f41ea72e0f516b"
      }
    },
    "inputSrcs": 

Also, I believe the hash is computed off of the NAR archive. However, I may be wrong on this.

The store path is essentially part of the Merkle tree hash which comprises a derivation and all of its dependencies.

The hashes for output paths are based on the derivations, not the output path [1]. This is why Nix can compute the output paths without building the derivation.

However, you cannot directly compute the hash. Consider the hello derivation: its drv has to contain the output path (e.g. to specify the install path). So, this is circular: in order to compute the output path, we need to hash the derivation, but the derivation typically contains the hash of the output path.

Nix works around this circularity by using empty string stubs for the output paths when hashing the derivation. Then after the output path is computed, the stubs are replaced by the actual output paths. You can try this yourself (see below).

However, to simplify things, it’s easier to use a derivation which does not depend on other derivations (such dependencies complicate things a bit):

# Create a builder that only creates the output path (printf is used because
# not every shell's echo expands \n)
$ printf '#!/bin/sh\necho "" > $out\n' > trivial-builder.sh

# Create a simple derivation and find its path
$ nix-instantiate -E 'derivation { system = "x86_64-linux"; builder = ./trivial-builder.sh; name = "test"; }'
/nix/store/b2gfmr42im72mhn3f3kn6kjmcqbmrz54-test.drv

# Output path for the derivation
$ nix-build -E 'derivation { system = "x86_64-linux"; builder = ./trivial-builder.sh; name = "test"; }' --no-out-link
/nix/store/al2akmg39ablfv4gn849q7shh0gscns2-test

# Remove the output path in the derivation, this emulates how Nix initially sees
# the derivation before the output path is known... and compute the hash
$ sed -s 's,/nix/store/al2akmg39ablfv4gn849q7shh0gscns2-test,,g' /nix/store/b2gfmr42im72mhn3f3kn6kjmcqbmrz54-test.drv | sha256sum -
e93875b671546b2067fb430b4cc41a4d466ab9cf012f83e9157b4aadb274398f  -

# Nix does not directly use the derivation hash, but adds some metadata to
# indicate the type of derivation, the store path, and the name. This avoids
# generating the same store paths for different outputs or if a different store
# path is used. The derivation content hash and the metadata is hashed
# again.
$ nix-hash --type sha256 --truncate --flat --base32 <(echo -n "output:out:sha256:e93875b671546b2067fb430b4cc41a4d466ab9cf012f83e9157b4aadb274398f:/nix/store:test")
al2akmg39ablfv4gn849q7shh0gscns2
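
The last step can also be reproduced without nix-hash. This is a minimal Python sketch, assuming the scheme described above (sha256 of the fingerprint string, XOR-folded to 20 bytes, then printed in Nix's base-32 alphabet); `store_path_hash` is an illustrative name, not a Nix API.

```python
import hashlib

NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def store_path_hash(fingerprint: str) -> str:
    """Rough equivalent of `nix-hash --type sha256 --truncate --flat --base32`."""
    digest = hashlib.sha256(fingerprint.encode()).digest()
    folded = bytearray(20)
    for i, byte in enumerate(digest):  # XOR-fold 32 bytes into 20
        folded[i % 20] ^= byte
    # Little-endian number, most significant 5-bit group printed first.
    value = int.from_bytes(folded, "little")
    return "".join(NIX_BASE32[(value >> (5 * n)) & 0x1F] for n in reversed(range(32)))


drv_hash = "e93875b671546b2067fb430b4cc41a4d466ab9cf012f83e9157b4aadb274398f"
fingerprint = f"output:out:sha256:{drv_hash}:/nix/store:test"
# Should match the nix-hash result above if the assumptions hold.
print("/nix/store/" + store_path_hash(fingerprint) + "-test")
```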

Note that the final hash is the hash used in the store path. The format for the ‘metadata’ can be found in the Nix source.

[1] The exceptions here are fixed-output derivations, for which the hash of the contents of the output path is used. Also, content-addressed derivations (which are currently being worked on) use a hash of the output (with self-references replaced).

Edit: forgot to answer the question of what the hash for the derivation itself is created from:

From the drv file, plus a specification of the store paths it references:

# First we hash the drv. This is the final drv including the output
# path(s).
$ sha256sum /nix/store/b2gfmr42im72mhn3f3kn6kjmcqbmrz54-test.drv
7dd34812cf6f05e836d8b029639416ba53e9e88cc89a7b9267b449c0a2767f6c  /nix/store/b2gfmr42im72mhn3f3kn6kjmcqbmrz54-test.drv

# drv files are added to the store as text files. So, we use that type in the
# metadata, along with store paths it references (the builder script).
$ nix-hash --type sha256 --truncate --flat --base32 <(echo -n "text:/nix/store/rrcv9www9dg9s7lwypbnvff0b3hr1634-trivial-builder.sh:sha256:7dd34812cf6f05e836d8b029639416ba53e9e88cc89a7b9267b449c0a2767f6c:/nix/store:test.drv")
b2gfmr42im72mhn3f3kn6kjmcqbmrz54
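
The same calculation in Python, as a minimal sketch: the only new part is the `text` type followed by the referenced store paths. I am assuming the references are joined with colons in sorted order; `text_path_hash` is a made-up helper name.

```python
import hashlib

NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def text_path_hash(name: str, content_sha256: str, references: list) -> str:
    """Hash part of a store path for a text file such as a .drv (sketch)."""
    # Path type: "text" followed by every referenced store path.
    path_type = ":".join(["text", *sorted(references)])
    fingerprint = f"{path_type}:sha256:{content_sha256}:/nix/store:{name}"
    digest = hashlib.sha256(fingerprint.encode()).digest()
    folded = bytearray(20)
    for i, byte in enumerate(digest):  # XOR-fold to 160 bits
        folded[i % 20] ^= byte
    value = int.from_bytes(folded, "little")
    return "".join(NIX_BASE32[(value >> (5 * n)) & 0x1F] for n in reversed(range(32)))


print(text_path_hash(
    "test.drv",
    "7dd34812cf6f05e836d8b029639416ba53e9e88cc89a7b9267b449c0a2767f6c",
    ["/nix/store/rrcv9www9dg9s7lwypbnvff0b3hr1634-trivial-builder.sh"],
))
```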

And that’s the hash that is part of the store path. It’s interesting that the referenced store paths have to be included (since they are also in the drv itself). I guess I’d have to reread the relevant parts of Eelco’s thesis to understand.

The builder does not influence the hash of the output path of FODs. FODs are only dependent on the contents of the output path:

# FOD to retrieve a GPG public key.
$ nix-instantiate -E 'with import <nixpkgs> {}; fetchurl { url = "https://danieldk.eu/danieldk.asc"; sha256 = "0frh2calgh590sca7ypdxh7mdcfvq09vabp6b9k4viyx8gb1qnjp"; }'
/nix/store/gmibvgq5qh12rr5xf49icbl4ai5ayd9j-danieldk.asc.drv

# Note the output path.
$ nix-build -E 'with import <nixpkgs> {}; fetchurl { url = "https://danieldk.eu/danieldk.asc"; sha256 = "0frh2calgh590sca7ypdxh7mdcfvq09vabp6b9k4viyx8gb1qnjp"; }'
/nix/store/0bzx6r0d0dkh5xa2d5sfpgqhq35g8cn2-danieldk.asc

# Get the content hash
$ sha256sum /nix/store/0bzx6r0d0dkh5xa2d5sfpgqhq35g8cn2-danieldk.asc
575a1cd643ddc74d665ae62eb513c0dbb1560fecedfaa39806a9c0471513303b  /nix/store/0bzx6r0d0dkh5xa2d5sfpgqhq35g8cn2-danieldk.asc

# Hash hash metadata ;)
$ echo -n "fixed:out:sha256:575a1cd643ddc74d665ae62eb513c0dbb1560fecedfaa39806a9c0471513303b:"  | sha256sum -
5902100ebdadc348adc7760acbf8dc39f59db5d12123be04d23e3cf53d8ca084  -

# Compute hash for the output path
$ nix-hash --type sha256 --truncate --base32 --flat <(echo -n "output:out:sha256:5902100ebdadc348adc7760acbf8dc39f59db5d12123be04d23e3cf53d8ca084:/nix/store:danieldk.asc")
0bzx6r0d0dkh5xa2d5sfpgqhq35g8cn2
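
Putting the two steps together, here is a minimal Python sketch of the flat sha256 fixed-output case. It assumes the two fingerprint formats shown above and is not taken from Nix itself; `fixed_output_path` is an illustrative name.

```python
import hashlib

NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def fixed_output_path(name: str, content_sha256: str, store: str = "/nix/store") -> str:
    """Store path of a flat sha256 fixed-output derivation (sketch)."""
    # Step 1: hash the "fixed" metadata string that wraps the content hash.
    inner = hashlib.sha256(f"fixed:out:sha256:{content_sha256}:".encode()).hexdigest()
    # Step 2: hash the output fingerprint, fold to 20 bytes, print in base-32.
    digest = hashlib.sha256(f"output:out:sha256:{inner}:{store}:{name}".encode()).digest()
    folded = bytearray(20)
    for i, byte in enumerate(digest):
        folded[i % 20] ^= byte
    value = int.from_bytes(folded, "little")
    hash_part = "".join(NIX_BASE32[(value >> (5 * n)) & 0x1F] for n in reversed(range(32)))
    return f"{store}/{hash_part}-{name}"


print(fixed_output_path(
    "danieldk.asc",
    "575a1cd643ddc74d665ae62eb513c0dbb1560fecedfaa39806a9c0471513303b",
))
```

Only the name and the content hash go in; neither the URL nor the builder appears anywhere in the inputs.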

The hash of the output path is computed purely from the content hash of the output and its name; the builder never enters into it.

Some fetchers, such as fetchCargoTarball, abuse FODs though. FOD builds have network access, and fetchCargoTarball uses that to vendor dependencies with cargo. This is problematic, because the output becomes dependent on how Cargo (and its dependencies) behave. It is not a proper FOD, because by definition a FOD’s output is independent of anything else.

Wow, thanks for this detailed explanation @danieldk . I got a question about this on a Nix Fundamentals Video I did some time ago and didn’t know how to answer. This is great.

I guess the path can’t be included in the non-fixed output derivation because $out has to be already set at build time in order to be able to do things like absolute symlinks or absolute paths within the files in build result as well. Makes sense. Much appreciated!

Your video is one of the best!!! It’s in my top 10, even my top 5 :-).

Thanks man, I was hoping to cover all the basics so Nix doesn’t look like a scary magical black box.

I have some issues with it: the audio and video aren’t the best at some points. Maybe I’ll redo it eventually with better mic and video quality.

Just curious: what was the reason to make the SRI format the default in the next Nix?
[+] more succinct (questionable, assuming a constant sha256- prefix and = suffix)
[+] compatible with HTML’s integrity attribute (but what for?)
[-] never published on websites next to tarballs; those are normally base-16
[-] contains + and /, and thus does not form a single word in the eyes of text editors and terminals. It will be difficult to copy a valid hash from the Nix output to the clipboard. For base-32 and base-16 it is just a double-click on the word in the terminal; for SRI… one has to possess a sniper gaming mouse.
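
On the succinctness point, the lengths are easy to compare. A small Python sketch, using plain arithmetic plus the standard `base64` module on an arbitrary example input:

```python
import base64
import hashlib

digest = hashlib.sha256(b"example tarball contents").digest()

base16 = digest.hex()
sri = "sha256-" + base64.b64encode(digest).decode()
base32_chars = (len(digest) * 8 - 1) // 5 + 1  # length of Nix's base-32 form

print(len(base16))   # 64
print(base32_chars)  # 52
print(len(sri))      # 51, including the "sha256-" prefix and "=" padding
print(sri)
```

So with the prefix counted, SRI saves a single character over base-32; only the bare base-64 part (44 characters) is noticeably shorter.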

I’ve been caught out a few times with hash representation formats and hash types…

Maybe one day Nix will support IPFS multihashes… which are a rather nice way of having a self-describing hash (its type, its length…), so you can derive the hash type just from the data… :-))))))))))))

For me, using konsole, it is just double-clicking… Though indeed, “navigating” through the token in my editor is a pain, but it would be for any hash that uses uppercase letters, as I use camelCase word boundaries…

@Ericson2314 promised to add Git tree hashes and some other types which I forgot.
So the transition from sha256 = "xxxxxxxxxx"; to hash = "sha256:xxxxxxxxxx"; and accepting many hash types look meaningful.
