Why do fetched tarballs with different hash algos produce different derivations and outpaths?

This one’s a real mystery to me.

The derivations I’ve written below are exact copies of the builtins.fetchurl derivation, except that they aren’t tied to the tarball cache (very useful) and they let me pick the hash algorithm, whereas builtins.fetchurl always uses sha256 (the sha256 outpath below matches builtins.fetchurl exactly):

nix-repl> csurl = "https://registry.npmjs.org/concat-stream/-/concat-stream-1.6.2.tgz"
nix-repl> cshash512 = "sha512-27HBghJxjiZtIk3Ycvn/4kbJk/1uZuJFfuPEns6LaEvpvG1f0hTea8lilrouyo9mVc2GWdcEZ8OLoGmSADlrCw=="
nix-repl> cshash256 = "sha256-ykfBWZeLyCbZZN/kWSPcYYlFUcxUfJ8ib5X8kNR8Q/E="
nix-repl> doFetch = outputHashAlgo: outputHash: derivation { builder = "builtin:fetchurl"; name = baseNameOf csurl; url = csurl; urls = [csurl]; unpack = false; system = "builtin"; executable = false; preferLocalBuild = true; outputHashMode = "flat"; inherit outputHashAlgo outputHash; impureEnvVars = ["http_proxy" "https_proxy" "ftp_proxy" "all_proxy" "no_proxy"]; }
nix-repl> tb512 = doFetch "sha512" cshash512
nix-repl> tb512
<<derivation /nix/store/5431fv4nn817...-concat-stream-1.6.2.tgz.drv>>

nix-repl> :b tb512
This derivation produced the following outputs:
  out -> /nix/store/spfsxrv9...-concat-stream-1.6.2.tgz

nix-repl> tb256 = doFetch "sha256" cshash256
nix-repl> tb256
<<derivation /nix/store/6an8x4djfiq1d...-concat-stream-1.6.2.tgz.drv>>

nix-repl> :b tb256
This derivation produced the following outputs:
  out -> /nix/store/p4af7vf0wmll...-concat-stream-1.6.2.tgz

Okay, so the derivations not matching totally makes sense. The sha512 and sha256 attributes are different, so it makes sense that they would differ.

What I can’t explain is why the output paths are different, considering that these outputs should be CA.
I went so far as to explicitly make them CA: I removed the outputHash attributes and added __contentAddressed, just to make sure I wasn’t really misunderstanding how like … Nix works lol. The output paths and derivations still match their counterparts with the same outputHashAlgo, which revealed something interesting about how CA works that I was oblivious to before.

So, my question: why is Nix producing different output paths when different outputHashAlgo attributes are used, considering these should be CA? Is the issue that we can’t have a “many to one” mapping for outpaths? That would help explain why sha256 and only sha256 is heavily encouraged by the builtins; buut idk, kind of seems like you could treat outputHashAlgo and outputHash as special attrs (similar to meta maybe?) that don’t affect the outPath.

For drvPath, I think I “get it”. I can prefetch using a sha512 with derivation { builder = "builtin:fetchurl"; ...; } and then “downgrade” it to sha256 using builtins.hashFile and derivation { ...; } again if I really want the derivations to align with builtins.fetchurl, but that feels incredibly goofy.
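
To make the “downgrade” step concrete: it amounts to re-hashing the already-fetched bytes with the other algorithm, since one hash can’t be converted into the other without the content. Here’s a minimal Python sketch of that idea, outside of Nix. The helper names sri_hash and sri_hash_file are made up for illustration, and the tarball path in the usage comment is hypothetical.

```python
import base64
import hashlib


def sri_hash(data: bytes, algo: str = "sha256") -> str:
    """Return an SRI-style string such as 'sha256-<base64 digest>' for raw bytes."""
    digest = hashlib.new(algo, data).digest()
    return f"{algo}-{base64.b64encode(digest).decode('ascii')}"


def sri_hash_file(path: str, algo: str = "sha256") -> str:
    """Same as sri_hash, but reads a file in chunks instead of taking bytes."""
    hasher = hashlib.new(algo)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return f"{algo}-{base64.b64encode(hasher.digest()).decode('ascii')}"


if __name__ == "__main__":
    # Hypothetical usage: hash a tarball that was already fetched via sha512.
    # print(sri_hash_file("concat-stream-1.6.2.tgz", "sha256"))
    sample = b"same bytes, two algorithms"
    print(sri_hash(sample, "sha256"))
    print(sri_hash(sample, "sha512"))
```

The two printed strings describe the same bytes but share nothing textually, so the sha256 value can only be obtained by having the file in hand.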


In any case, if anyone can confirm my theory about the “many to one drvs to outputs” thing that would be awesome.

If my thinking is right, this little example will have taught me something pretty important about optimizing fetchers: don’t mix hash algos in a large set of packages because you’ll “wrongly” trigger rebuilds.


I’m not super familiar with this aspect of nix, but I think outputHashAlgo determines what hash algorithm is used for CA name creation. It isn’t locked to one algorithm.

I’m not sure I follow the “many to one drvs to outputs” theory, as it’s blatantly obvious that with CA, many distinct derivations can produce the same result contents, and thus hash… just add an extra echo foo to the build script, for example.

Do you have CA enabled? Otherwise derivations are input addressed and the hashes are calculated from the inputs.

I ran this example with CA off, and again with CA on. In the second test I also added __contentAddressed explicitly.

In both sets of tests the same derivations and outpaths were produced - which seems wrong to me especially for the CA cases.

builtin:fetchurl returns a so-called “fixed-output derivation”, which is a type of content-addressed derivation where the hashes are specified in advance. The output paths are not computed from the output (which is not known in advance), but from the outputHashAlgo, outputHash and name attributes. Nix has no way to know that e.g. a SHA-256 and a SHA-512 hash describe the same output.

Content-addressed derivations also respect the outputHashAlgo attribute, so two derivations that produce the same output but have different outputHashAlgo values will produce a different store path.
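
To illustrate the point about fixed-output paths, here is a rough Python sketch of how I understand Nix to derive them. It is based on my reading of the store path calculation and has not been checked against the paths in this thread, so treat it as an approximation. The function names are made up for illustration, and /nix/store is assumed as the store directory.

```python
import base64
import hashlib

# Nix's base32 alphabet (it omits e, o, t and u).
NIX_BASE32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32(data: bytes) -> str:
    """Encode bytes the way Nix prints store path hashes (assumed algorithm)."""
    length = (len(data) * 8 - 1) // 5 + 1
    chars = []
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)
        value = data[i] >> j
        if i + 1 < len(data):
            value |= data[i + 1] << (8 - j)
        chars.append(NIX_BASE32_ALPHABET[value & 0x1F])
    return "".join(chars)


def compress_hash(digest: bytes, size: int = 20) -> bytes:
    """XOR-fold a digest down to `size` bytes."""
    out = bytearray(size)
    for index, byte in enumerate(digest):
        out[index % size] ^= byte
    return bytes(out)


def make_store_path(path_type: str, inner_hex: str, name: str,
                    store_dir: str = "/nix/store") -> str:
    """Hash a textual fingerprint and turn it into a store path."""
    fingerprint = f"{path_type}:sha256:{inner_hex}:{store_dir}:{name}"
    digest = hashlib.sha256(fingerprint.encode()).digest()
    return f"{store_dir}/{nix_base32(compress_hash(digest))}-{name}"


def fixed_output_path(sri: str, name: str, recursive: bool = False) -> str:
    """Path of a fixed-output derivation's output, from its declared hash only."""
    algo, _, b64 = sri.partition("-")
    hex_hash = base64.b64decode(b64).hex()
    if recursive and algo == "sha256":
        return make_store_path("source", hex_hash, name)
    mode = "r:" if recursive else ""
    # The algorithm name is part of the hashed string, so it changes the path.
    inner = hashlib.sha256(f"fixed:out:{mode}{algo}:{hex_hash}:".encode()).hexdigest()
    return make_store_path("output:out", inner, name)


if __name__ == "__main__":
    name = "concat-stream-1.6.2.tgz"
    cshash256 = "sha256-ykfBWZeLyCbZZN/kWSPcYYlFUcxUfJ8ib5X8kNR8Q/E="
    cshash512 = ("sha512-27HBghJxjiZtIk3Ycvn/4kbJk/1uZuJFfuPEns6LaEvpvG1f"
                 "0hTea8lilrouyo9mVc2GWdcEZ8OLoGmSADlrCw==")
    print(fixed_output_path(cshash256, name))
    print(fixed_output_path(cshash512, name))
```

Note that the file contents never enter the calculation: only the algorithm name, the declared hash and the name do, which is why the sha256 and sha512 variants land on different paths.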


Thanks for the clarity.

The behavior makes sense once I sit with it a bit; it just wasn’t what I expected.

Also: we gotta cook up some more of these builtin: builders - these are dope!