Unexpected caching

I’m currently playing around with some code to extract a tar file and do some work on it after extraction to clean it up. So I have built a derivation that does what I require. It seems that the derivation is only rebuilt if the inputs are changed, but it doesn’t include the buildScript as an input.

This means that if I have a generic builder, then it’s possible to build two different outputs that have the same hash:

with (import <nixpkgs> {}); 
rec {
  pullImageBase = var: {imageName, ref, sha256}:
    stdenv.mkDerivation {
      name = "docker-image_sha256_${ref}";

      # Setting outputHash makes this a fixed-output derivation.
      outputHashAlgo = "sha256";
      outputHashMode = "recursive";
      outputHash = sha256;

      buildInputs = [ skopeo ];

      tmp = "./tmp_dir";

      buildCommand = ''
      mkdir $tmp
      mkdir $out
      ${skopeo}/bin/skopeo copy --src-tls-verify=false docker://${imageName}@sha256:${ref} dir:$tmp
      files=`ls $tmp/*.tar`
      for file in $files;
      do
        SHA256=`basename $file | cut -d'.' -f1`
        echo $SHA256
        mkdir $out/$SHA256
        mv $tmp/$SHA256.tar $out/$SHA256/layer.tar
        echo "1.0" > $out/$SHA256/VERSION
        echo "{\"id\":\"$SHA256\"}" > $out/$SHA256/json
      done
      cp $tmp/manifest.json $out/${var}
      '';
    };
  pullImage = pullImageBase "first.json";
  pullImage2 = pullImageBase "second.json";

  foo = pullImage {
    imageName = "library/alpine";
    ref = "7df6db5aa61ae9480f52f0b3a06a140ab98d427f86d8d5de0bedab9b8df6b1c0";
    sha256 = "070kc10wxi332na7p9mgwr18a6wsz5nzx2krhgz7b2548d7fk3v3";
  };
  bar = pullImage2 {
    imageName = "library/alpine";
    ref = "7df6db5aa61ae9480f52f0b3a06a140ab98d427f86d8d5de0bedab9b8df6b1c0";
    sha256 = "070kc10wxi332na7p9mgwr18a6wsz5nzx2krhgz7b2548d7fk3v3";
  };
}

Evaluating the foo and bar attributes results in the same derivation. I feel like this is unintended behaviour. I concede that I may be missing something though, so input/feedback would be greatly appreciated.

Apologies for the non-minimal example.

This is a very common mistake, you are not alone!

How to fix

Try changing the hashes to something else and rebuilding. Nix will re-run both fetchers and give you the correct (different) hashes.
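As a minimal sketch of that step, the expression below uses a placeholder hash so that no existing store path can satisfy it. It assumes a nixpkgs recent enough to provide lib.fakeSha256 (a string of zeros); the name fixed-output-demo and the marker file are made up for illustration and stand in for the real skopeo fetch.

```nix
with (import <nixpkgs> {});
stdenv.mkDerivation {
  # Hypothetical name, for illustration only.
  name = "fixed-output-demo";

  outputHashAlgo = "sha256";
  outputHashMode = "recursive";
  # Placeholder hash: nothing in the store matches it, so the builder
  # really runs and the build ends with a hash mismatch error that
  # reports the hash Nix actually computed.
  outputHash = lib.fakeSha256;

  buildCommand = ''
    mkdir $out
    echo "second.json" > $out/marker
  '';
}
```

Building this with nix-build should fail with a hash mismatch; the reported hash is the one to paste back into outputHash.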

Why

When outputHash is set, the derivation becomes a fixed-output (“fetcher”) derivation. In that case, Nix does not take the build inputs or the build script into account when computing the output path; it only looks in /nix/store to see whether an output with the same name and hash already exists. Since foo, for example, was already built, building bar becomes a no-op.
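The effect can be checked without building anything. The sketch below is an illustration, not the original code: mkDemo is a hypothetical helper that produces two fixed-output derivations with the same name and hash but different build scripts, and then compares their paths.

```nix
with (import <nixpkgs> {});
let
  # Hypothetical helper: same name and outputHash, different build script.
  mkDemo = marker: stdenv.mkDerivation {
    name = "fixed-output-demo";
    outputHashAlgo = "sha256";
    outputHashMode = "recursive";
    outputHash = lib.fakeSha256; # placeholder hash, assumed for the demo
    buildCommand = ''
      mkdir $out
      echo ${marker} > $out/marker
    '';
  };
  a = mkDemo "first";
  b = mkDemo "second";
in {
  # The output path is derived from the name and the declared hash only.
  sameOutPath = a.outPath == b.outPath;
  # The .drv file records the build script, so the derivations differ.
  sameDrvPath = a.drvPath == b.drvPath;
}
```

Saved as, say, demo.nix and evaluated with nix-instantiate --eval --strict demo.nix, sameOutPath should come out true and sameDrvPath false. That is the situation with foo and bar: two different recipes that claim the same store path.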

This is a common mistake that can lead to confusion when, for example, the package version is bumped but the derivation still returns the old package.

The best way I have found to prevent that issue is to develop a habit of also changing one character in the fetcher’s sha256 to force the source to be re-fetched.
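A complementary option, offered here only as a sketch and not as something suggested in this thread, is to put whatever distinguishes two variants into the derivation name. Because the name is part of the output path, a stale output from one variant can then no longer satisfy the other. mkDemo and the variant strings are hypothetical.

```nix
with (import <nixpkgs> {});
let
  # Hypothetical helper for illustration; not part of nixpkgs.
  mkDemo = variant: stdenv.mkDerivation {
    # The variant is part of the name, so each variant gets its own store path.
    name = "fixed-output-demo-${variant}";
    outputHashAlgo = "sha256";
    outputHashMode = "recursive";
    outputHash = lib.fakeSha256; # placeholder, replace with the real hash
    buildCommand = ''
      mkdir $out
      echo ${variant} > $out/marker
    '';
  };
in {
  # Different names give different output paths even with the same hash.
  differentOutPath = (mkDemo "first").outPath != (mkDemo "second").outPath;
}
```

With this layout, a variant whose contents do not match the declared hash fails with a hash mismatch instead of silently reusing the other variant’s output.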


Thanks @zimbatm, this definitely answers my question. I ended up having a long discussion with another NixOS user where I figured this out. Unfortunately I couldn’t update the question as it was under ‘moderation’.

I found that information on this distinction isn’t that easy to come by! I had even managed to package several applications beforehand without coming across this issue. After spending a few days thinking about it, I feel that this behaviour is not aligned with the predictability of NixOS (I’m happy to be corrected). I have outlined my thoughts below in the hope that they either open a dialogue around these issues or that my perceptions are corrected.

There are several issues with the current situation which I found confusing/unintuitive.

  1. The implicit conversion of a derivation from being isolated from the network to allowing network access. It may be helpful to have distinct functions: one that allows network access (if a sha256 is provided), and one that does not.

  2. Build inputs not being considered when a sha256 is supplied. This feels incorrect as quite regularly a derivation will perform post-processing after a download in the form of a build-script.

  3. Running nix-build -A bar --check passes when in fact the hashes of the directories are different. This feels like an unintuitive result.

Thanks again for your time.