How to prevent fixed-output paths from being copied to the nix store on each evaluation?

To override a potentially broken firmware version on my laptop, I checked out the linux-firmware repository, replaced the firmware with a newer one, and overrode nixpkgs’ firmwareLinuxNonfree attribute in my configuration.nix:

  nixpkgs.overlays = [
    (self: super: {
      firmwareLinuxNonfree = super.firmwareLinuxNonfree.overrideAttrs (_: {
        name = "firmware-linux-unfree-patched";
        src = /home/kaldonir/linux-firmware;
        outputHash = "19858af9sy784a8sz8r70g2nzfyx9pavshs0g8m0ar940fcv9dp9";
      });
    })
  ];

That repository is more than 800 megabytes in size, and now on every nixos-rebuild, evaluating the config takes quite a while, presumably because the src is copied to the nix store every time:

  warning: dumping very large path (> 256 MiB); this may run out of memory

I thought the outputHash attribute would prevent this, because the output path of my firmware-linux-unfree-patched package depends only on that hash. Why does Nix still evaluate the source directory, and how can I prevent this?

Hi,

First, there seems to be a misunderstanding. The output hash is for firmwareLinuxNonfree, not for firmwareLinuxNonfree.src. So in practice you are not providing the hash of the sources, but that of the final derivation’s output.

To provide a hash for your sources, you must use builtins.path.

  nixpkgs.overlays = [
    (self: super: {
      firmwareLinuxNonfree = super.firmwareLinuxNonfree.overrideAttrs (_: {
        name = "firmware-linux-unfree-patched";
        src = builtins.path {
          path = /home/kaldonir/linux-firmware;
          sha256 = "19858af9sy784a8sz8r70g2nzfyx9pavshs0g8m0ar940fcv9dp9";
        };
      });
    })
  ];

You can find the right hash with nix-hash --type sha256 /home/kaldonir/linux-firmware.

Please note that as long as the store contains something matching that hash, Nix will not look at your sources and will not notice changes there. You will get no warning about that.

Your suggested solution does in fact solve the problem, but I still don’t understand why the problem occurred in the first place:

The hash 19858af9sy784a8sz8r70g2nzfyx9pavshs0g8m0ar940fcv9dp9 is in fact the hash of the overridden firmwareLinuxNonfree, not that of the sources. If I do not provide this hash, my firmware-linux-unfree-patched package won’t even be rebuilt, and if I provide the wrong one, the build fails with a hash mismatch error.

That’s why I assumed that this hash is the only determining factor for this package, but during evaluation the src attribute seems to be inspected as well. Why is that?

Great, I am happy it worked 🙂

Now, what you did is provide a sha256 attribute to a derivation. This is neither sufficient nor necessary to turn it into a fixed-output derivation (see the manual).

To do so, you have to provide all three of outputHash, outputHashAlgo and outputHashMode. As this is a bit tedious, fetchurl provides a simpler interface where you only have to provide sha1, sha256, or some other hash. This is done in …/fetchurl/default.nix.

So, for firmwareLinuxNonfree, you are not overriding a fetchurl call, and there is no other way than to use the three attributes.

  outputHash = "19858af9sy784a8sz8r70g2nzfyx9pavshs0g8m0ar940fcv9dp9";
  outputHashAlgo = "sha256";
  outputHashMode = "recursive";

The complexity here is a good hint that you should not do that. firmwareLinuxNonfree is supposed to be reproducible because all its inputs are. You should not have to make it fixed-output.

firmwareLinuxNonfree is already fixed-output (I assumed this was to avoid re-downloading an 800 MB repository just for the evaluation step). outputHashAlgo and outputHashMode are provided by the original package that I overrode:

Long story short, you can only ask Nix to build .drv files, so the .drv has to be computed before Nix can even ask for it to be built and find that its output already exists. Computing that .drv forces the src attribute to be evaluated and copied to the store.
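
To make that concrete, here is a minimal, untested sketch. It assumes a hypothetical file drv-demo.nix and a <nixpkgs> channel, and uses pkgs.runCommand to stand in for a fixed-output package shaped like the one in this thread; the src path is the one from the original post, so substitute any large local directory (evaluation fails if the path does not exist). Looking up a plain attribute such as name should not need the .drv, whereas outPath, even though it only depends on the name and outputHash, is obtained by instantiating the whole derivation, src included.

```nix
# drv-demo.nix (hypothetical file name).
# Replace the src path with any large local directory.
let
  pkgs = import <nixpkgs> { };

  # A fixed-output package shaped like the one in this thread.
  pkg = pkgs.runCommand "firmware-demo" {
    src = /home/kaldonir/linux-firmware;
    outputHashAlgo = "sha256";
    outputHashMode = "recursive";
    outputHash = pkgs.lib.fakeSha256; # never built here, so a placeholder is fine
  } "cp -r $src $out";
in
{
  # Plain attribute lookup: no .drv is computed and src is not read.
  name = pkg.name;

  # Depends only on the name and outputHash, but Nix obtains it by
  # instantiating the whole derivation, which turns src into a store
  # path and therefore reads the entire directory.
  outputPath = pkg.outPath;
}
```

Comparing nix-instantiate --eval drv-demo.nix -A name with nix-instantiate --eval drv-demo.nix -A outputPath should show that only the second one has to go through the directory.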

The optimization you have in mind (not computing the .drv when the output already exists) would break the layering of Nix (eval .nix → generate .drv → build packages). It is feasible, but not implemented, and several issues would need to be addressed first. For example, have a look at the NixOS/nix commit 1511aa9, “Allow remote builds without sending the derivation closure”, which, oddly enough, I was reading at the very moment I saw your answer.

PS: Please ignore my previous message. I thought you had sha256 = ... in your initial post, but you had the correct outputHash = ... version.
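
Putting both hints of this thread together, an overlay that pins the source by its own hash while keeping the output hash from the original post might look like the sketch below. It is untested; super.lib.fakeSha256 is only a placeholder to be replaced by the value printed by nix-hash, and the earlier caveat still applies: as long as a store path with that source hash exists, local changes to the checkout are ignored until the hash is updated.

```nix
  nixpkgs.overlays = [
    (self: super: {
      firmwareLinuxNonfree = super.firmwareLinuxNonfree.overrideAttrs (_: {
        name = "firmware-linux-unfree-patched";

        # Source pinned by its own (NAR) hash: once a store path with this
        # hash exists, evaluation no longer has to read the local checkout.
        src = builtins.path {
          path = /home/kaldonir/linux-firmware;
          # Placeholder; replace with the output of
          #   nix-hash --type sha256 /home/kaldonir/linux-firmware
          sha256 = super.lib.fakeSha256;
        };

        # Hash of the built package, as in the original post (the original
        # package already sets outputHashAlgo and outputHashMode).
        outputHash = "19858af9sy784a8sz8r70g2nzfyx9pavshs0g8m0ar940fcv9dp9";
      });
    })
  ];
```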

Alright, that makes a lot of sense, thanks for the thorough answer! I didn’t realize that .drvs are not evaluated as lazily as I thought, and I assumed fixed-output derivations were an optimization to avoid, in this case, downloading an 800 MB src repository just to find out that the package is already in the Nix store.

But then I’m wondering what exactly fixed-output paths are an optimization for. Is it only for reducing evaluator runtime, since the evaluator then only has to take the outputHash attribute into account?

That’s probably because they are not intended as an optimization for anything. Fixed-output derivations are not fully sandboxed; most notably, they have access to the network.
This does not compromise the purity of the derivation, as long as the unrestricted process produces something with the right hash in the end.
You could see fixed-output derivations as pure derivations despite their network access.
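
As an illustration of that exception, here is a minimal sketch of a fixed-output derivation that downloads a file itself. It assumes a hypothetical file download.nix, a <nixpkgs> channel and the example URL https://example.org/. With sandboxing enabled, the same builder without the three outputHash attributes would have no network access. The placeholder hash is deliberate: the first nix-build download.nix fails with a hash mismatch that reports the hash actually produced, which can then be pasted in; if the downloaded content ever changes, the build fails again instead of silently producing something different.

```nix
# download.nix (hypothetical file name); build with: nix-build download.nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "example-download" {
  nativeBuildInputs = [ pkgs.curl ];

  # These three attributes make the derivation fixed-output, which is
  # what grants the builder network access inside the sandbox.
  outputHashAlgo = "sha256";
  outputHashMode = "flat";
  # Placeholder: the first build fails and reports the real hash.
  outputHash = pkgs.lib.fakeSha256;
} ''
  # Download a single file directly into the output path.
  curl --fail --location \
    --cacert ${pkgs.cacert}/etc/ssl/certs/ca-bundle.crt \
    --output $out https://example.org/
''
```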

That’s why it is a bit odd to make firmwareLinuxNonfree a fixed-output derivation. Technically, it could be a “normal” derivation, which would ensure that it is rebuilt if and only if an input changes. Now, because it is a fixed-output derivation built on top of another fixed-output derivation, changing firmwareLinuxNonfree.src requires updating two hashes. The history of that file indeed shows this.

The log (here) shows that this deviation from common practice was decided in #44605.

Because any change to the inputs triggers a rebuild, even a change to a non-essential input (GNU Make, for example) triggers a rebuild of the package under a different store path. Such changes are quite frequent, and given the sheer size of this package, which only copies some files from the sources, this customization was considered worth the pain of updating two hashes and the rather unlikely risk of a discrepancy between the actual output and the fixed output.

So there is indeed an “optimization” gained by fixed-output derivations. You can change the way they are produced without any impact on dependent packages.
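
That difference can be observed without building anything. The sketch below assumes a hypothetical file compare.nix and a <nixpkgs> channel, and defines a trivial package with pkgs.runCommand twice, once as a normal derivation and once as a fixed-output derivation, each with two build scripts that differ but write the same file. Evaluating it with nix-instantiate --eval --strict compare.nix should report plainSame = false, because the output path of a normal derivation is derived from all of its inputs, and fixedSame = true, because the output path of a fixed-output derivation only depends on its name and its declared hash (algorithm, mode and value).

```nix
# compare.nix (hypothetical file name)
let
  pkgs = import <nixpkgs> { };

  # Normal (input-addressed) derivation: the output path is derived
  # from the whole .drv, so a different script gives a different path.
  plain = script: pkgs.runCommand "greeting" { } script;

  # Fixed-output derivation: the output path is derived from the name
  # and the declared hash only, so the script can change freely.
  fixed = script: pkgs.runCommand "greeting" {
    outputHashAlgo = "sha256";
    outputHashMode = "flat";
    # Placeholder; nothing is built here, only output paths are computed.
    outputHash = pkgs.lib.fakeSha256;
  } script;

  # Two different scripts that write the same content.
  scriptA = "echo hello > $out";
  scriptB = "printf 'hello\\n' > $out";
in
{
  plainSame = (plain scriptA).outPath == (plain scriptB).outPath;
  fixedSame = (fixed scriptA).outPath == (fixed scriptB).outPath;
}
```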

Another option would be to make the fetchgit call output the right files directly. See for example my hack below. The hack uses only one fixed-output derivation: it is simpler to update, but nearly impossible for you to override the way you did in your initial post.

The current nixpkgs approach is a compromise. You need to override two hashes, but it remains quite easy to override.

That clears up everything. Thank you!