linkFarm/buildEnv vs runCommand (copying everything)

Am I correct in saying that by using linkFarm or buildEnv you end up increasing the number of derivations in your runtime closure? And that this can be a bad pattern in some cases?

I have a PR where I’m adding hundreds of auto-generated derivations, and instead of using buildEnv/linkFarm I chose to use runCommand to put everything together by copying the paths.
This way I’m increasing the build closure, both in the number of derivations and in total size, BUT when Nix tries to substitute the result it only has to fetch a single path.
Indeed, with runCommand the intermediate derivations never enter the runtime closure, so Nix doesn’t have to substitute all of their paths as it would with buildEnv/linkFarm.
Also, increasing the build closure is not a big issue, because the transient derivations will be garbage collected, and when using a cache (like cache.nixos.org in my case) you won’t even fetch them.
I believe that keeping the number of derivations in the runtime closure small is a good thing, especially when you are using multiple substituters; otherwise Nix will potentially query all of them for each derivation in the closure.
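To make the trade-off concrete, here is a minimal sketch of the two approaches (hypothetical names; `parts` stands in for the auto-generated derivations). With linkFarm every part remains a separate store path in the runtime closure of the result; with runCommand the parts are only build-time inputs and the runtime closure is a single path.

```nix
{ pkgs ? import <nixpkgs> {} }:
let
  # Stand-ins for the hundreds of auto-generated derivations.
  parts = map (n: pkgs.writeText "part-${toString n}" "data ${toString n}") [ 1 2 3 ];
  entries = pkgs.lib.imap0 (i: p: { name = "part-${toString i}"; path = p; }) parts;
in {
  # Symlinks: every part stays in the runtime closure of `linked`.
  linked = pkgs.linkFarm "parts-linked" entries;

  # Copies: the parts are build-time inputs only; the runtime closure
  # of `copied` is just its own store path.
  copied = pkgs.runCommand "parts-copied" { } ''
    mkdir -p $out
    ${pkgs.lib.concatMapStrings (p: "cp ${p} $out/\n") parts}
  '';
}
```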

(Let’s ignore the fact that, by default, both buildEnv and linkFarm set preferLocalBuild = true and allowSubstitutes = false, so Nix doesn’t even try to substitute their outputs.)
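For what it’s worth, those flags can be flipped on a linkFarm result; since it is built on runCommand/mkDerivation it should support overrideAttrs (untested sketch):

```nix
(pkgs.linkFarm "parts" entries).overrideAttrs (_: {
  preferLocalBuild = false;
  allowSubstitutes = true;
})
```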

Also, is this a fine optimization to consider in nixpkgs?

In my latest comment I basically say the same thing I said here, but applied to that specific case.

Please debunk me if I’m wrong :slight_smile:

I’d say it really depends on the particular case. Also note that if you have auto-optimise-store = true, the copies turn into hardlinks, so they are basically free on disk (though not at copy time, nor on cache.nixos.org).
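For reference, that setting goes in nix.conf (or, on NixOS, via `nix.settings.auto-optimise-store`):

```
# /etc/nix/nix.conf
auto-optimise-store = true
```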


More derivations means (ideally) less to build when there’s a cache miss, so it has upsides. (Which is why I prefer more derivations when using e.g. Rust, since the builds are so slow.)

I think it is worse for the total binary cache size, so probably no.


cp automatically hardlinks already on modern Linux. So copying from one derivation to another will already automatically share inodes.

optimise-store still helps for files that are accidentally the same. But if they have the same heritage you already get inode sharing.

You mean reflinks? That’s an interesting variant, but I don’t expect it will differ significantly from optimise-store for our use cases.

EDIT: and I thought a reflink has a separate inode but shares the data extents (not sure about the terminology here).

Sorry, yes. Reflinks, not hardlinks.
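To illustrate the distinction settled on above, a small shell sketch (GNU coreutils `cp` and `stat`): a reflinked copy gets its own inode but can share data extents, while a hardlink is the same inode under another name.

```shell
# Reflinks vs hardlinks: compare inode numbers.
cd "$(mktemp -d)"
echo data > a
cp --reflink=auto a b   # reflink where the filesystem supports it (btrfs, XFS);
                        # falls back to a plain copy elsewhere
ln a c                  # hardlink: same inode as a
stat -c '%i' a b c      # a and c print the same inode number; b prints its own
```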