Why does Nix try to realise a derivation that already failed?

ThibautMarty · February 25, 2020, 5:44pm

I’m wondering why Nix would ever try to realise a derivation again after it has already failed.
If the derivation’s outputs are already valid, Nix does not realise the derivation a second time.
Otherwise, it tries to realise it again and again, even though doing so more than once is useless (if the determinism assumptions hold).
I’m speaking here about the derivation’s builder failing, not about the required system or system features being unavailable, the realisation being killed, etc.

I (think I) understand how Nix works and why it behaves this way (the output paths are not valid, period).
My question is more about whether there is a good reason for that and, if not, whether it is possible to fix it.

Example

This command will always fail, and Nix will try to realise the derivation each time:

% nix-build -E 'derivation { name = "x"; builder = "y"; system = __currentSystem; }'
these derivations will be built:
  /nix/store/7r40h4gj6rsaw6xnsq7vj128dpiawa94-x.drv
building '/nix/store/7r40h4gj6rsaw6xnsq7vj128dpiawa94-x.drv'...
while setting up the build environment: executing 'y': No such file or directory
builder for '/nix/store/7r40h4gj6rsaw6xnsq7vj128dpiawa94-x.drv' failed with exit code 1
error: build of '/nix/store/7r40h4gj6rsaw6xnsq7vj128dpiawa94-x.drv' failed

Use cases

I sometimes put long tests in a derivation. If any of them fails, the builder eventually fails and so does the realisation. If I run the same tests again (same inputs, same derivation, same hash), it’s useless to waste CPU (and human) time.
It could be useful when using git bisect on a broken package in nixpkgs: one could just run git bisect run nix-build -A somepackage. If the inputs did not change and the build worked, we are fine, because Nix reuses the valid outputs. If the inputs did not change but the build failed, we waste time building it again. Think of it as a Nix-powered git-bisect <paths> argument.
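To make the idea concrete, here is a minimal sketch of what remembering failures could look like outside of Nix itself. It is not how Nix or Hydra implements anything: FailureCache, build_once and the force parameter are hypothetical names invented for this illustration, and the cache is just a directory of marker files keyed by the .drv path. The only real command it calls is nix-store --realise. Note that it treats every non-zero exit as a failure, so it cannot tell a failing builder apart from a killed realisation.

```python
import hashlib
import subprocess
from pathlib import Path


class FailureCache:
    """Hypothetical helper: one empty marker file per failed .drv path."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _marker(self, drv_path):
        # Hash the store path so it can serve as a flat file name.
        return self.root / hashlib.sha256(drv_path.encode()).hexdigest()

    def has_failed(self, drv_path):
        return self._marker(drv_path).exists()

    def record_failure(self, drv_path):
        self._marker(drv_path).touch()

    def forget(self, drv_path):
        self._marker(drv_path).unlink(missing_ok=True)  # Python 3.8+


def nix_realise(drv_path):
    """Realise a .drv with nix-store; True if the command exited with 0."""
    return subprocess.run(["nix-store", "--realise", drv_path]).returncode == 0


def build_once(drv_path, cache, realise=nix_realise, force=False):
    """Return True on success, False if the build failed now or earlier."""
    if force:
        # Assumed escape hatch: drop the recorded failure and try again.
        cache.forget(drv_path)
    elif cache.has_failed(drv_path):
        return False  # same derivation already failed: skip the rebuild
    ok = realise(drv_path)
    if not ok:
        cache.record_failure(drv_path)
    return ok
```

For the bisect use case, a wrapper script could obtain the .drv path with nix-instantiate, call build_once on it, and exit with 0 or 1 so that git bisect run marks the commit as good or bad.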

Functional point of view

I guess we can see the failing derivation realisation as a bottom. But even in that case, Nix should not need to try evaluating the pure function twice.

vcunat · February 25, 2020, 8:50pm

I think it would be a nice additional feature. Especially if binary caches also supported this (negative caching). EDIT: of course, there would need to be a flag to force a retry, or something.

theotherjimmy · February 25, 2020, 10:50pm

I have run into issues with OOM that depend on the other running services on a machine. That particular case is not deterministic, and storing failure would have caused me some trouble there.

samueldr · February 26, 2020, 3:29am

It was an opt-in feature at one point. From the Release 2.0 notes:

Failed build caching has been removed. This feature was introduced to support the Hydra continuous build system, but Hydra no longer uses it.

I liked to use that feature pre-2.0. It would help when bisecting, or even when fumbling around learning, so you don’t end up trying the same result twice for longer builds.