Understanding rebuilds: jarring "screenshot failed" error on `nixos-rebuild switch --upgrade`

Okay, partially answering some of my own questions…

101 “how do things work” questions…

tl;dr: a build is split into lots of phases, and every discrete package has at least one chance to run a test suite

… from the nixpkgs manual on the “phases” of stdenv.mkDerivation:

This generic command either invokes a script at buildCommandPath, or a buildCommand, or a number of phases. Package builds are split into phases to make it easier to override specific parts of the build (e.g., unpacking the sources or installing the binaries).

and then just a handful of paragraphs down, the order in which phases run by default:

$prePhases unpackPhase patchPhase
$preConfigurePhases configurePhase $preBuildPhases buildPhase checkPhase
$preInstallPhases installPhase fixupPhase installCheckPhase
$preDistPhases distPhase $postPhases

So that’s helpful: there’s not only checkPhase but also installCheckPhase.
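
To convince myself of the ordering, here’s a minimal sketch of a toy derivation that prints a line from each of the two check phases. The derivation name, the file name, and the echoed text are all made up for illustration; `doCheck` and `doInstallCheck` are the stdenv switches that gate the two phases.

```nix
# phases-demo.nix (hypothetical file name)
{ pkgs ? import <nixpkgs> { } }:

pkgs.stdenv.mkDerivation {
  name = "phases-demo";   # made-up name, not a real package
  dontUnpack = true;      # there is no source to unpack in this toy example

  # checkPhase is skipped unless doCheck is set
  doCheck = true;
  checkPhase = ''
    echo "checkPhase: after buildPhase, before installPhase"
  '';

  installPhase = ''
    mkdir -p $out
    echo hello > $out/greeting
  '';

  # installCheckPhase is skipped unless doInstallCheck is set
  doInstallCheck = true;
  installCheckPhase = ''
    echo "installCheckPhase: after installPhase and fixupPhase"
    test -f $out/greeting
  '';
}
```

Building it with `nix-build phases-demo.nix` should show the checkPhase line before the installCheckPhase line in the build log, and a failure in either phase fails the whole derivation.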

… And then a few more sections down, an explanation of the checkPhase:

The check phase checks whether the package was built correctly by running its test suite. The default checkPhase calls make $checkTarget, but only if the doCheck variable is enabled.

It is highly recommended, for packages’ sources that are not distributed with any tests, to at least use versionCheckHook to test that the resulting executable is basically functional.
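
If one package’s test suite turns out to be the flaky part, one workaround is to switch its check phases off with `overrideAttrs`. A sketch, assuming a package built with plain `stdenv.mkDerivation` (`hello` here is just a stand-in for whichever package is failing; language-specific builders may wire their tests up differently):

```nix
{ pkgs ? import <nixpkgs> { } }:

# `hello` is a placeholder; substitute the package whose tests fail.
pkgs.hello.overrideAttrs (old: {
  doCheck = false;          # skip checkPhase
  doInstallCheck = false;   # skip installCheckPhase
})
```

Note that overriding changes the derivation, so the package gets built locally instead of being fetched from the binary cache.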


rebuild-correctness question…

I’m guessing this just means the test is flaky/non-deterministic.

Or: there’s a bug in the phase implementation that let the failed build’s results stick around. That is: my second attempt produced almost no output, which I’m guessing means everything was already built and installed, so no phases ran… but then that makes me think the phases that failed the first time around never triggered a rollback of the state they had changed.

Would love some input here, since this feels like too naive a bug for others not to have hit already.


logs-clarity questions…

This hope of better understanding the logs led me to this 7-year-old “interactive rebuild” post/feature-request, since there are sometimes just too many interleaved logs to make sense of. So I guess interactivity isn’t a solution.

But even more interesting is this reply about rollbacks:

Which makes me think my “too naive” thought above might actually be right: failed rebuilds really don’t roll back by default? :thinking: