How does Nix know which substitute to use when installing packages?

toraritte · July 22, 2021, 11:39pm

(Cross-posted to Stack Overflow.)
This applies both when installing from a channel and when pinning Nixpkgs.

Let’s say there is a shell.nix like this:

{ pkgs ? import <nixpkgs> {} }:

  pkgs.mkShell
    { buildInputs = [ pkgs.deno ]; }

and then simply invoking it:

$ nix-shell
[nix-shell:~]$ deno --version
deno 1.3.3
v8 8.6.334
typescript 4.0.2

Then use a pinned version of Nixpkgs:

$ nix-shell \
  --arg pkgs 'import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/f4593ab.tar.gz") {}' \
  shell.nix

[nix-shell:~]$ deno --version
deno 1.8.3 (release, x86_64-unknown-linux-gnu)
v8 9.0.257.3
typescript 4.2.2

Is it something along these lines?

1. The NixOS org’s (?) Hydra build farm periodically builds binaries and publishes the results in channels (based on branches in the Nixpkgs repo).
2. Binaries are uploaded to the binary cache (from the Nix manual: “binaries have been built and uploaded to the binary cache at cache.nixos.org”).
3. When a process is started to install a package (nix-shell, nix-env, etc.), the Nix expression is looked up in Nixpkgs.
4. A derivation is built and placed in the Nix store (?; the point is that there will be a hash value that is compared against the binary cache).
5. If there is a substitute, it will be pulled; otherwise source deployment continues.

The different deno versions are simply the result of the differing Nix expressions contained in those snapshots yielding different hashes, so a different substitute is downloaded in each case. (In the first case, it’s the latest from the nixos-20.09 channel that is set up on my laptop, nixos-20.09.4407.1c1f5649bb9; in the second, it is picked from … whatever channel, unstable perhaps, has a binary with that hash?)

jonringer · July 23, 2021, 12:25am

May not be 100% correct, but this is my mental model:

1. Nix will figure out the desired outpath(s) based on nixpkgs.
2. Nix will recursively ask which outpaths are available in binary caches (if substitution is enabled); this will usually be seen as querying cache.nixos.org ...
   - Any cache misses will need to be built
   - Eventually a cache hit should occur (bootstrapping tools)
3. Any cache misses are built, eventually arriving at the final output.

Another way to view Nix builds: the .drv files are the unambiguous description of how to build something, and substitution is a way to avoid having to build potentially everything.

toraritte · July 23, 2021, 11:40am

Thanks! But if we go deeper into what is involved in the “Nix will figure out” part, is my description correct?

Keep forgetting that a Nix expression can produce multiple outputs - appreciate the reminder!

layus · July 23, 2021, 12:26pm

Your description is correct, albeit somewhat imprecise. “Nix expression is looked up in Nixpkgs” and “A derivation is built and placed in the Nix store” are vague.

1. Nix constructs a build plan in the form of a DAG of .drv files. This is always performed. It is the evaluation phase.
2. Nix tries to realize the topmost package of the build plan (i.e. get it on disk, either by building it or by substituting it from the cache). It starts by querying the cache for quick substitution. When everything is in the cache (the package and all its dependencies), that’s it.
3. When the target derivation is not in the cache, the build plan is followed recursively. All the inputs of the derivation are realized, then the derivation is built locally.
4. Of course, the process is recursive. Realizing inputs also tries the cache first, then falls back to a local build, and so on.
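The realization steps above can be pictured with a small Python toy model. This is only a sketch of the cache-first, build-as-fallback recursion: the `Drv` class, the `realize` function, and the store paths are all made up for illustration and are not how Nix is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drv:
    """Toy stand-in for a .drv file: one output path plus its input derivations."""
    out_path: str
    inputs: tuple = ()


def realize(drv, cache, store, log):
    """Get drv.out_path into the store: try the cache first, build as a fallback."""
    if drv.out_path in store:
        return  # already on disk, nothing to do
    if drv.out_path in cache:
        # Simplification: a real substitution also pulls the runtime
        # dependencies recorded by the cache; this toy adds only the path.
        log.append(f"substitute {drv.out_path}")
        store.add(drv.out_path)
        return
    # Cache miss: realize every input (each tries the cache first), then build.
    for dep in drv.inputs:
        realize(dep, cache, store, log)
    log.append(f"build {drv.out_path}")
    store.add(drv.out_path)


# Hypothetical store paths; real ones carry a 32-character hash.
rustc = Drv("/nix/store/aaa-rustc")
openssl = Drv("/nix/store/bbb-openssl")
deno = Drv("/nix/store/ccc-deno", (rustc, openssl))

cache = {"/nix/store/aaa-rustc", "/nix/store/bbb-openssl"}  # deno itself is missing
store, log = set(), []
realize(deno, cache, store, log)
print(log)
```

Here the top-level package misses the cache, so both inputs are substituted and only deno is built locally. Had deno's path been in `cache`, the inputs would never have been visited.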

At its own pace, Hydra builds a lot of derivations at several points in the nixpkgs history. If you stick with channels, you get a better hit rate. But keep in mind that not all packages are built by Hydra, and that unstable channels get released as soon as a small subset of the packages builds correctly. So you may get a cache miss because Hydra has not yet built your package (you updated too eagerly), because that package is not built by Hydra, or because that specific version is not within the subset of commits built by Hydra. And of course, when you develop a local package, Hydra has no chance to know about it.

The build plan is not fully realized, except when the cache is empty. Some inputs are build-time inputs, and not runtime dependencies. As soon as a package is in the cache, nix uses the cached information about the dependencies, not the build plan.
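To make the build-time versus runtime distinction concrete, here is a toy Python sketch. It assumes the cache records a list of runtime references for each path (the real cache serves this in `.narinfo` metadata); the `substitute` function and the package names and dependency lists are invented for the example.

```python
def substitute(path, references, store):
    """Fetch `path` plus, recursively, the runtime references the cache lists for it."""
    if path in store:
        return
    store.add(path)  # add first so self-references cannot loop forever
    for ref in references[path]:
        substitute(ref, references, store)


# Build-time inputs according to the build plan (toy data).
build_inputs = {"deno": ["rustc", "cargo", "glibc"]}
# Runtime references as recorded by the cache (toy data).
references = {"deno": ["glibc"], "glibc": []}

store = set()
substitute("deno", references, store)

fetched = sorted(store)
skipped = [p for p in build_inputs["deno"] if p not in store]
print(fetched)  # paths that ended up on disk
print(skipped)  # build-time inputs that were never needed
```

In this model a cache hit on deno brings in glibc only, while rustc and cargo are never fetched, because the cached reference list is consulted instead of the build plan.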
And there are even more details and funny subtleties. But yes, your depiction is about right :-).

layus · July 23, 2021, 12:31pm

And then your initial question resolves itself trivially. Substitutes are indexed by the build plan. If the build plan differs, Nix queries a different cache entry. You only get the substitute for what you want to build. Change something as small as a single space in the build description (adding a space to buildPhase, for example) and you will query a different cache entry, which will probably result in a cache miss.
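A toy Python illustration of that sensitivity follows. It assumes, as a simplification, that the store path is just a truncated SHA-256 of a build description string; Nix's actual hashing scheme and path encoding differ, and `toy_out_path` is a made-up helper.

```python
import hashlib


def toy_out_path(name, build_description):
    """Derive a toy store path from a hash of everything describing the build."""
    digest = hashlib.sha256(build_description.encode()).hexdigest()[:32]
    return f"/nix/store/{digest}-{name}"


original = toy_out_path("deno", "buildPhase = 'cargo build';")
tweaked = toy_out_path("deno", "buildPhase = 'cargo build ';")  # one extra space

# Pretend the build farm only ever built the original description.
cache = {original}

print(original in cache)  # True: substitute available
print(tweaked in cache)   # False: cache miss, so a local build is needed
```

The two descriptions produce unrelated paths, so the lookup for the tweaked build cannot find the binary that was built for the original one.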

toraritte · July 23, 2021, 12:49pm

Thanks a lot! The sloppy formulation was intentional, to invite in-depth explanations (and because I barely knew what I was talking about… :) I completely forgot about the DAG, and the fact that it is very rare that only one store path is involved: there are almost always dependencies that also need to be pulled in.

I wonder if there is an animation of this entire process somewhere - would be fun to do.