Avoid hardcoding toolchains in build tools / compilers

uri-canva · August 9, 2022, 12:00am


opened 11:54PM - 08 Aug 22 UTC

0.kind: enhancement


Users of nixpkgs can use packages in nixpkgs in three different ways:

1. Within a derivation set, like nixpkgs itself, as an input of another derivation.
2. In a nix environment, like a nix shell or nixOS system, but outside of a derivation set.
3. In a non-nix environment, like a host OS using nixpkgs as a package manager.

For some programs, supporting all 3 is simple enough: all the inputs to the program are passed in at runtime via command line flags, configuration and stdin, so they can operate in exactly the same way in each case.

For build tools and compilers, however, the number and complexity of implicit inputs read from the environment or built-in configuration mean they cannot run the same way in every context, so different variants are created (for example `clang` and `clang-unwrapped`, or `bazel_*` with and without `enableNixHacks`). This doesn’t always happen, though: because build tools and compilers in nixpkgs are mostly used within a derivation set (use case 1), they are sometimes built in ways that support only that use case, or the other use cases are less exercised and break more easily.

As a consequence, some build tools and compilers hardcode some of their inputs on the assumption that they’re going to be used within the package set they were built in. For example, bazel hardcodes the python interpreter, the shell and more, and cmake hardcodes the libc it builds with.
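To make the distinction concrete, here is a minimal sketch of the two approaches, written in Python purely for illustration. It is not how bazel or cmake resolve their tools; the `TOOL_<NAME>` override variable, the `resolve_tool` helper and the placeholder store path are all hypothetical. The idea is that a baked-in path becomes a last-resort default rather than the only option:

```python
import os
import shutil

# Hypothetical path baked in when the build tool itself was built.
# The "<hash>" is a placeholder, not a real store path.
BAKED_IN_SHELL = "/nix/store/<hash>-bash/bin/bash"


def resolve_tool(name, baked_in, env=None):
    """Return a path for `name`, preferring runtime inputs over the baked-in one.

    Order: explicit override variable, then a PATH lookup, then the
    baked-in default. The TOOL_<NAME> variable is an assumption made
    for this sketch, not an existing convention.
    """
    env = os.environ if env is None else env

    # 1. Explicit override, e.g. TOOL_BASH=/some/other/bash
    override = env.get("TOOL_" + name.upper())
    if override:
        return override

    # 2. Whatever the surrounding environment provides on PATH.
    found = shutil.which(name, path=env.get("PATH", ""))
    if found:
        return found

    # 3. Fall back to the path chosen when the tool was built.
    return baked_in
```

With something along these lines, swapping the shell in use case 1 would mean changing an environment variable or `PATH` instead of rebuilding the tool, and use cases 2 and 3 would pick up whatever the surrounding environment provides.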

I think this is something we should avoid: not only does it make supporting use cases 2 and 3 much harder, but it also means that if you want to change those inputs in use case 1, you have to rebuild the build tool / compiler. That can take a very long time, since they’re usually very big programs with complex builds that often involve bootstrapping.

uri-canva · August 9, 2022, 12:01am

cc @eee @chenlijun99