Question about using multiple pinned nixpkgs versions in automation and the resulting bloat

Consider a use case where we are:

  • installing multiple tools for our development environment with nix
  • pinning nixpkgs as described in the nix.dev documentation: Towards reproducibility: pinning Nixpkgs.
  • using this same nix setup in our automation (e.g. CI and CD, both of which use many of the same tools as development, such as go or terraform). Doing this increases the parity between development, testing and production environments.
  • versioning each tool separately. In the spirit of breaking down large tasks and submitting small pull requests, we don’t want to update the versions of all of our development tools at once.

E.g.:

let
  go = (import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/<COMMIT-HASH-1>.tar.gz") { }).go;
  terraform = (import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/<COMMIT-HASH-2>.tar.gz") { }).terraform;
  # many (e.g. 20 or more) other tools
in # ...

Problem: downloading each of these many tarballs could add a lot of bloat to automation environments. For example, downloading NixOS/nixpkgs at commit 1c4d0f130b0536b68b33d3132314c9985375233c from GitHub gave me 47.7 MB zipped and 181.7 MB unzipped. Multiply this by the number of tools you’re using through nix and things get unwieldy pretty quickly: long build times and large storage use in caches (e.g. docker container registries). The overwhelming majority of files inside these tarballs are never used. What are possible solutions that ameliorate this problem with little or no compromise?


It looks to me like you have a specific set of packages for which you would like to use specific, pinned versions, and it doesn’t really matter that much to you if the toolchain used to build them occasionally changes a bit. Have you considered using a single, globally pinned version of nixpkgs, and then maintaining packages downstream yourself for the tools that need this much granularity?
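For illustration, a minimal sketch of what that could look like, in a hypothetical shell.nix (the commit hash is a placeholder and the tool list is just an example):

# shell.nix (hypothetical): one pinned nixpkgs for every tool
let
  pkgs = import (fetchTarball
    "https://github.com/NixOS/nixpkgs/archive/<COMMIT-HASH>.tar.gz") { };
in
pkgs.mkShell {
  # all tools come from the same pin, so only one nixpkgs tarball is downloaded
  packages = [ pkgs.go pkgs.terraform /* ...the other tools... */ ];
}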

You can then even override those specific packages inside nixpkgs, so that if you, for example, pin a different clang/gcc version, it is ultimately used to build everything else, which gives you more control over your full toolchain.
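As a rough sketch of what I mean, an overlay along these lines could swap the default compiler (gcc12 is just an example choice, and this rebuilds a lot, so you’d want a cache behind it):

# compiler-overlay.nix (hypothetical); pass it in the overlays list of your
# pinned nixpkgs import: import (fetchTarball ...) { overlays = [ (import ./compiler-overlay.nix) ]; }
final: prev: {
  # packages built with the default stdenv of this nixpkgs instance
  # are now compiled with gcc12
  stdenv = prev.overrideCC prev.stdenv prev.gcc12;
}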

Couple that with a nix cache and you have a pretty painless way to control specific package versions, irrespective of what upstream does, that doesn’t eat all your engineering time and still retains enough ability to inspect your supply chain thanks to the nixpkgs pin. This would, in effect, be a fork of nixpkgs that is rebased whenever you update your pin (which is also a valid way of doing this).

If packaging some of these things is hard, consider vendoring individual upstream .nix files. This is similar to what you do today.
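A sketch of the vendoring approach, assuming you copy the upstream expression for terraform into your repository as ./vendored/terraform.nix (the paths and names here are hypothetical):

let
  pkgs = import (fetchTarball
    "https://github.com/NixOS/nixpkgs/archive/<COMMIT-HASH>.tar.gz") { };
in
{
  # callPackage fills in the vendored file's arguments (lib, buildGoModule, ...)
  # from the single pinned nixpkgs above
  terraform = pkgs.callPackage ./vendored/terraform.nix { };
}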

Ultimately, if you really need perfectly fine-grained control, you end up needing to build your own distro from scratch (which is possible with nix, too, mind you). That is the no-compromise, expensive option, but it’s certainly an option (Google often brags that it takes this route, for example, to the point of maintaining its own Linux kernel).

Pinning the way you do today is an interesting choice, too. You may not realize this, but it also has a significant trade-off: changes to root packages upstream will propagate to your tools at different rates. If a vulnerability in, say, zlib is found and fixed upstream, you will stay vulnerable until every tool’s pin has been bumped. If everybody ends up bumping pins constantly as a result anyway, you might as well have a central nixpkgs pin and actually control the situation.


it doesn’t really matter that much to you if the toolchain to build them occasionally changes a bit

That’s correct.

Have you considered using a globally pinned, single version of nixpkgs, and then maintaining packages for those tools that you need this much granularity for yourself downstream?

Yes, but I am new to and very incompetent at nix, so doing it from scratch (no vendoring) is not viable for me right now.

If packaging some of these things is hard, consider vendoring individual upstream .nix files.

I see two downsides which are probably specific to my lack of nix knowledge:

  1. It might be the case that these vendored files are coupled to specific versions of the toolchain they depend on. If such coupling is found, we could vendor the coupled tool as well, but then who knows when that recursion ends. Also, finding the coupling might be very tricky.
  2. It requires more learning than I initially would’ve liked, as I believe I’ll have to properly understand the vendored files to make sure they function the same way as they would if they were not vendored.

Even with these downsides, I think it’s a good idea that I could actually implement, and it does solve the problem. Unfortunately, I think I’d have to give it a low priority in the backlog, mainly because of how long it would take to learn.

you will stay vulnerable until every tool has decided to bump the pin

Great point, I hadn’t realized this.

Thanks for the very detailed answer! Especially because it includes a solution I will likely be implementing at some point.


Nix packages are luckily very modular by nature, as they’re all functions. Thanks to callPackage, the toolchain they depend on is ultimately provided by the version of nixpkgs that supplies the callPackage, i.e. the thing your whole system is tied to anyway, so vendoring things out is surprisingly easy 🙂
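To illustrate the “they’re all functions” point, a typical vendored expression has roughly this shape (a hypothetical ./vendored/hello.nix; the version and hash are placeholders):

# the whole file is a function over its dependencies
{ lib, stdenv, fetchurl }:

stdenv.mkDerivation rec {
  pname = "hello";
  version = "2.12.1";    # the version you choose to pin
  src = fetchurl {
    url = "mirror://gnu/hello/hello-${version}.tar.gz";
    hash = lib.fakeHash; # placeholder, replace with the real hash
  };
}

Calling it with pkgs.callPackage ./vendored/hello.nix { } makes the pinned nixpkgs supply lib, stdenv and fetchurl, so the toolchain stays whatever your single pin says.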

If you’re planning to do this kind of advanced work with nix, I’d recommend reading the Nix Pills, and especially this chapter, because it explains what I mean here: NixOS - Nix Pills

There may indeed be a bit of a learning curve, though, especially for some of the more complex packages, as they will need maintenance anyway.


I’ve thought more about it and decided that in my case vendoring the files is not worth the engineering time, even though this constraint was not mentioned in the OP. I’ve removed the “solution” status from the reply that suggested this just to increase the chances that more alternative solutions are suggested.