We use Buildkite for CI and nixpkgs for declaring packages we use in our builds. Our agent pool is elastic. We run on AWS and the agents are EC2 machines.
Is there a good approach for “warming” the /nix store on the agents as they come online?
I imagine a solution where one machine prebuilds and caches the /nix store, but how could that then be copied onto the other agents? A Nix S3 binary cache does work wonders, but is there an even better way to do things ahead of time?
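For context, this is roughly the S3 cache setup we have today (a sketch only — the bucket name, region, and key names are placeholders, not our real configuration):

```shell
# After a successful build, push the outputs' closure to the S3 binary
# cache (bucket name and region are hypothetical placeholders):
nix copy --to 's3://example-nix-cache?region=us-east-1' ./result

# Each agent then trusts the cache as a substituter, via /etc/nix/nix.conf:
#
#   substituters = s3://example-nix-cache?region=us-east-1 https://cache.nixos.org
#   trusted-public-keys = example-nix-cache:<public-key> cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY=
```

This still means every fresh agent pays the download cost on first use, which is why I'm asking about doing it ahead of time.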
Thanks. I think that’s probably a good approach for pure Nix builds. We use a hybrid model, though: Nix isn’t used for the actual build itself, it only provides the nixpkgs packages that the non-Nix build uses. Hopefully that makes sense?
The concurrency model with Buildkite is to use a separate “agent” for each “build step”, be that linting, running unit tests, building artifacts, etc. So even with massively powerful machines running N Buildkite agents each, we would still need a good mechanism to scale horizontally. Vertical scaling would let us run ~16 Buildkite agents on a single EC2 instance with reasonable shared system resources, but we need to scale beyond that: our builds have 20+ concurrent steps, and tens of build pipelines run concurrently, so at the moment we run thousands of agents across many hosts. Even if we packed multiple build steps onto each host, we would still need hundreds of hosts.
It isn’t too bad, but when running hundreds or thousands of nodes, the perception among users of the CI system is that “Nix is slow”.
I think this simple solution is a good one. Even if we rebuild the /nix/store only once per day, it will still be beneficial: it would minimize downloads of the very slowly changing dependencies, while remaining flexible enough that a new dependency added to a build still gets fetched or built “just in time” via Nix.
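A minimal sketch of what that daily job could look like, assuming a scheduled Buildkite pipeline on a builder instance and a hypothetical `deps.nix` that pins everything the pipelines need (the file name, instance ID variable, and AMI naming are all assumptions, not anything from our actual setup):

```shell
# Daily "warming" job: realise the dependency closure into /nix/store,
# then snapshot this builder instance as the new agent AMI.
# deps.nix, BUILDER_INSTANCE_ID, and the AMI name are placeholders.

# Build/download everything the CI pipelines need so it lands in /nix/store.
nix-build deps.nix --no-out-link

# Bake this instance (with its populated /nix/store) into a fresh agent image.
aws ec2 create-image \
  --instance-id "$BUILDER_INSTANCE_ID" \
  --name "ci-agent-warm-$(date +%Y%m%d)" \
  --no-reboot
```

New agents launched from that AMI would come online with a warm store, and anything added since the last bake would still fall through to the S3 cache or be built just in time.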