We use Buildkite for CI and nixpkgs for declaring packages we use in our builds. Our agent pool is elastic. We run on AWS and the agents are EC2 machines.
Is there a good approach for “warming” the /nix store on the agents as they come online?
I imagine a solution where one machine prebuilds and caches the /nix store, but how could that then be copied onto the other agents? A Nix S3 binary cache does work wonders, but is there an even better way to do things ahead of time?
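For context, this is roughly the S3 cache setup we have today (a sketch only — the bucket name, region, and key names are placeholders, not our real configuration):

```shell
# After a successful build, push the outputs' closure to the S3 binary
# cache (bucket name and region are hypothetical placeholders):
nix copy --to 's3://example-nix-cache?region=us-east-1' ./result

# Each agent then trusts the cache as a substituter, via /etc/nix/nix.conf:
#
#   substituters = s3://example-nix-cache?region=us-east-1 https://cache.nixos.org
#   trusted-public-keys = example-nix-cache:<public-key> cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY=
```

This still means every fresh agent pays the download cost on first use, which is why I'm asking about doing it ahead of time.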
Thanks. I think that’s probably a good approach for pure Nix builds. We use a hybrid model, though: Nix isn’t used for the actual build itself, it only provides the nixpkgs packages that the non-Nix build uses. Hopefully that makes sense?
The concurrency model with Buildkite is to use a separate “agent” for each “build step”, be that linting, running unit tests, building artifacts, etc. So even with massively powerful machines running N Buildkite agents each, we would still need a good mechanism to scale horizontally. Vertical scaling would let us run ~16 Buildkite agents on a single EC2 instance with reasonable shared system resources, but we need to scale beyond that: our builds have 20+ concurrent steps, and tens of build pipelines run concurrently, so at the moment we run thousands of agents across many hosts. Even if we packed multiple build steps onto each host, we would still need hundreds of hosts.
It isn’t too bad, but when running hundreds or thousands of nodes, the perception among users of the CI system is that “Nix is slow”.
I think this simple solution is a good one. Even if we rebuild the /nix/store only once per day, it will still be beneficial: it would minimize downloads of the very slowly changing dependencies, while remaining flexible enough that a new dependency added to a build still gets fetched or built “just in time” via Nix.
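A minimal sketch of what that daily job could look like, assuming a scheduled Buildkite pipeline on a builder instance and a hypothetical `deps.nix` that pins everything the pipelines need (the file name, instance ID variable, and AMI naming are all assumptions, not anything from our actual setup):

```shell
# Daily "warming" job: realise the dependency closure into /nix/store,
# then snapshot this builder instance as the new agent AMI.
# deps.nix, BUILDER_INSTANCE_ID, and the AMI name are placeholders.

# Build/download everything the CI pipelines need so it lands in /nix/store.
nix-build deps.nix --no-out-link

# Bake this instance (with its populated /nix/store) into a fresh agent image.
aws ec2 create-image \
  --instance-id "$BUILDER_INSTANCE_ID" \
  --name "ci-agent-warm-$(date +%Y%m%d)" \
  --no-reboot
```

New agents launched from that AMI would come online with a warm store, and anything added since the last bake would still fall through to the S3 cache or be built just in time.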