Recommendations for introducing a shared nix store or cache for CI/CD and development

ldeck · September 28, 2021, 6:11am

I’d love to use Cachix at my company as it looks fantastic, but in reality I’m introducing Nix to my company and I’m pretty sure it’d be an uphill sell atm, sorry @domenkozar.

So I’m looking to move from using nix purely on the team’s development machines to also using it in CI/CD (private gitlab). We don’t currently use any shared store or cache, so I just want to push the dial forward at this point. (So far I only have experience with /nix/store as a single-user install.)

Our infra team already manage mounted drives for our gitlab runners (e.g., for maven). So one option would be to possibly mount something at /nix.

Is having this read/writeable recommended? How would people manage garbage collection of packages over time?

Another option I see could be using aws s3 as a shared cache (as per the manual: Serving a Nix store via AWS S3 or S3-compatible Service). The company uses a number of s3 buckets already so this could be suggested.

How would this fit into development and CI/CD workflows?

I’d love to hear any recommendations? Thanks!

nixinator · September 28, 2021, 10:38am

https://nixos.org/manual/nix/unstable/package-management/ssh-substituter.html

you can roll out a dedicated nixos build server for your projects and get the builds done through gitlab runners if you like, or why not try out the hercules-ci runners: they make it really easy, they are nix-native and nix-aware, and you get to send some love and money their way!

you can then serve the artifacts via ssh-serve; all the developers need to do is give you their public ssh key and make sure they are using that machine as a substituter.
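A minimal sketch of the developer side of this, assuming the build server is reachable as `nix-ssh@build.example.com` and signs its paths with a key named `build.example.com-1`. The host, user and key name are placeholders, not values from this thread. The helper only prints `nix.conf` lines so they can be reviewed before being added to the system-wide configuration.

```shell
# Print nix.conf lines that add an SSH-served store as a substituter.
# $1: SSH store URL (hypothetical example: ssh://nix-ssh@build.example.com)
# $2: public key the build server signs with (placeholder value)
ssh_substituter_conf() {
  # Keep cache.nixos.org as a fallback after the company build server.
  printf 'substituters = %s https://cache.nixos.org\n' "$1"
  printf 'trusted-public-keys = %s %s\n' \
    'cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY=' "$2"
}

# Usage (review the output, then merge it into /etc/nix/nix.conf):
#   ssh_substituter_conf 'ssh://nix-ssh@build.example.com' \
#     'build.example.com-1:<public-key>'
```

The user running the substitution (root, or the nix daemon on a multi-user install) also needs an SSH key that the server accepts.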

You can go heavyweight and use hydra… but! it’s a big bit of infrastructure to run.

depends how much time you have.

ldeck · September 28, 2021, 11:40am

Thanks @nixinator for the suggestions. A few questions / clarifications on what you’ve suggested:

Re Serving a Nix store via SSH (Nix Reference Manual), aka ssh-substituter: the problem here is that this doesn’t provide a solution for populating said shared remote store.

But perhaps coupling that with Copying Closures via SSH could be used as an after_script.
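A minimal sketch of that after_script step, assuming the job leaves a `./result` symlink and that the shared machine accepts SSH logins as `ci@nix-cache.example.com` (a placeholder). The `run` helper is only there so the command can be previewed with `DRY_RUN=1`.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Copy a build result and its whole closure to the shared machine over SSH.
# $1: SSH target (hypothetical: ci@nix-cache.example.com)
# $2: store path or result symlink (defaults to ./result)
push_closure() {
  run nix-copy-closure --to "$1" "${2:-./result}"
}

# Usage in after_script:
#   push_closure ci@nix-cache.example.com ./result
```

Depending on how the remote machine is set up, the receiving user may need to be a trusted user there, or the paths may need to be signed first.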

I would assume the remote machine would need to be running a nix-serve?

Alternatively, what about a mounted drive on the gitlab runners at /nix? Would that work (so long as it supported a shared lock)? What safety measures would be required? This is likely the first option we’ll try… so would love to know more about this scenario.

As far as hercules, etc, whilst they look great, I need to prove nix’s worth before getting more sophisticated with SaaS options.

So the main two options that look attractive atm seem to be a mounted drive and/or S3 store. But I’d like to know practically how these might best be integrated and, of course, what other options may be available.

Thanks again!

toraritte · September 28, 2021, 3:18pm

Your description sounds exactly what flox is all about (and this is the announcement on discourse).

To give a rough summary: It provides a shared Nix store, where you can build your own channels/profiles, and changes to repos or to your Nix expressions automatically trigger re-builds on a remote Hydra build farm - essentially a distributed Nixpkgs setup. This also means that you only have to set up everything once, and use it seamlessly on any machine without extra configuration.

nixinator · September 28, 2021, 8:12pm

ah yes, i’d forgotten about that, i did a bit of testing with it, and it was pretty good. It might fit your needs perfectly…

so little time, so many tools!

ldeck · September 28, 2021, 8:22pm

Thanks @toraritte for the suggestion. I do recall seeing the announcement (but had completely forgotten about it, since it was in beta).

The situation for me though is that the developers on my team and I are on darwin (aka macOS), whereas our gitlab CI/CD naturally uses linux docker images to build our software (which includes: springboot services, react js apps, db migration tools, cloud iac etc). A number of developers in our company are also on windows (none of whom are using nix).

So I suppose there’d be limitations on what could be shared between developers and CI/CD pipelines.
But flox definitely looks good.

nixinator · September 28, 2021, 8:58pm

You can make these non-native runners work, but it’s not ideal. Even keeping them up and running can be a chore, as nix doesn’t allow automatic code updates, while runners expect a system where they can bump and change their code without warning. Run an old runner and it breaks the API, and github at least rejects it; it’s difficult to automate, as you have to rekey the runner and bump the version manually… a real PITA. Having a broken pipeline is no fun.

so, try to set up your own runners, and don’t mess with the current pipeline.

Nix gives you a lot of cool stuff, but most developers don’t even think they need what nix has to offer.

and i just figured out the self hosted runners from github, are written in c# .net , so i guess the microsoft borg assimilation of the github is starting. I’m not sure how i feel about that. Maybe i shouldn’t feel anything…

ldeck · September 30, 2021, 12:27am

So I’ve tried bind mounting a shared volume’s directory in the gitlab job…

  before_script:
    - set -x
    - mkdir -p /usr/share/gitlab/.nix/store
    - mount -o bind /usr/share/gitlab/.nix/store /nix/store

And am getting this error, of course,

mount: permission denied (are you root?)

Any suggestions?

kamadorueda · September 30, 2021, 2:16am

My recommendation is to keep it simple

Mounting /nix/store on the runners is complex, I have personally tried that, and it’s not as shared nor saas as you would want it to be.

On the other hand, Cachix works out of the box at an excellent cost. In my case, selling it to my company was easy: hey! we are in the business of cybersecurity, not maintaining binary caches. The less time we spend on binary caches, the more time we spend adding value for our customers. Cachix is the way to go: just cachix use, cachix push, and enjoy.

However, if business constraints mean you still require something else, the next simplest alternative is using S3. Place the nix copy --to 's3://example-nix-cache?profile=cache-upload&region=eu-west-2' ./result in gitlab’s after_script for writing, and just set up substituters for reading in the before_script. You can add a lifecycle policy to the bucket for automatic garbage collection after objects reach a certain age.
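A minimal sketch of those two steps, reusing the placeholder bucket URL from the command above. The function names and the `run` dry-run helper are illustrative only. It assumes the job has AWS credentials for the bucket and builds with `nix-build`.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Placeholder bucket URL, taken from the example command in the post.
DEFAULT_CACHE_URL='s3://example-nix-cache?profile=cache-upload&region=eu-west-2'
CACHE_URL="${CACHE_URL:-$DEFAULT_CACHE_URL}"

# after_script: upload the build result together with its closure.
push_to_s3() {
  run nix copy --to "$CACHE_URL" "${1:-./result}"
}

# build step: try the bucket first, then fall back to cache.nixos.org.
build_with_s3() {
  run nix-build --option substituters "$CACHE_URL https://cache.nixos.org" "$@"
}

# Usage:
#   build_with_s3 default.nix && push_to_s3 ./result
```

Reading from the bucket only works if the paths in it are signed with a key listed in `trusted-public-keys`, or if signature checking is relaxed for that cache, so a signing step is probably needed on the upload side.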

We currently run our company on Nix and gitlab at a pace of 50-100 daily deployments to production, ~200 jobs per deployment (dev+prod)

We use makes, 100% and only makes, which is magically easy to configure on gitlab and has support for using binary caches, like this:
makes.nix · 4e096fc0a81e6915d396d592e80a441dc48c6d14 · Fluid Attacks / universe · GitLab

If you want to give it a try, adding support for writing to S3 caches to the framework should be very simple. Reading S3 caches is already possible.

kamadorueda · September 30, 2021, 2:22am

@nixinator yeah, it was a pain before, then we started using this:
GitHub - cattle-ops/terraform-aws-gitlab-runner: Terraform module for AWS GitLab runners on ec2 (spot) instances and they now even automatically register themselves into the gitlab’s coordinator

This is the code for our ci/cd infra: makes/foss/modules/makes/ci · 489c1b1462848668332f325914f211e3563f21a5 · Fluid Attacks / universe · GitLab

can’t be cheaper and more scalable at the same time than that

nrdxp · September 30, 2021, 2:53am

I’m going to have to second @kamadorueda. Flox is certainly a valid suggestion, but if you want or need to keep it FOSS then makes is probably the best bet right now since it is designed as a CI framework. It has a single invocation m . __all__ to simply build all defined tasks beforehand and optionally upload them to a cachix (we plan on making this more general and add an option to simply call nix copy soon), so that you only have to build your CI tasks once beforehand, and every runner can just pull from the cache.

ldeck · September 30, 2021, 7:51am

Oh, actually that’s a fail anyway as the /nix/store is already in the nixos/nix docker image.

I wonder if it’s possible to layer them, treating the /nix/store from the docker image as r/o

ldeck · September 30, 2021, 10:52am

Thanks for the suggestion @kamadorueda. That sounds great.

Motivation / pain points

We were just hoping to avoid downloading/uploading derivations for every job by having the store mounted and immediately accessible instead.

So I realise now there’s a few problems with this:

the docker image we’ve used so far (nixos/nix) itself already has a /nix/store. We could potentially look at using overlayfs to layer this with a bind mounted /nix/store from the host/network. Perhaps.
The docker container (nixos/nix) would maintain its own store db. Persisting this doesn’t sound trivial (with simultaneous read/writes across jobs)

Bootstrap Container Store?

Perhaps the above could be worked around by creating a docker image that has static nix tools in an alternate location, allowing /nix/store to merely contain fetched/built derivations. But again, the question of a persisted and shared/served nix store with db remains. See @zimbatm’s suggestion on this here.

S3 Store Pro/Cons?

So another thing we’ve considered looking at using is a read/write S3 store. I presume/hope this configuration would involve simultaneous read/write options, but again would involve pushing/pulling as required instead of something that’s immediately available at a mount point.

Longer Term

So longer term I’d want to move to more sophisticated tools like makes as you suggest @kamadorueda.

Short Term

I’m looking for a quick win. The primary caveat atm is downloading artifacts for each job from cache.nixos.org. That’s pretty quick to be fair, but any time we can shave off there is saved on every job.

Something like storing/restoring app paths just for our project / branch?
Of course, downloading from cache.nixos.org isn’t the biggest painpoint atm. Just an obvious point of optimisation.
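One way the store/restore idea could be sketched is with a `file://` binary cache kept in a directory that GitLab’s own `cache:` mechanism persists between jobs. This is an assumption about the setup, not something tried in this thread: the `.nix-cache` directory name, the function names and the `run` helper are all hypothetical, and `?trusted=1` is used so the unsigned local cache is accepted.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Directory that the GitLab job is assumed to cache (hypothetical name).
NIX_FILE_CACHE="${NIX_FILE_CACHE:-${CI_PROJECT_DIR:-$PWD}/.nix-cache}"

# End of job: export the result and its closure into the cached directory.
save_paths() {
  run nix copy --to "file://${NIX_FILE_CACHE}" "${1:-./result}"
}

# Build step: consult the local file cache before cache.nixos.org.
# "?trusted=1" accepts unsigned paths from this one cache only.
build_with_file_cache() {
  run nix-build \
    --option substituters "file://${NIX_FILE_CACHE}?trusted=1 https://cache.nixos.org" \
    "$@"
}

# Usage:
#   build_with_file_cache default.nix && save_paths ./result
```

Whether this beats fetching from cache.nixos.org depends on how fast the GitLab cache itself is restored, so it is worth measuring before adopting.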

nixinator · September 30, 2021, 5:15pm

i like CI/CD pipelines with as few moving parts as possible, there’s a lot of moving parts there, and with every build you do, you’re bringing amazon deliveries to the international space station a step closer. This might be your plan or it may not.

That’s a nice diagram! very colourful

austin · September 30, 2021, 8:23pm

I think it’s https://www.cloudcraft.co/

kamadorueda · October 1, 2021, 3:12am

Very interesting resource! thanks
I just copied the image from their repo’s README
But now I know how to do them!

veprbl · October 5, 2021, 5:36am

If I’m not mistaken, the only thing preventing one from having a store shared over NFS is that Nix uses SQLite to store the metadata. Would be interesting to see if that can be replaced with a database that actually supports networking.

uri-canva · October 5, 2021, 10:38pm

Have you looked at using post-build-hook?
See Untrusted CI: Using Nix to get automatic trusted caching of untrusted builds for how to set it up.
It requires very little additional code / moving pieces / infrastructure (just an s3 bucket), and handles downloading / uploading only what is needed.
We found it to be very simple to set up and use, and it behaves exactly as we expect it to.
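A minimal sketch of such a hook, modelled on the upload example in the Nix manual. Nix passes the freshly built paths to the hook in the `OUT_PATHS` environment variable. The bucket name is a placeholder, and the `run` helper exists only so the command can be previewed with `DRY_RUN=1`.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Upload every path Nix just built. OUT_PATHS is a space-separated list
# that Nix sets for the hook; CACHE_URL defaults to a placeholder bucket.
upload_out_paths() {
  set -f  # disable globbing while OUT_PATHS is split into words
  rc=0
  # shellcheck disable=SC2086  # word splitting of OUT_PATHS is intended
  run nix copy --to "${CACHE_URL:-s3://example-nix-cache}" $OUT_PATHS || rc=$?
  set +f
  return "$rc"
}

# Usage: put these functions in a script such as /etc/nix/upload-to-cache.sh
# (example path), end it with a call to upload_out_paths, make it executable,
# and point nix.conf at it:
#   post-build-hook = /etc/nix/upload-to-cache.sh
```

The hook runs synchronously after each build, so a slow or failing upload slows or fails the build as well.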

efx · October 6, 2021, 1:58pm

I’ll chime in on my experience integrating nix to do the CI for a small, private Rust project on self-hosted GitLab.

I did this a year ago so details are fuzzy.

I wanted to experiment with a shell runner that invoked a single user nix installation. This acted as an alternative to the daemonized world of Docker.
The performance seemed better. Our project pulls in clang and other system dependencies, so nix’s store wonderfully avoids duplicating those downloads.

Is having this read/writeable recommended? How would people manage garbage collection of packages over time?

This is where I struggled. We self-host the GitLab runner on an EC2 instance and we kept filling the disk space from the size of /nix/store. I would manually run nix-collect-garbage -d to prune packages. Looking back, I probably should’ve created a scheduled pipeline to do that!
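A minimal sketch of what such a scheduled job could run, assuming a shell runner with nix installed. The 80% threshold and 14-day age are arbitrary example values, and the function names are hypothetical.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Print the used percentage (without the % sign) of the filesystem
# that holds the given directory.
disk_usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Collect garbage only when the disk is filling up.
# $1: usage threshold in percent (default 80)
# $2: delete generations older than this (default 14d)
gc_if_needed() {
  usage="$(disk_usage_percent "${NIX_DIR:-/nix}")"
  if [ "$usage" -ge "${1:-80}" ]; then
    run nix-collect-garbage --delete-older-than "${2:-14d}"
  fi
}

# Usage in a scheduled pipeline job:
#   gc_if_needed 80 14d
```

Nix also has `min-free` and `max-free` settings in `nix.conf` that trigger collection automatically during builds, which may remove the need for a separate job.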

In the end I moved away from the Nix solution because I am the only one on my team familiar with Nix and I ran out of energy. I moved to a GitLab Docker runner and various Docker images for our jobs for ease of maintenance.

I’m wondering about my decisions as I investigate optimizing our Rust project’s CI pipeline.

makes looks promising! I also may try my shell runner with nix directly again. This time I’ll implement garbage collection or see if I can use S3 fs or some Linux trickery to mount larger disks for nix to use.

griff · October 6, 2021, 9:22pm

We also run a private Gitlab and have tried several different techniques for managing nix caching.

One of the key selling points of cachix is the support for LRU based garbage collection which I am not aware of any other nix caching solution having.

When using S3 as a cache there is no GC available that I am aware of so your cache will just keep growing.
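The bucket lifecycle policy mentioned earlier in the thread is the closest thing to that: it expires objects by age rather than by last use. A minimal sketch of such a rule, assuming the AWS CLI is available. The rule ID, the 90-day default and the bucket name are placeholders.

```shell
# Print an S3 lifecycle configuration that expires every object in the
# bucket a fixed number of days after it was uploaded.
# $1: number of days (default 90, an arbitrary example value)
lifecycle_json() {
  cat <<EOF
{
  "Rules": [
    {
      "ID": "expire-old-nix-cache-objects",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": ${1:-90} }
    }
  ]
}
EOF
}

# Usage (bucket name is a placeholder):
#   lifecycle_json 90 > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration \
#     --bucket example-nix-cache \
#     --lifecycle-configuration file://lifecycle.json
```

Because expiry is purely age-based, paths that are still in use get deleted too and have to be uploaded again, so this is a rough substitute for real GC rather than an equivalent.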

When using SSH the cache machine just needs nix installed and you can then set GC roots on that machine to keep paths and do GC on that machine at regular intervals.

As for populating the cache: You do it much like when using cachix but with nix copy or nix-copy-closure. One difference is that you have to manually sign the paths yourself before uploading which is something that cachix does automatically for you.
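A minimal sketch of that sign-then-copy flow, assuming Nix 2.4 or newer, where the subcommand is `nix store sign` (on Nix 2.3 the equivalent is `nix sign-paths`). The key name, file locations and SSH target are placeholders.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# One-time setup: create a signing key pair.
# $1: key name (hypothetical: ci.example.com-1)
# $2: file for the secret key, $3: file for the public key
generate_cache_key() {
  run nix-store --generate-binary-cache-key "$1" "$2" "$3"
}

# Sign a build result plus its closure, then copy it to the cache machine.
# $1: secret key file, $2: SSH target, $3: path (defaults to ./result)
sign_and_push() {
  run nix store sign --key-file "$1" --recursive "${3:-./result}" &&
    run nix copy --to "ssh://$2" "${3:-./result}"
}

# Usage:
#   generate_cache_key ci.example.com-1 /etc/nix/key.sec /etc/nix/key.pub
#   sign_and_push /etc/nix/key.sec ci@nix-cache.example.com ./result
```

Clients then add the contents of the public key file to `trusted-public-keys` so that the signed paths are accepted.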

We also have a gitlab docker runner that mounts the host /nix/store and nix daemon and uses that for building and as an extra cache. Not quite your shared nix store idea but it has nice job isolation with docker and good host caching.

One way you could have a shared nix store is by implementing a process kind of like what the NixOS ISO does. It contains a nix store and a db dump (nix-store --dump-db) and mounts that read only store together with a writeable layer using overlayfs and then runs nix-store --load-db to initialize the SQLite DB.
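A rough sketch of that process, assuming the job runs with enough privileges to mount (which the earlier “permission denied” error suggests is not the default in a GitLab job). All directories are placeholders, and the dump is assumed to have been created beforehand with `nix-store --dump-db > nix-db.dump` on the machine that owns the shared store.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Overlay a writable layer on top of a read-only shared store and register
# the shared paths in the local Nix database.
# $1: read-only shared store directory (lower layer)
# $2: writable directory for new paths (upper layer)
# $3: overlayfs work directory (must be on the same filesystem as $2)
# $4: database dump produced by "nix-store --dump-db"
mount_shared_store() {
  run mount -t overlay overlay \
    -o "lowerdir=$1,upperdir=$2,workdir=$3" /nix/store &&
    run nix-store --load-db < "$4"  # in dry-run mode the redirect is not printed
}

# Usage (placeholder paths):
#   mount_shared_store /mnt/shared/store /var/nix-upper /var/nix-work \
#     /mnt/shared/nix-db.dump
```

The nix binaries themselves have to remain reachable once /nix/store is overlaid, so the lower layer needs to contain them or nix has to live outside /nix/store.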