Migrating to Nix in production

Elyhaka · November 20, 2019, 9:53pm

At work, we’re currently designing our next production and deployment environment. As such, we are in the process of evaluating Nix/NixOS as a solution.

Currently, we’re working with GitLab CI, Docker images and Kubernetes. The developers work on their workstations with just a Docker daemon and whichever IDE they prefer.

We want to move away from this workflow for the following reasons:

As we add more and more languages to the stack, we have to handle each one's package manager; sometimes we must use Ubuntu images as the base for our Docker images, sometimes we can use Alpine ones, and in the end we have a far too diverse production environment, which has led us into some strange situations in the past…
We cannot use the exact same environment between development and production.
Kubernetes is adding a lot of complexity to the stack, with an overhead that keeps increasing (Role-Based Access Control, Service Discovery, Namespaces, etc.).
We now have too many configuration files (Dockerfile/Kubernetes YAML) in too many places, which makes tracking the current state of the environment really hard.

Nix/NixOS seems to address each of these issues by:

Unifying the package management through Nix.
Making the production environment a 1:1 reproduction of the development one.
Keeping the stack at a sane level of management for a small team.
Keeping configuration files at consistent places, and always being able to predict the current state of our production.

The first part of my question is: is our understanding of what Nix/NixOS could bring us in production correct?

The second part of my question is more practical.

Let’s say we have switched our project to be built with Nix, and so we’re developing with it. What would be a sane practice for automated deployment?

From our understanding, if we have running nodes of NixOS instances we can use NixOps to manage those instances. Is it a good idea to execute NixOps from within a CI Task once the build is made to update all nodes automatically?

Thank you very much for any help

domenkozar · November 20, 2019, 10:25pm

Hi @Elyhaka,

The first part of my question is: is our understanding of what Nix/NixOS could bring us in production correct?

Yes - seems quite spot on. You’ll still need to provide some way to do local development. If you’re used to docker-compose, you can take a look at GitHub - hercules-ci/arion: Run docker-compose with help from Nix/NixOS

Let’s say we have switched our project to be built with Nix, and so we’re developing with it. What would be a sane practice for automated deployment?

That mostly depends on your workflow, but the first question is what CI you’ll use.

Either way, if you’re going to use nixops for deployments, it’s quite tricky to share the state of it between deployments.

From our understanding, if we have running nodes of NixOS instances we can use NixOps to manage those instances. Is it a good idea to execute NixOps from within a CI Task once the build is made to update all nodes automatically?

My recommendation would be to use terraform for provisioning, as it does that job very well.

We’re planning to add impure tasks to https://hercules-ci.com/, which will allow you to configure such automatic deployments.

If you’re looking to start with a very simple setup, I recommend taking a look at GitHub - cachix/cachix-action: Build software only once and put it in a global cache

Elyhaka · November 21, 2019, 12:11pm

Thank you for your answer, I’m still trying to piece things together.

That sounds really nice! I will dig a bit more. I was looking into using nixos-containers to mirror a production environment. I had the feeling that this option would give me the 1:1 reproducibility I’m looking for. Is arion able to achieve the same result?

What do you mean by share the state of it? I’m not sure what that would mean. Are you referring to this issue?
Let’s say we stay on GitLab for the CI part (for now, of course). I could host my own worker that would simply trigger a nixops deployment, with the state stored on this worker (and backed up to blob storage by a cron job, for instance).

Noted, I will check this option out. I have to admit we wanted to avoid adding more programs to the stack, since that tends to create the complexity we are trying to escape.

If I understand correctly, it means that those types of tasks could have side effects like triggering a deployment, right? If so, that sounds like a great feature.

domenkozar · November 21, 2019, 12:16pm

If all your developers use NixOS that sounds like an easy win, otherwise you’ll have to do something else as nixos-containers only work on NixOS.

NixOps uses an SQLite database for state. We have a script that syncs that to S3 on each invocation, but it has no locking. A PR to address this has been open for quite some time. In the end you’re reimplementing Terraform. Not ideal, but something to be aware of, as you’ll have to deal with it.
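To make the sync-and-lock idea concrete, here is a minimal sketch in Python. It is not the script mentioned above: the function names, the `deployments.nixops` file name and the use of a plain directory in place of S3 are all assumptions for illustration. The `fcntl` lock only serialises jobs running on the same Unix machine (for example a single self-hosted CI worker); it does not protect state shared between several machines.

```python
import fcntl
import shutil
import subprocess
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Sequence

STATE_NAME = "deployments.nixops"  # assumed file name; adjust to your setup


@contextmanager
def exclusive_lock(lock_path: Path) -> Iterator[None]:
    """Hold an exclusive advisory lock on lock_path while the block runs."""
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    with open(lock_path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until any other holder releases
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def run_with_synced_state(backup_dir: Path, work_dir: Path,
                          command: Sequence[str]) -> int:
    """Restore the state file, run command in work_dir, then save the state back."""
    backup = backup_dir / STATE_NAME
    working = work_dir / STATE_NAME
    work_dir.mkdir(parents=True, exist_ok=True)
    with exclusive_lock(backup_dir / (STATE_NAME + ".lock")):
        if backup.exists():
            shutil.copy2(backup, working)
        result = subprocess.run(list(command), cwd=work_dir)
        if working.exists():
            # Save even after a failed run: the state may already have changed.
            staged = backup_dir / (STATE_NAME + ".tmp")
            shutil.copy2(working, staged)
            staged.replace(backup)  # rename, so readers never see a partial copy
        return result.returncode
```

On a CI worker, `command` would be whatever deployment invocation you use, pointed at the working copy of the state file; the exact arguments depend on your NixOps setup.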

Exactly

danieldk · November 21, 2019, 12:36pm

I do not have experience with it yet (has been on my to-do list for a while), but Morph is a stateless alternative to nixops:

Elyhaka · November 21, 2019, 1:34pm

If I try to summarize the current options that we have for the automated deployment part (do not hesitate to correct me if I’m wrong):

Hercules.ci (with Impure Tasks): Would probably be the best fit, but it is not ready yet and would require an external service to operate.
NixOps: Would require managing the state file manually to be sure not to lose it.
Morph: Would solve the state issue of NixOps, but to quote its developers:

Morph is by no means done. The CLI UI might (and probably will) change once in a while. The code is written by humans with an itch to scratch, and we’re discussing a complete rewrite (so feel free to complain about the source code since we don’t like it either).

Which is not really reassuring for a production environment.

Also, for all of those solutions, I have the feeling there might be an issue with automatic deployment/deletion given the auto-scaling features of some cloud providers. How can a node be added/removed from a stateless configuration file (like morph) or even with stateful tools like nixops?

I have the feeling that there is no standard way to manage the deployment phase, and everybody is scripting their own way through it. Am I correct? If that’s the case, maybe we could develop our own little opinionated solution to do so. I’m thinking about something like a lightweight agent (probably in Rust) that registers the NixOS instances with a master. This master could then push new configuration.nix files (or maybe just part of them, like the current revision of running programs) to the new hosts. Would implementing such a solution be a complete anti-pattern, or would it make sense?
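To illustrate the registration idea, here is a rough sketch of the bookkeeping such a master might do. It is written in Python rather than Rust purely for brevity, it has no networking and applies no NixOS configuration, and every name in it (`Master`, `register`, `out_of_date`, the revision strings) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Master:
    """Tracks the revision each registered node reports and the desired one."""
    desired_revision: Optional[str] = None
    nodes: Dict[str, Optional[str]] = field(default_factory=dict)

    def register(self, node_id: str,
                 current_revision: Optional[str] = None) -> Optional[str]:
        """Called by an agent at boot; returns the revision the node should run."""
        self.nodes[node_id] = current_revision
        return self.desired_revision

    def deregister(self, node_id: str) -> None:
        """Called when a node is scaled down; unknown ids are ignored."""
        self.nodes.pop(node_id, None)

    def set_desired(self, revision: str) -> None:
        """Record the revision that every node should converge to."""
        self.desired_revision = revision

    def report(self, node_id: str, revision: str) -> None:
        """Called by an agent after it has switched to a revision."""
        self.nodes[node_id] = revision

    def out_of_date(self) -> List[str]:
        """Nodes whose reported revision differs from the desired one."""
        return sorted(node for node, rev in self.nodes.items()
                      if rev != self.desired_revision)
```

With this shape, a node added by an auto-scaler would call `register` at boot and receive the revision to run, and one that is removed would call `deregister`, so no static host list would need editing.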

azazel75 · November 23, 2019, 11:41am

Hello,

just to add my two cents, as I switched to having a mixed environment of docker/k8s and nix/nixos in production too.

I’m using a modified gitlab runner by @arianvp, which works well and allows me to reuse the great infrastructure that gitlab provides. I don’t do automatic deployments to production, for which I use NixOps, but I have experienced first-hand what needs to be understood and done in order to do small nixos deployments. It’s good to have a picture of what’s needed; for that, see my demo server deployment.

domenkozar · November 26, 2019, 9:58am

I think you’re mixing two requirements:

a way to execute automated deployments (CD): Hercules CI / Buildkite / Jenkins / …
a tool for deploying/provisioning: NixOps / Terraform / Morph / …

petabyteboy · November 26, 2019, 1:57pm

Hi,

we are currently in the process of migrating to NixOS for our production systems at nyantec.
When talking to some people at the local hackerspace AfRA I learned about krops, a very minimal tool for stateless remote deployments.
This is what we are using for now, but Morph sounds very interesting too.
One feature of krops is the ability to deploy secrets from a password store to the target machine, which solves one of the remaining problems with Nix deployments in an elegant way.
When designing a NixOS production deployment, always keep in mind that Nix cannot manage state for you. You need to set up backups and secrets management with other software.

- Milan

ryantm · November 26, 2019, 9:08pm

Depending on what you are doing, it is possible to use NixOps in a way that you don’t have to worry about the state file. I wrote a blog post about that a while ago and I’ve been using this approach in production for over a year now.

https://ryantm.com/nixops-without-sharing/

Elyhaka · November 27, 2019, 9:08pm

You’re completely right, I mixed up the two.

I’m going to check this out, the secret management part is something we will need to manage our deployments.

I would like to thank all of you for the incredible amount of input you gave us, which has been really helpful in getting us to understand how we could operate Nix in production.