Seeking help to understand NixOps use cases

koolean · January 13, 2020, 10:51am

I apologize in advance if my question sounds stupid. I would like to understand if NixOps can help in my use case.

In my company, we have 100+ customers and for each of them, we would like to deploy a physical server. Each server will serve a web application for control and reporting of a manufacturing process. Each server will be physically located at the corresponding customer’s facility, not in the cloud.

So I will have to manage a set of 100+ physical machines that all do the same thing. However, these machines will be independent of each other: they are for different customers and are not designed to communicate with each other.

Reading through the NixOps manual and playing a bit with the tool, I have the feeling NixOps is mostly designed for networks of several machines where each machine accomplishes a specific task toward a broader common goal. Since the part of the documentation about deploying to physical NixOS machines is so small, I also have the feeling the tool is designed for the cloud.

So, I would like to know if NixOps can be helpful in my scenario and how.

peterhoeg · January 13, 2020, 12:15pm

We are not quite at 100+ customers yet (working on it!) but we are using nixops the way you describe.

We have a deployment per customer (and some wrappers that take care of setting the state properly, export/import to .json). As most of the servers are similar (or serve a similar purpose), the configuration is shared with a bunch of helper functions that configure things per customer.
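
To make the "one deployment per customer" idea concrete, here is a minimal Python sketch of such a wrapper. It is not the actual tooling described above: the directory layout, the file names and the `nixops_commands` helper are assumptions for illustration, and the flags are the ones I understand nixops 1.x to accept, so check them against `nixops --help`. The function only builds the command lines and does not run anything.

```python
from pathlib import Path
from typing import Dict, List


def nixops_commands(customer: str, root: Path = Path("deployments")) -> Dict[str, List[str]]:
    """Build (but do not run) the nixops invocations for one customer.

    Assumed, hypothetical layout: deployments/<customer>/network.nix holds the
    customer-specific settings and imports the shared configuration, and
    deployments/<customer>/state.nixops is that customer's own state file.
    """
    state = str(root / customer / "state.nixops")
    network = str(root / customer / "network.nix")
    common = ["--state", state, "-d", customer]
    return {
        # Register the deployment in the per-customer state file.
        "create": ["nixops", "create", *common, network],
        # Build and activate the configuration on the customer's machine(s).
        "deploy": ["nixops", "deploy", *common],
        # Print the deployment state as JSON on stdout (redirect it to a file).
        "export": ["nixops", "export", *common],
        # Read previously exported JSON from stdin into the state file.
        "import": ["nixops", "import", "--state", state],
    }


if __name__ == "__main__":
    for name, command in nixops_commands("acme").items():
        print(name, "->", " ".join(command))
```

Keeping a separate state file per customer means one customer's deployment can be exported, archived or restored without touching the others.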

It works really well. Feel free to ask more questions.

joehealy · January 13, 2020, 12:56pm

We also use nixops in a similar manner. We have both physical and virtual machines at customer (and our own) sites. These are typically in very remote locations, and the ability for someone unskilled to reboot and roll back has been invaluable on a number of occasions. We have experimented with automatic rollbacks in the event of network failures, but so far still prefer to have a human in the loop.

We manage each customer with their own repository and buildkite (https://buildkite.com) pipeline to run the nixops tool to build and push out configuration changes. Typically we break the pipeline into a number of steps (build, push to test machine, push to production machine), with a breakpoint (ie human intervention) between each critical step.
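
As a rough illustration of the staged pipeline described above (not the actual buildkite configuration), the following Python sketch models the three steps with a human approval gate before each push. The machine names, the `pipeline_steps` and `run_pipeline` helpers and the choice of which steps are gated are assumptions; `--build-only` and `--include` are the `nixops deploy` options as I understand them from nixops 1.x.

```python
from typing import Callable, List, Tuple

# (label, command, needs human approval before running)
Step = Tuple[str, List[str], bool]


def pipeline_steps(deployment: str, test_machine: str, prod_machine: str) -> List[Step]:
    """Return the build / push-to-test / push-to-production steps in order."""
    deploy = ["nixops", "deploy", "-d", deployment]
    return [
        ("build", deploy + ["--build-only"], False),
        ("push to test", deploy + ["--include", test_machine], True),
        ("push to production", deploy + ["--include", prod_machine], True),
    ]


def run_pipeline(
    steps: List[Step],
    approve: Callable[[str], bool],
    execute: Callable[[List[str]], int],
) -> List[str]:
    """Run steps in order; stop at the first refused gate or failed command.

    Returns the labels of the steps that completed successfully.
    """
    completed: List[str] = []
    for label, command, gated in steps:
        if gated and not approve(label):
            break  # the human said no: nothing further is pushed
        if execute(command) != 0:
            break  # a failed step must not be followed by later pushes
        completed.append(label)
    return completed
```

In real use, `execute` could be something like `lambda cmd: subprocess.run(cmd).returncode` and `approve` a prompt on the terminal; in a CI system the approval gate would be whatever manual-unblock mechanism that system offers.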

We could do this without nixops (i.e. just with nix/nixos), as described in the Haskell for all post NixOS in production, but have found that nixops provides enough benefits that it is worth using.

One of the big benefits is that it gives our support engineers the same interface/process for updates no matter whether the machine is physical, hosted in the client's virtualisation infrastructure or a cloud-based system.

In terms of deploying remotely, when we have had a remote client and spares were some time away, we were able to send them a USB image which was preconfigured to connect back to our infrastructure. They then burnt it to a USB stick and inserted it into some other spare hardware. Once it booted and connected to us via openvpn, we were able to use nixops to put the latest config/installation of the previous machine on it immediately.

Would definitely recommend nix/nixops.

koolean · January 13, 2020, 2:08pm

Good to know that I am on the right track.

How do you do the very first NixOS install on the target machines? Do you automate this process? If yes, is the disk partitioning also part of the automated process? The idea for me is to avoid repetitive and error-prone manual install.

koolean · January 13, 2020, 2:24pm

Thank you for your feedback. It is nice to know that several people recommend NixOps in my use case.

As I asked @peterhoeg, I would also be interested to know how you handle the initial NixOS install on the physical machines. I am also in a scenario where I need to work remotely with few or no skilled people on the remote side, so I am trying to find the most effective solution for the initial install.

peterhoeg · January 13, 2020, 4:14pm

[quote=“koolean, post:4, topic:5468”]
How do you do the very first NixOS install on the target machines? Do you automate this process? If yes, is the disk partitioning also part of the automated process? [/quote]

That part isn’t super advanced - a script to handle a base install (pure vanilla but will run on all VM platforms) into a VM which is then exported as an image that can then be imported to the hypervisor of choice.

That part is not great but not really a big deal as it’s only done once per hypervisor type and the manual import is truly a one-off thing.

joehealy · January 13, 2020, 9:57pm

We use an updated version of the GeoscienceAustralia/NixOS-Machines repo on GitHub to build initial images for customers to install in their vmware/hyperv/xenserver environments.

For local physical devices, we have so far just done a manual install, as it is a relatively rare event and easy to do. If we had to do more than 3-4 at a time, we would write a script that did the partitioning and installation and add it to a custom usb installer.
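
For what such a partition-and-install script might look like, here is a minimal Python sketch that only lists the commands in order, following the general shape of the manual installation steps in the NixOS manual. It is not a script anyone in this thread actually uses. The assumptions are mine: a single-disk, legacy BIOS (MBR) layout with one ext4 root partition and no swap, sdX-style partition naming, and the hypothetical `install_commands` helper. Everything it lists is destructive to the target disk if run.

```python
from typing import List


def install_commands(disk: str = "/dev/sda", label: str = "nixos") -> List[List[str]]:
    """Return, in order, the commands for an unattended single-disk install.

    Nothing is executed here. Assumes sdX-style naming, so the first
    partition of /dev/sda is /dev/sda1 (NVMe disks are named differently).
    """
    root_partition = f"{disk}1"
    return [
        # Fresh MBR partition table, then one partition spanning the disk.
        ["parted", "--script", disk, "--", "mklabel", "msdos"],
        ["parted", "--script", disk, "--", "mkpart", "primary", "ext4", "1MiB", "100%"],
        # Format and mount the root filesystem where the installer expects it.
        ["mkfs.ext4", "-L", label, root_partition],
        ["mount", root_partition, "/mnt"],
        # Generate hardware-configuration.nix and a template configuration.nix.
        # A real script would now write its own configuration.nix (boot loader
        # device, SSH keys, and so on) before the final step.
        ["nixos-generate-config", "--root", "/mnt"],
        # Skip the interactive root password prompt; access must then come
        # from the configuration itself (for example SSH keys).
        ["nixos-install", "--no-root-passwd"],
    ]


if __name__ == "__main__":
    for command in install_commands():
        print(" ".join(command))
```

Printing the commands first makes it easy to review them for a given disk before wiring the list into something that actually executes them from the installer.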

For remote physical devices (with someone capable of booting off usb at remote end), we send a custom usb image as described above.

koolean · January 16, 2020, 1:14pm

So, you mean that a machine that does not have NixOS installed on its disk, but is simply running NixOS (for example, booted from the usb image), is enough to be used as a target host by NixOps. Is that correct?

joehealy · May 12, 2020, 6:13am

Sorry for the delay in replying. Any machine running NixOS can be used as a target; however, it is best if it boots into NixOS permanently. So we send them a bootable usb, they plug it in, it connects to our vpn - which lets us log in and install NixOS on the hardware. The person at the remote end then unplugs the usb and performs a couple of reboot tests, and we then deploy with nixops over the top.