I’ve been eyeing/reading a bit about bento, but it’s really going down a rabbit hole fast, since everything I read is pretty opinionated…
I know, I’m inviting just more opinions here, but I thought I might try this to save some evaluation time
One central feature I’m looking for is simplicity, i.e. I want to avoid (complicating) features I don’t need, and the focus of my deployments is:
- typical target (server)
- mostly headless (but nice-to-have: manage “kiosk” type terminals on e.g. RPI)
- constrained (disk space / processing power)
- possibly non-
- runs (or will run) nixos
- type is any of:
  - bare metal (e.g. RPi etc.)
  - (optional) non-nixos libvirtd VMs running on a nixos host <= now manual or …
Due to the constrained property, an important requirement is that I can build config realisations (or, for some, even full system images?) on my dev machine and deploy them, so that build-time dependencies don’t end up on the target and builds don’t have to run on constrained CPUs/memory.
(I believe this is also the main operation mode of …)
In general, all targets are trusted and my configs are in private git, so I’m not immediately concerned about (deployment) secrets, but I want to evolve, at the latest in the medium term, to “good practice” in this respect as well.
Which of the above solutions would you advise for this use case?
P.S.: for “native” nixos systems (that “do their own (re)builds”) I’m already using a multi-system-multi-user flake (including HM), and on existing VPS I’m (still) using
You could directly use `nixos-rebuild` to push the configuration to the remote host with `--target-host`. Most tools will use `nixos-rebuild` to create the system closure, then use `nix-copy-closure` from your host to the remote system to distribute the missing packages that complete the closure, and then switch to it.
Then, the tools can provide extra fluff like checking connectivity after the upgrade and automatically reverting in case of loss of connectivity.
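For reference, a minimal sketch of that flow as a script. The host and flake names are made up, and it defaults to a dry run that only prints the command instead of deploying:

```shell
#!/bin/sh
# Hypothetical host and flake output -- adjust to your setup.
HOST="root@myserver.example"
FLAKE=".#myserver"

# Defaults to a dry run that only prints commands;
# set DRY_RUN= (empty) to actually deploy.
: "${DRY_RUN:=1}"
run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

# Build the closure locally, copy missing store paths over SSH,
# then switch the remote host to the new system:
run nixos-rebuild switch --flake "$FLAKE" --target-host "$HOST"
```

With `DRY_RUN=` unset it would run the real deployment over SSH.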
I wrote bento because my needs were really different, as most of my computers aren’t running 24/7, but it’s clearly written for my specific use case, and I wonder if it’s suitable for others.
Ah, for that `nixos-rebuild --target-host` case I could actually keep using the “one flake rules all” approach, though I’d need to make sure to select the proper output from my wrapper script.
(Probably a limited benefit, since the only thing that might be common to the two “infrastructures” are some user definitions.)
Are there any experiences with how fool-proof cross-building entire system configs for different target architectures is?
Also, I forgot to mention: it’s probably nice to optionally have more depth in the deployment config, e.g. partitioning for VPS deployments. While I’m not on Hetzner but netcup (so no directly supported “control panel API”), e.g. `nixops` would allow me to mount a live CD and set up SSH in the live session, and from there continue the installation (including partitioning) per …
Cross-building should work, but you will get better performance if the building machine has the same architecture as the target.
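On a NixOS build machine you can also transparently emulate foreign architectures via binfmt, so aarch64 closures can be built on an x86_64 box (slowly, via QEMU). A sketch of the relevant option, from memory of the NixOS manual:

```nix
{
  # On the x86_64 build machine: register QEMU user-mode emulation
  # so nix can "natively" build aarch64-linux derivations.
  boot.binfmt.emulatedSystems = [ "aarch64-linux" ];
}
```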
There is a tool to create partitions in a declarative way, if this can be useful to you: GitHub - nix-community/disko: Format disks with nix-config [maintainer=@Lassulus]
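For a flavour of what disko looks like, here is a minimal single-disk GPT layout. Device name and sizes are made up; check the examples in the disko repo for the current option schema:

```nix
{
  disko.devices.disk.main = {
    device = "/dev/sda";  # hypothetical device
    type = "disk";
    content = {
      type = "gpt";
      partitions = {
        ESP = {
          size = "512M";
          type = "EF00";  # EFI system partition
          content = {
            type = "filesystem";
            format = "vfat";
            mountpoint = "/boot";
          };
        };
        root = {
          size = "100%";  # rest of the disk
          content = {
            type = "filesystem";
            format = "ext4";
            mountpoint = "/";
          };
        };
      };
    };
  };
}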
I watched your video on YT regarding this topic, very informative.
I kind of got the impression that `nixops` might not be such a bad choice after all. While it certainly cannot be categorised as lean (as per my stated requirement above), it does seem to tick all the other boxes and then some. It probably helps that `nixops` is a (relatively) well-supported “standard” (even though I’m not sure about the documentation yet). AFAICT it has partition support built in, which can be used optionally if I understood correctly.
Your video also made me think twice about wanting to primarily build system closures locally; I’ll probably need a hybrid approach, since pushing several GB over ADSL for each rebuild might indeed not be a good idea. I’ll probably end up having the VPSes rebuild themselves, and building for RPi/kiosk/etc. in my LAN.
I’ll have to check whether `nixops` also supports the former mode.
My tool GitHub - MatthewCroughan/nixinate: Another NixOS Deployment Tool - Nixinate your systems 🕶️ is nothing more than a declarative wrapper for `nixos-rebuild` that allows you to define the nixosConfigurations you want to deploy, and allows them to be deployed with `nix run .#apps.nixinate.machine` (which ultimately calls `nixos-rebuild` with the declared arguments).
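From memory of the nixinate README, the wiring looks roughly like this (the host name and module path are placeholders; check the repo for the exact attribute names):

```nix
{
  inputs.nixinate.url = "github:matthewcroughan/nixinate";
  outputs = { self, nixpkgs, nixinate }: {
    # Exposes apps.nixinate.<machine> for every nixosConfiguration below.
    apps = nixinate.nixinate.x86_64-linux self;
    nixosConfigurations.machine = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./machine/configuration.nix  # placeholder path
        {
          _module.args.nixinate = {
            host = "machine.example";  # placeholder host
            sshUser = "root";
            buildOn = "remote";        # or "local"
          };
        }
      ];
    };
  };
}
```

Deployment is then `nix run .#apps.nixinate.machine`.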
Ah, it was you all along! I just rewatched your VisionFive Embedded NixOS YT video, which I had had in the back of my head when thinking about the “deploy image” (e.g. for RPi/kiosk terminals) scenario. You sure made it look easier than it probably is…
So if I understand correctly, with `nixinate` I would deploy a “normal” NixOS config for the target via a local (dev machine) build and push the closure, and/or build an (e.g. SD card) image. I’d normally have to execute that build step myself (using a `nixinate` flake as a tool for this imperative action), but presumably I could also call `nixinate` functions from another flake that does such steps declaratively.
But in the latter case, wouldn’t that be similar to using e.g. `nixops`? Or are there things one does that the other doesn’t? Or could I use them in conjunction?
With Nixinate, you can choose whether you want to build a given nixosConfiguration entirely on the remote machine itself, or on your local machine (a laptop, for example) and then push that to the remote. There are some bugs with nixos-rebuild that prevent this from working all that nicely, though. So I’d recommend just using `remote` instead of `local` with Nixinate for that task, though I do have a fix planned.
I just thought of something: shouldn’t “vanilla” usage of `nix-copy-closure`, when locally (cross-)building closures for remote systems, work fine (and efficiently) incrementally for small changes, because it doesn’t copy paths that already exist remotely, as mentioned in the docs? That would make local builds of remote systems less prohibitive by default for ADSL-connected “controllers”?
Though I guess due to the dynamic nature of `nixpkgs` we’d still be looking at significant traffic…
Yes, this works really well, except when you have to rebuild a kernel and push its 150 MB of modules + 30 MB of initrd. `nix-copy-closure` will just copy the delta.
What could be interesting is running nixos-rebuild on all your devices but using your main computer as a remote builder, so if anything needs to be built for your systems, it will be done remotely on your main computer.
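On each device, that would mean declaring the main computer as a distributed builder, roughly like this (host and user names are made up):

```nix
{
  # On the constrained device: offload builds to the big machine over SSH.
  nix.distributedBuilds = true;
  nix.buildMachines = [{
    hostName = "bigbox.lan";  # hypothetical builder host
    sshUser = "builder";
    system = "x86_64-linux";
    maxJobs = 4;
    # sshKey = "/root/.ssh/id_builder";  # key readable by the nix daemon
  }];
}
```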
@Solene for now I indeed went with `disko`, which is pretty cool, though a bit convoluted. But I might be doing some things wrong there as well.
Since that bare metal server deployment I was working on (basically a pilot for my future server management methods) is actually best deployed from a live session (I rely on `nixos-generate-config` because the hardware is not “well-known” enough), I figured a good way to continue was to make the whole setup flake-based. That way I can have everything in `git` from the start (and not have `/etc/nixos/` lying around) and benefit from synergies (e.g. all servers in one flake).
So roughly the install then goes like this:
- live session on target: gather system info (`nixos-generate-config`)
- make a flake with the system info and a “temp” output just for `disko`
- create partitions from the realised `disko` config
- mount minimal partitions (under `/mnt`)
- `nixos-install --root /mnt --flake "gitblabla:/mygit/myflake#myserver"`
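Spelled out as commands, the sequence above might look like the sketch below. It defaults to a dry run (printing each step), since the disko config path is a placeholder; also check the disko README for the current `--mode` values:

```shell
#!/bin/sh
# Dry-run by default: prints each step instead of executing it.
: "${DRY_RUN:=1}"
run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

FLAKE='gitblabla:/mygit/myflake#myserver'  # placeholder flake URL

# 1. In the live session: gather hardware info (no fileSystems entries,
#    since disko will declare those).
run nixos-generate-config --no-filesystems --show-hardware-config

# 2/3. Partition and format according to the disko config from the flake.
run disko --mode disko ./disko-config.nix

# 4. disko mounts the filesystems under /mnt; install from the flake.
run nixos-install --root /mnt --flake "$FLAKE"
```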
Given that we are now here, for further system management/maintenance `nixinate` (@matthewcroughan) might indeed be nice to use in conjunction, since I understood that I can basically use the existing flake with it, thereby:
- avoiding the steps of SSHing into the server and calling `nixos-rebuild switch --flake <remote-repo>`
- if so wanted, building locally and transferring the realised system to the remote (this server is powerful enough, so to be tested later)
@Solene @matthewcroughan Thanks both for your useful inputs!
Just noticed I forgot to confirm one thing: does `nixos-rebuild` actually distinguish between build-time and runtime deps, and if so, is this picked up by `nix-copy-closure` in a way that minimises the final size of the system closure (storage and transfer) to contain only runtime deps? Or is there a difference between them?
If you need to recompile a kernel, for example, it will be done locally and `nix-copy-closure` will only copy the result.
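You can check what actually gets shipped by inspecting the runtime closure of the built system. A sketch (the flake attribute is a placeholder; dry-run by default, so it only prints the commands):

```shell
#!/bin/sh
# Dry-run by default: prints the commands instead of running them.
: "${DRY_RUN:=1}"
run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

# Build the system closure without switching to it:
run nix build '.#nixosConfigurations.myserver.config.system.build.toplevel'

# Show the runtime closure (what nix-copy-closure would transfer),
# recursively, with closure sizes, human readable:
run nix path-info -rSh ./result
```

Build-time-only dependencies don’t appear in that runtime closure.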
Take a look at “colmena”; we use it to deploy >100 machines, and it can do things in parallel for you: GitHub - zhaofengli/colmena: A simple, stateless NixOS deployment tool
For a few machines you can easily do this with one line of `nixos-rebuild` and a bit of bash around it.
This script updates a single machine:
And this loops over all my systems in the flake:
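The poster’s actual scripts aren’t quoted above; a stand-in sketch of such a loop might look like this (hypothetical host names, dry-run by default):

```shell
#!/bin/sh
# Hypothetical hosts; the real list could instead come from e.g.
#   nix eval .#nixosConfigurations --apply builtins.attrNames
HOSTS="server1 pi1 pi2"

# Dry-run by default: prints the commands instead of deploying.
: "${DRY_RUN:=1}"
run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

for h in $HOSTS; do
  # Build locally (cross-building for the Pis) and push to each host:
  run nixos-rebuild switch --flake ".#$h" --target-host "root@$h"
done
```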
My notebook is x86_64-linux and builds the config for itself and for all my Pis running aarch64-linux.
Edit: Encryption is done with agenix.
Thanks. Yeah, that would basically be the leanest and most transparent way to use a multi-system flake in a cleanly controlled (i.e. understandable) manner.
@fernsehmuell Thanks, I took a quick look at it already, but since I needed to provision my server first, I quickly ended up with a flake. So I’d need to check whether I can easily pull the host definitions from the flake (or go all in and replace the flake with `colmena`, of course). It looks good from a usability perspective.
That’s currently my main motivation for doing it this way.
There’s already so much to learn, I can’t just add another tool on top.