Deployment tools: evaluating NixOps, deploy-rs, and vanilla nixos-rebuild

Our family is redoing our home lab to use NixOS. Currently, we SSH into the server and run nixos-rebuild, but we are exploring other deployment options.

Our top contenders are NixOps, deploy-rs, or just using --target-host with nixos-rebuild.

All three options seem pretty straightforward, so we may just try all three and see what we like. At present, my goal is to build locally, then deploy. All target and build architectures are currently NixOS and x86_64, although it is not out of the question that we might throw some ARM targets in the mix at some point.

Have any of you tried 2 or more of these options, and have any reflections to add? What did you love about a particular tool?

NixOps is abandonware. It was kind of magical for AWS deployments, where it could do everything from scratch.
If you have physical access to the machines, nixos-rebuild is the simplest and most straightforward one.
I once locked myself out of a machine with nixos-rebuild because I had disabled the SSH server. Since then I use deploy-rs, which automatically falls back to the previous generation in case of issues. It adds complexity compared to nixos-rebuild, though, so it might not be worth it for you.

Thank you, this is very helpful!

Looking at the NixOps repo activity, it wouldn’t occur to me that this project is abandonware, other than the oddly recent transition to Python 3. What reasons cause you to brand it abandonware?

Using nixos-rebuild sounds like a good starting place, then. Much appreciated!

we don’t want to hurt anyone’s feelings by saying that… but very unfortunately it is true

real shame too because to me nixops, conceptually speaking, is the perfect tool for the type of networks i manage


since you said homelab you might consider colmena… it has one nice feature where you can reference configuration from machine a on machine b, which can be really nice if you have a reverse proxy or whatever

+1 to colmena. For me the “reference other machines’ configs when generating a machine’s config” feature is a game changer. This is something that nixops did originally, and was carried over (almost identically) in later deployment tools like morph and colmena. There are a few minor things that I wish colmena did better (e.g. remote evaluation, nix-output-monitor support, etc.), but it’s pretty close to a local optimum for me.

A few things I do with this “cross-referencing other machines’ config” feature:

  • Machines that I configure as being remote builders automatically get added to the /etc/nix/machines of all the other machines that need to perform nix builds.
  • All the enabled Prometheus exporters on all my machines automatically get scraped by my Prometheus server without having to write any of the plumbing manually.
  • I auto-generate a wireguard p2p mesh by having each machine look at the wireguard config of other machines in my network’s Nix config.
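
For readers who have not seen this pattern, here is a minimal sketch of the Prometheus case. It assumes Colmena's `nodes` module argument (the evaluated configs of every node in the hive); the node names, host names, and the choice of the node exporter are placeholders for illustration, not taken from the setup described above.

```nix
# hive.nix fragment (hypothetical two-node hive; a real hive also needs
# boot and filesystem settings for each node)
{
  meta.nixpkgs = import <nixpkgs> { };

  web = { ... }: {
    deployment.targetHost = "web.example.internal"; # placeholder host
    services.prometheus.exporters.node.enable = true;
  };

  monitor = { lib, nodes, ... }:
    let
      # Keep only the nodes that have the node exporter switched on.
      exporterNodes = lib.filterAttrs
        (_: node: node.config.services.prometheus.exporters.node.enable)
        nodes;
    in {
      deployment.targetHost = "monitor.example.internal"; # placeholder host
      services.prometheus = {
        enable = true;
        scrapeConfigs = [{
          job_name = "node";
          static_configs = [{
            # One "host:port" target per exporter-enabled node.
            targets = lib.mapAttrsToList
              (_: node:
                "${node.config.networking.hostName}:${toString node.config.services.prometheus.exporters.node.port}")
              exporterNodes;
          }];
        }];
      };
    };
}
```

Enabling the exporter on another node should then be enough for it to show up in the scrape list on the next deploy.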

Excellent. Thanks so much to all of you! I am looking at colmena now; thanks for that tip, @aanderse.

I love your thoughts on referencing build hosts on each machine, @delroth.

I remember deploy-rs not playing well with having to type a password for remote sudo. I know 1-2 other tools I’ve tried also had similar problems. Not sure if they’ve been fixed since then.

if you want to ‘learn’ how you can do this at the cmd line with bash, then try nixinate…

it’s simple and you’ll be able to find out how easy it is to deploy with nix.

But if the machine configurations are in flakes, referencing different configs is not a problem. In that case just doing nixos-rebuild --target-host will do the trick, no?

I’m just personally trying to keep all the “third-party” tools to the minimum
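
A rough sketch of what that could look like with nothing but a flake, assuming two hosts named web and proxy and module files under ./hosts (all hypothetical). One configuration reads a value from the other through `self.nixosConfigurations`:

```nix
# flake.nix (host names, file paths and the virtual host are placeholders)
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }: {
    nixosConfigurations = {
      web = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [ ./hosts/web.nix ];
      };

      proxy = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [
          ./hosts/proxy.nix
          # Read the host name from the evaluated "web" configuration
          # instead of repeating it by hand.
          ({ ... }: {
            services.nginx.virtualHosts."web.example.internal".locations."/".proxyPass =
              "http://${self.nixosConfigurations.web.config.networking.hostName}:80";
          })
        ];
      };
    };
  };
}
# Build locally, activate remotely:
#   nixos-rebuild switch --flake .#proxy --target-host root@proxy
```

The trade-off compared with colmena is that the cross-references are spelled out per host rather than handed to every module as an argument.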

this is basically what nixinate does in a nutshell.

You can probably just run it up and reverse engineer the bash script it makes!!!

It’s designed to show you how to do it yourself.

It’s very popular in 2038.

This was surprising for me to read, but seems to be true. I’ve been happily using it for years to maintain my remote (some virtual) machines. I’d been meaning to look into colmena but haven’t had an impetus to, given that I haven’t had any issues with NixOps. If plans really are to sunset it, then that might be the impetus I need.

There is also the cachix agent, which has a nice web interface.

But now I’m using system.autoUpgrade with flake.

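  system.autoUpgrade.enable = true;
  # systemd calendar expression: every hour, at 20 minutes past
  system.autoUpgrade.dates  = "*-*-* *:20:00";
  # Build the flake output named after this machine's hostName
  system.autoUpgrade.flake  = "github:hugosenari/nixos-config#${config.networking.hostName}";
  # Skip the flake cache so the latest commit is fetched
  system.autoUpgrade.flags  = ["--refresh"];
  # Add a random delay of up to five minutes before each run
  system.autoUpgrade.randomizedDelaySec = "5m";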

Polls my config (flake) from GitHub on a timer. With the dates value above, that is every hour at 20 minutes past.

Good: simple.
Good: cheap. Each run takes about 546ms of CPU time, receives 9.6K and sends 3.4K of IP traffic.

Bad: have to wait for the timer to trigger.
Bad: isn’t a tool to set up new instances (like NixOps).

Cachix agent looks very nice; thanks for pointing that out.

Our current use case is a single deployment target that has decent compute resources. So we are finding that our original practice of connecting via SSH may not be so bad. Running nixos-rebuild --target-host works, but doesn’t necessarily gain us anything over connecting first (and using tmux, which has advantages) and then running nixos-rebuild.

Given our scenario, your system.autoUpgrade idea, @hugosenari, is brilliant. Honestly, I didn’t know that option existed!

Gonna run that by the Ops team now… (the 16-year-old in the basement). We’ll see…

FWIW I’ve been using deploy-rs as of late for most of my Pis and it’s been rather painless. I would like to give colmena a try. It would be nice to just use nixos-rebuild --target-host, but I haven’t, and deploy-rs offers those nice checks to avoid mistakes.
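
For comparison, a minimal deploy-rs flake for a single Pi might look roughly like this. It follows the layout shown in the deploy-rs README; the node name, host name, and module path are made up for illustration.

```nix
# flake.nix (node name "pi", "pi.local" and ./hosts/pi.nix are placeholders)
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    deploy-rs.url = "github:serokell/deploy-rs";
  };

  outputs = { self, nixpkgs, deploy-rs }: {
    nixosConfigurations.pi = nixpkgs.lib.nixosSystem {
      system = "aarch64-linux";
      modules = [ ./hosts/pi.nix ];
    };

    deploy.nodes.pi = {
      hostname = "pi.local";
      profiles.system = {
        user = "root";
        # Wrap the NixOS system in deploy-rs's activation script.
        path = deploy-rs.lib.aarch64-linux.activate.nixos
          self.nixosConfigurations.pi;
      };
    };

    # Lets `nix flake check` validate the deploy attribute set.
    checks = builtins.mapAttrs
      (system: deployLib: deployLib.deployChecks self.deploy)
      deploy-rs.lib;
  };
}
```

Building this on an x86_64 machine needs an aarch64 builder or emulation, which is worth keeping in mind if ARM targets get added to the mix.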

Curious: does this do a nix flake update as well? And does it also do a fresh git pull?

Basically, it keeps asking your system to rebuild itself from the configuration in a repository.
So I’m not expecting it to update your inputs. And you have to add --refresh, or nix will use the flake cache (not a fresh pull).

Answering only for how I would use this:

This would be run on hosts I log into less often, or that are used rarely and should be upgraded on boot/resume after a period of time. Previously, I had one or two of those with the auto upgrade service enabled, on channels. I disabled that when everything moved to a system flake.

It’s why I haven’t used any of the various deployment tools (though bento seems interesting), because they’re mostly push style, and this use-case needs something more pull based.

I, too, had missed that there was a flake option for the autoUpgrade service. For me, the idea here is that I update regularly on my active desktop, and push those revisions, including the locked flake inputs, to the repo. So those auto-upgrades update inputs via the git pull of the lock file, and only update to revisions I have already used and built, rather than updating their lock file locally. This also means the content will already be in the store of another local system.

I might (if I can be bothered) even keep a separate branch for the known-good updates. That’s still less branch maintenance than I used to do before consolidating a common base and per-host branches into a single flake branch.
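
A sketch of that pull-based setup, assuming a hypothetical repository `example-user/nixos-config` with a `known-good` branch. No input-updating flags are passed, so each host only ever builds what the committed flake.lock pins:

```nix
# Module fragment; the repository and branch names are placeholders.
{ config, ... }:
{
  system.autoUpgrade = {
    enable = true;
    # github:<owner>/<repo>/<branch>#<output>; the lock file on that
    # branch decides which input revisions get built.
    flake = "github:example-user/nixos-config/known-good#${config.networking.hostName}";
    # Fetch the branch tip instead of reusing a cached copy.
    flags = [ "--refresh" ];
    dates = "daily";
    # Catch up on a missed run after the machine was off or suspended.
    persistent = true;
  };
}
```

Promoting an update is then just a matter of moving the `known-good` branch forward after the desktop has built and run the new revision.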

Excellent detail. Thank you for helping me think through this!

I set this up on one of those hosts shortly after posting here, and initial tests were good. But I had one suspicion, which turned out to be valid on testing again this morning:

The default config of the autoUpgrade service has a timer with the persistent flag set, so it runs when the laptop is resumed after being suspended for a while (overnight, or for several months…)

But the problem I found this morning:

Dec 14 09:05:13 rocinante systemd[1]: Starting NixOS Upgrade...
Dec 14 09:05:13 rocinante nixos-upgrade-start[32301]: warning: you don't have Internet access; disabling some network-dependent features
Dec 14 09:05:13 rocinante nixos-upgrade-start[32298]: building the system configuration...
Dec 14 09:05:13 rocinante nixos-upgrade-start[32355]: warning: you don't have Internet access; disabling some network-dependent features
…

It runs too soon after resume, before the wifi has had a chance to reconnect.

So it probably needs an additional dependency on network-online.target, and/or a delay. Edit: that dependency is already there, but doesn’t seem to apply after resume; I’ll look further into this.

I use this snippet for fwupd-refresh, but it could be easily used for nixos-upgrade. It just allows the service to restart several times over several minutes without failing…which typically should allow the network time to be up on a laptop.

  # Firmware updates - fwupd
  services.fwupd.enable = true;
  # Allow fwupd-refresh to restart if failed (after resume)
  systemd.services.fwupd-refresh = {
    serviceConfig = {
      Restart = "on-failure";
      RestartSec = "20";
    };
    unitConfig = {
      StartLimitIntervalSec = 100;
      StartLimitBurst = 5;
    };
  };
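
Adapted to the upgrade service, the same idea might look like the sketch below. It assumes the unit is called nixos-upgrade (the name suggested above) and simply reuses the retry values from the fwupd snippet, so treat the numbers as starting points rather than tested settings.

```nix
# Module fragment: retry nixos-upgrade a few times if it fails,
# for example because wifi is not back yet after resume.
{
  systemd.services.nixos-upgrade = {
    serviceConfig = {
      Restart = "on-failure";
      RestartSec = "20"; # seconds to wait between attempts
    };
    unitConfig = {
      # Give up once 5 starts have happened within 100 seconds.
      StartLimitIntervalSec = 100;
      StartLimitBurst = 5;
    };
  };
}
```

These settings merge with the unit definition that system.autoUpgrade already generates, so nothing else about the service has to be restated.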