Deployment tools: evaluating NixOps, deploy-rs, and vanilla nix-rebuild

But if the machine configurations are in flakes, referencing different configs is not a problem. In that case, just running nixos-rebuild --target-host will do the trick, no?

Personally, I’m just trying to keep “third-party” tools to a minimum.
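The following is a minimal sketch of the flake side. It assumes two hypothetical hosts named `web` and `db`, with their modules under `./hosts/`. The host names, paths, and SSH address are placeholders and do not come from this thread. Each machine is just another attribute under `nixosConfigurations`: `--flake .#<name>` selects the config, and `--target-host` selects where it is activated.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }: {
    # One entry per machine; each is selected by name with `--flake .#<name>`.
    # "web" and "db" are hypothetical hosts used only for illustration.
    nixosConfigurations = {
      web = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [ ./hosts/web/configuration.nix ];
      };
      db = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [ ./hosts/db/configuration.nix ];
      };
    };
  };
}

# Deploy one host from the repository root (the SSH address is a placeholder):
#   nixos-rebuild switch --flake .#web --target-host root@web.example.com
```

If you log in as a non-root user, nixos-rebuild also has a --use-remote-sudo flag, so that activation on the target runs through sudo.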


This is basically what nixinate does, in a nutshell.

You can probably just run it and reverse-engineer the bash script it generates!

It’s designed to show you how to do it yourself.

It’s very popular in 2038.


This was surprising for me to read, but seems to be true. I’ve been happily using it for years to maintain my remote (some virtual) machines. I’d been meaning to look into colmena but haven’t had an impetus to, given that I haven’t had any issues with NixOps. If plans really are to sunset it, then that might be the impetus I need.


There is also Cachix Agent, which has a nice web interface.

But now I’m using system.autoUpgrade with a flake:

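  { config, ... }:
  {
    system.autoUpgrade.enable = true;
    # Calendar expression: fires at minute 20 of every hour.
    system.autoUpgrade.dates  = "*-*-* *:20:00";
    system.autoUpgrade.flake  = "github:hugosenari/nixos-config#${config.networking.hostName}";
    # Bypass Nix's cache of the fetched flake so each run sees the latest commit.
    system.autoUpgrade.flags  = ["--refresh"];
    system.autoUpgrade.randomizedDelaySec = "5m";
  }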

Pulls my config (flake) from GitHub every hour, at 20 minutes past the hour.

Good: simple.
Good: lightweight: 546 ms CPU time, 9.6K of IP traffic received, and 3.4K sent per run.

Bad: you have to wait for the timer to trigger.
Bad: it isn’t a tool for setting up new instances (like NixOps).


Cachix agent looks very nice; thanks for pointing that out.

Our current use case is a single deployment target that has decent compute resources, so we are finding that our original practice of connecting via SSH may not be so bad. Running nixos-rebuild --target-host works, but doesn’t necessarily gain us anything over connecting first (and using tmux, which has its advantages) and then running nixos-rebuild.

Given our scenario, your system.autoUpgrade idea, @hugosenari, is brilliant. Honestly, I didn’t know that option existed!

Gonna run that by the Ops team now… (the 16-year-old in the basement). We’ll see…


FWIW, I’ve been using deploy-rs lately for most of my Pis, and it’s been rather painless. I would like to give colmena a try. It would be nice to just use nixos-rebuild --target-host, but I haven’t tried it, and deploy-rs offers those nice checks to avoid mistakes.
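For comparison, here is a sketch of the deploy-rs side, modeled on the example in the deploy-rs README. It assumes a single hypothetical Raspberry Pi host named `pi1`, reachable as `pi1.lan`. The host name, address, and module path are placeholders. The `checks` attribute wires deploy-rs’s schema and activation checks into `nix flake check`.

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    deploy-rs.url = "github:serokell/deploy-rs";
  };

  outputs = { self, nixpkgs, deploy-rs }: {
    # Hypothetical Raspberry Pi host; replace with your own configuration.
    nixosConfigurations.pi1 = nixpkgs.lib.nixosSystem {
      system = "aarch64-linux";
      modules = [ ./hosts/pi1/configuration.nix ];
    };

    deploy.nodes.pi1 = {
      hostname = "pi1.lan"; # placeholder address
      profiles.system = {
        user = "root"; # the system profile is activated as root
        # Activation wrapper must match the target's architecture.
        path = deploy-rs.lib.aarch64-linux.activate.nixos
          self.nixosConfigurations.pi1;
      };
    };

    # Run deploy-rs's checks as part of `nix flake check`.
    checks = builtins.mapAttrs
      (system: deployLib: deployLib.deployChecks self.deploy)
      deploy-rs.lib;
  };
}
```

With this in place, running `deploy .#pi1` (the deploy-rs CLI) from the repository root should build and activate the node. Building an aarch64 system from an x86_64 workstation also needs an aarch64 builder or emulation.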


Curious: does this do a nix flake update as well? And does it also do a fresh git pull?


Basically, what it does is repeatedly ask your system to rebuild itself, pointing to a repository.
So I wouldn’t expect it to update your inputs. And you have to add --refresh, or Nix will use its flake cache (not a fresh pull).


Answering only for how I would use this:

This would be run on hosts I log into less often, or that are used rarely and should be upgraded on boot/resume after a period of time. Previously, I had one or two of those with the auto upgrade service enabled, on channels. I disabled that when everything moved to a system flake.

That’s why I haven’t used any of the various deployment tools (though bento seems interesting): they’re mostly push-style, and this use case needs something pull-based.

I, too, had missed that there was a flake option for the auto-upgrade service. For me, the idea here is that I update regularly on my active desktop and push those revisions, including the locked flake inputs, to the repo. So those auto-upgrades update inputs via the git pull of the lock file, and only update to revisions I have already used and built, rather than updating their lock file locally. This also means the content will already be in the store of another local system.

I might (if I can be bothered) even keep a separate branch for the known-good updates. That’s still less branch maintenance than I used to do before consolidating a common base and per-host branches into a single flake branch.
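Here is a sketch of that pull-based setup. It assumes a hypothetical self-hosted repository at `git.example.org` with a branch named `known-good` that only ever receives lock files already built on the desktop. The URL and branch name are placeholders. The `ref` parameter of a `git+https` flake URL selects the branch. Because the committed flake.lock comes along with the branch, the host should build exactly the inputs pinned there.

```nix
{ config, ... }:
{
  system.autoUpgrade = {
    enable = true;
    # Placeholder repository and branch; `ref` selects the branch to follow.
    flake = "git+https://git.example.org/me/nixos-config?ref=known-good#${config.networking.hostName}";
    # Do not reuse a cached copy of the branch head.
    flags = [ "--refresh" ];
    dates = "daily";
    # Run a missed upgrade after boot or resume (true is already the default).
    persistent = true;
  };
}
```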


Excellent detail. Thank you for helping me think through this!

I set this up on one of those hosts shortly after posting here, and initial tests were good. But I had one suspicion, which turned out to be valid on testing again this morning:

The default config of the autoUpgrade service has a timer with the persistent flag set, so it runs when the laptop is resumed after being suspended for a while (overnight, or for several months…).

But the problem I found this morning:

Dec 14 09:05:13 rocinante systemd[1]: Starting NixOS Upgrade...
Dec 14 09:05:13 rocinante nixos-upgrade-start[32301]: warning: you don't have Internet access; disabling some network-dependent features
Dec 14 09:05:13 rocinante nixos-upgrade-start[32298]: building the system configuration...
Dec 14 09:05:13 rocinante nixos-upgrade-start[32355]: warning: you don't have Internet access; disabling some network-dependent features
…

It runs too soon after resume, before the wifi has had a chance to reconnect.

So it probably needs an additional dependency on network-online.target, and/or a delay. Edit: that dependency is already there, but it doesn’t seem to apply after resume; I’ll look further into this.


I use this snippet for fwupd-refresh, but it could easily be used for nixos-upgrade. It just allows the service to restart several times over several minutes without failing, which should typically give the network time to come up on a laptop.

  # Firmware updates - fwupd
  services.fwupd.enable = true;
  # Allow fwupd-refresh to restart if failed (after resume)
  systemd.services.fwupd-refresh = {
    serviceConfig = {
      Restart = "on-failure";
      RestartSec = "20";
    };
    unitConfig = {
      StartLimitIntervalSec = 100;
      StartLimitBurst = 5;
    };
  };

Yeah, in this case the service doesn’t actually fail, and furthermore it seems to actually do a switch even when there are no changes… so a bit more work might be needed generally.

Edit: it’s worse than that… even with --refresh, nixos-rebuild runs, fails to fetch a new revision from git, and then builds and switches to the old revision, which can result in a rollback if the current system was built from a newer revision in /etc/nixos than the one it has cached from the last time the auto-upgrade ran.

https://github.com/NixOS/nixpkgs/issues/274146

I masked the problem for now with a preStart command that checks ssh connectivity to the git server; when that fails it then follows these Restart settings.
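Applied to nixos-upgrade, that approach might look roughly like the following untested sketch. It replaces the ssh check with `git ls-remote` over HTTPS, so root needs no SSH key. This substitution is an assumption, and the repository URL is a placeholder. A failing preStart fails the unit, and the Restart= settings then retry it. This only covers waiting for the network. It does not catch errors that nixos-rebuild ignores during the rebuild itself.

```nix
{ pkgs, ... }:
{
  systemd.services.nixos-upgrade = {
    # Probe the git server before rebuilding. `git ls-remote` exits non-zero
    # when the remote is unreachable, which fails the unit instead of letting
    # nixos-rebuild fall back to a stale cached revision.
    # Placeholder URL: point it at your own flake repository.
    preStart = ''
      ${pkgs.git}/bin/git ls-remote https://github.com/example/nixos-config HEAD > /dev/null
    '';
    serviceConfig = {
      # Retry roughly once a minute after a failure.
      Restart = "on-failure";
      RestartSec = "60";
    };
    unitConfig = {
      # At most 5 starts within 10 minutes: the first attempt plus up to 4 retries.
      StartLimitIntervalSec = 600;
      StartLimitBurst = 5;
    };
  };
}
```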

That’s the right solution for waiting for the wifi to connect, but not for the ignored errors during the rebuild run itself.


In case it helps, I made a video about NixOS deployment tools.


Currently I use deploy-rs for push-based operations. It plays nicely with vanilla Nix flake configs and supports Home Manager without a backing NixOS home-manager module. The login/sudo prompt is a problem. I currently just have a dedicated SSH key that I use for deploying new systems, but I personally don’t like adding more entry points to my systems. I’m looking into using system.autoUpgrade for my desktop computers as the pull-based option.
I think what’s missing right now is a choice of pull-based tools. Bento looks great, but I have my reasons for not wanting to use it. nixos-rebuild is also pull-based, but it has its own issues; in particular, it still performs evaluation on the client instead of the build host, which would kill my RPi. One day I might write Yet Another NixOS Deployer, but for now I just want to make sure I can push to machines and don’t miss security patches.


I’m also having issues with the sudo prompt… I wish there were a tool that got this right (I’ve heard nixinate handles it nicely, but the last time I tried it, I ran into different issues).
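One common workaround for the sudo prompt is sketched below. It assumes a hypothetical dedicated `deploy` user and uses a placeholder public key. The idea is to let that user run sudo without a password and to trust it for store copies. This trades the prompt for a broad grant, which is the kind of extra entry point mentioned above. Treat it as a judgment call rather than a recommendation.

```nix
{
  # Hypothetical dedicated user that deployment tools log in as over SSH.
  users.users.deploy = {
    isNormalUser = true;
    # Placeholder: replace with the public half of your deployment key.
    openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA... deploy-key" ];
  };

  # Let that user run sudo without a password prompt. This is a broad grant:
  # anyone holding the key effectively has root on this machine.
  security.sudo.extraRules = [
    {
      users = [ "deploy" ];
      commands = [ { command = "ALL"; options = [ "NOPASSWD" ]; } ];
    }
  ];

  # Allow the user to copy store paths that are not signed by a trusted key.
  nix.settings.trusted-users = [ "deploy" ];
}
```

With deploy-rs, a node would then set `sshUser = "deploy";` alongside `user = "root";`. SSH logs in as deploy, and activation switches to root through sudo.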


Thanks for the info. I made a little tweak for my use case, and a systemd unit (and timer) were generated:

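{ config, ... }:
{
    system.autoUpgrade.enable = true;
    # 01:00 on Fridays that fall on days 1-7 or 15-21,
    # i.e. the first and third Friday of each month.
    system.autoUpgrade.dates  = "Fri *-*-1..7,15..21 01:00:00";
    # userDefinedGlobalVariables is a custom option defined elsewhere in this config.
    system.autoUpgrade.flake  = "github:${config.userDefinedGlobalVariables.githubFlakeRepositoryName}#${config.userDefinedGlobalVariables.hostTag}";
    system.autoUpgrade.randomizedDelaySec = "5m";
}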


From the documentation for system.autoUpgrade.enable: “Whether to periodically upgrade NixOS to the latest version. If enabled, a systemd timer will run nixos-rebuild switch --upgrade once a day.”

Looking at the file that systemd will execute, I can see a single line:

/nix/store/as1snmyxhr9633n30pbcy2fcbbggii4p-nixos-rebuild/bin/nixos-rebuild switch --flake github:p3t33/nixos_flake#homelab --upgrade

The --upgrade flag has nothing to do with flakes (it is used with channels); to the best of my knowledge, a flake is updated with

nix flake update

which means that the command has some junk in it.

I’m also not sure why you would need --refresh when using a flake, which is responsible for keeping the state of your packages in check. If I remember correctly, --refresh is a channel option.

Nix keeps a cache of fetched flake repositories; --refresh is used to bypass it.
But I could be wrong 😅

Maybe, since yours only runs on Fridays, the flake’s git cache will already have been invalidated as “too old” by then.
In my case, I set it to run every hour (at minute 20), where a stale cache may be a problem.

https://github.com/NixOS/nix/issues/4007#issuecomment-692025931
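As far as I understand, --refresh is a general Nix option rather than a channel one. Nix describes it as considering all previously downloaded files out of date. It is closely related to the `tarball-ttl` setting (3600 seconds by default), which controls how long a fetched branch head counts as fresh. The sketch below shows two ways to pass this to the upgrade service, reusing the flake URL from the earlier post. Option B is an assumed equivalent, not something tested in this thread.

```nix
{ config, ... }:
{
  system.autoUpgrade = {
    enable = true;
    flake = "github:hugosenari/nixos-config#${config.networking.hostName}";
    # Option A: treat all previously downloaded files as out of date for this run.
    flags = [ "--refresh" ];
    # Option B (use instead of A): set the underlying option explicitly;
    # tarball-ttl = 0 makes Nix recheck the branch head on every run.
    # flags = [ "--option" "tarball-ttl" "0" ];
  };
}
```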

This sounds great. May I ask if there is a good place to start with an example you have shared or would recommend?