Deployment tools: evaluating NixOps, deploy-rs, and vanilla nix-rebuild

There is also the Cachix agent, which has a nice web interface.

But now I’m using system.autoUpgrade with a flake.

  system.autoUpgrade.enable = true;
  system.autoUpgrade.dates  = "*-*-* *:20:00";
  system.autoUpgrade.flake  = "github:hugosenari/nixos-config#${config.networking.hostName}";
  system.autoUpgrade.flags  = ["--refresh"];
  system.autoUpgrade.randomizedDelaySec = "5m";

Pulls my config (flake) from GitHub every hour (at minute 20, per the dates setting).

Good: simple
Good: cheap. Each run used 546ms CPU time, received 9.6K IP traffic, and sent 3.4K IP traffic.

Bad: you have to wait for the timer to trigger.
Bad: it isn’t a tool to set up new instances (like NixOps).

Cachix agent looks very nice; thanks for pointing that out.

Our current use case is a single deployment target that has decent compute resources. So we are finding that our original practice of connecting via SSH may not be so bad. Running nixos-rebuild --target-host works, but doesn’t necessarily gain us anything over connecting first (and using tmux, which has advantages) and then running nixos-rebuild.

Given our scenario, your system.autoUpgrade idea, @hugosenari, is brilliant. Honestly, I didn’t know that option existed!

Gonna run that by the Ops team now… (the 16-year-old in the basement). We’ll see…

FWIW I’ve been using deploy-rs lately for most of my Pis and it’s been rather painless. I would like to give colmena a try. It would be nice to just use nixos-rebuild --target-host, but I haven’t, and deploy-rs offers those nice checks to avoid mistakes.
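
For readers who have not seen deploy-rs, here is a minimal sketch of a flake that wires up one node and the checks mentioned above. It assumes a Raspberry Pi style aarch64 host; the node name `pi`, the hostname, and `./configuration.nix` are placeholders, not taken from anyone's setup in this thread.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
  inputs.deploy-rs.url = "github:serokell/deploy-rs";

  outputs = { self, nixpkgs, deploy-rs }: {
    # The system to deploy ("pi" is a hypothetical name).
    nixosConfigurations.pi = nixpkgs.lib.nixosSystem {
      system = "aarch64-linux";
      modules = [ ./configuration.nix ];
    };

    # deploy-rs node definition; hostname is a placeholder.
    deploy.nodes.pi = {
      hostname = "pi.example.org";
      profiles.system = {
        user = "root";    # user the profile is activated as
        sshUser = "root"; # user deploy-rs connects as
        path = deploy-rs.lib.aarch64-linux.activate.nixos
          self.nixosConfigurations.pi;
      };
    };

    # Lets `nix flake check` validate the deploy definition before a push.
    checks = builtins.mapAttrs
      (system: deployLib: deployLib.deployChecks self.deploy)
      deploy-rs.lib;
  };
}
```

With something like this in place, the node would typically be deployed with `deploy .#pi` from the flake directory.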

Curious: does this do a nix flake update as well? And does it also do a fresh git pull?

Basically, it keeps asking your system to rebuild itself from whatever the repository currently contains.
So I don’t expect it to update your inputs. And you have to add --refresh, or nix will use its flake cache (not a fresh pull).

Answering only for how I would use this:

This would be run on hosts I log into less often, or that are used rarely and should be upgraded on boot/resume after a period of time. Previously, I had one or two of those with the auto upgrade service enabled, on channels. I disabled that when everything moved to a system flake.

It’s why I haven’t used any of the various deployment tools (though bento seems interesting), because they’re mostly push style, and this use-case needs something more pull based.

I, too, had missed that there was a flake option for the autoUpgrade service. For me, the idea here is that I update regularly on my active desktop, and push those revisions, including the locked flake inputs, to the repo. So those auto-upgrades update inputs via the git pull of the lock file, and only update to revisions I have already used and built, rather than updating their lock file locally. This also means the content will already be in the store of another local system.

I might (if I can be bothered) even keep a separate branch for the known-good updates. That’s still less branch maintenance than I used to do before consolidating a common base and per-host branches into a single flake branch.
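
A minimal sketch of the known-good branch idea, assuming a GitHub-hosted flake: the `github:` flake reference accepts a branch after the repository name. The owner, repository, and the branch name `known-good` are hypothetical.

```nix
{ config, ... }:
{
  system.autoUpgrade = {
    enable = true;
    # Track a dedicated branch instead of the default one.
    # "example-user/nixos-config" and "known-good" are placeholders.
    flake = "github:example-user/nixos-config/known-good#${config.networking.hostName}";
    # Bypass the flake fetch cache so each run sees the latest push.
    flags = [ "--refresh" ];
  };
}
```

Because the lock file travels with the branch, hosts configured this way only move to input revisions that were already committed there.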

Excellent detail. Thank you for helping me think through this!

I set this up on one of those hosts shortly after posting here, and initial tests were good. But I had one suspicion, which turned out to be valid on testing again this morning:

The default config of the autoUpgrade service has a timer with the persistent flag set, so it runs when the laptop is resumed after being suspended for a while (overnight, or for several months…).

But the problem I found this morning:

Dec 14 09:05:13 rocinante systemd[1]: Starting NixOS Upgrade...
Dec 14 09:05:13 rocinante nixos-upgrade-start[32301]: warning: you don't have Internet access; disabling some network-dependent features
Dec 14 09:05:13 rocinante nixos-upgrade-start[32298]: building the system configuration...
Dec 14 09:05:13 rocinante nixos-upgrade-start[32355]: warning: you don't have Internet access; disabling some network-dependent features
…

It runs too soon after resume, before the wifi has had a chance to reconnect.

So it probably needs an additional dependency on network-online.target, and/or a delay. Edit: that dependency is already there, but doesn’t seem to apply after resume. I’ll look further into this.

I use this snippet for fwupd-refresh, but it could be easily used for nixos-upgrade. It just allows the service to restart several times over several minutes without failing…which typically should allow the network time to be up on a laptop.

  # Firmware updates - fwupd
  services.fwupd.enable = true;
  # Allow fwupd-refresh to restart if failed (after resume)
  systemd.services.fwupd-refresh = {
    serviceConfig = {
      Restart = "on-failure";
      RestartSec = "20";
    };
    unitConfig = {
      StartLimitIntervalSec = 100;
      StartLimitBurst = 5;
    };
  };

Yeah, in this case the service doesn’t actually fail, and furthermore it seems to actually do a switch even when there are no changes… so a bit more work might be needed generally.

Edit: it’s worse than that… even with --refresh, nixos-rebuild runs, fails to fetch a new revision from git, and then builds and switches to the old revision. Which can result in a rollback if the current system was built from a newer revision in /etc/nixos than what it has cached from the last time the autoupgrade ran.

https://github.com/NixOS/nixpkgs/issues/274146

I masked the problem for now with a preStart command that checks ssh connectivity to the git server; when that fails it then follows these Restart settings.

That’s the right solution for waiting for the wifi to connect, but not for the ignored errors during the rebuild run itself.
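
Putting the two posts above together, a hedged sketch of what this could look like for nixos-upgrade: the restart settings are copied from the fwupd-refresh snippet, and the preStart probe is an assumption. The poster checked ssh connectivity; this sketch uses `git ls-remote` against a placeholder URL instead.

```nix
{ pkgs, ... }:
{
  systemd.services.nixos-upgrade = {
    # Fail early while the git server is unreachable, so that the
    # Restart settings below retry instead of building a stale revision.
    # The URL is a placeholder; substitute the repository the flake lives in.
    preStart = ''
      ${pkgs.git}/bin/git ls-remote --exit-code \
        https://example.org/you/nixos-config.git HEAD >/dev/null
    '';
    serviceConfig = {
      Restart = "on-failure";
      RestartSec = "20";
    };
    unitConfig = {
      StartLimitIntervalSec = 100;
      StartLimitBurst = 5;
    };
  };
}
```

As noted above, this only covers waiting for the network; it does nothing about fetch errors that nixos-rebuild itself ignores during the run.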

If this can help, I made a video about NixOS deployment tools

Currently I use deploy-rs for push-based operations. It plays nicely with vanilla nix flake configs and supports home manager without a backing NixOS home-manager module. The login/sudo prompt is a problem. I currently just have a dedicated ssh key that I use for deploying new systems, but I personally don’t like adding more entry points to my systems. I’m looking into using system.autoUpgrade for my desktop computers as the pull-based option.
I think what’s missing right now is choice for pull-based systems. Bento looks great, but I have my reasons for not wanting to use it. nixos-rebuild is also pull based, but has its own issues; in particular, it still performs evaluation on the client instead of on the build host, which would kill my rpi. One day I might write Yet Another NixOS Deployer, but for now I just want to make sure I can push to machines and don’t miss security patches.

I’m also having issues with the sudo prompt… Wish there was a tool which got this right (I’ve heard nixinate handles it nicely, but last I tried I was running into different issues)

Thanks for the info. I made a little tweak for my use, and a systemd unit (and timer) were generated.

{ config, ... }:
{
    system.autoUpgrade.enable = true;
    system.autoUpgrade.dates  = "Fri *-*-1..7,15..21 01:00:00";
    system.autoUpgrade.flake  = "github:${config.userDefinedGlobalVariables.githubFlakeRepositoryName}#${config.userDefinedGlobalVariables.hostTag}";
    system.autoUpgrade.randomizedDelaySec = "5m";

}
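
The `userDefinedGlobalVariables` options used above are custom, not part of NixOS. As a guess at how such options might be declared (the poster's actual module is not shown), a sketch could look like this; the values are taken from the generated command quoted below.

```nix
{ lib, ... }:
{
  # Hypothetical declaration of the custom options referenced above.
  options.userDefinedGlobalVariables = {
    githubFlakeRepositoryName = lib.mkOption {
      type = lib.types.str;
      description = "GitHub owner/repo that holds the system flake.";
    };
    hostTag = lib.mkOption {
      type = lib.types.str;
      description = "Name of the nixosConfigurations output for this host.";
    };
  };

  # Example values, matching the flake reference in the generated unit.
  config.userDefinedGlobalVariables = {
    githubFlakeRepositoryName = "p3t33/nixos_flake";
    hostTag = "homelab";
  };
}
```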


From the documentation for system.autoUpgrade.enable: “Whether to periodically upgrade NixOS to the latest version. If enabled, a systemd timer will run nixos-rebuild switch --upgrade once a day.”

Looking at the file that systemd will execute, I can see a single line:

nix/store/as1snmyxhr9633n30pbcy2fcbbggii4p-nixos-rebuild/bin/nixos-rebuild switch --flake github:p3t33/nixos_flake#homelab --upgrade

The --upgrade flag has nothing to do with flakes (it is used with channels). A flake, to the best of my knowledge, is updated with

nix flake update

which means that the command has some junk in it.

I’m also not sure why you would need --refresh when using a flake, which is responsible for keeping the state of your packages in check. If I remember correctly, --refresh is a channel option.

Flakes keep a cache of fetched repositories; --refresh is used to bypass it.
But I could be wrong :sweat_smile:

Since you only run on Fridays, the flake git cache may already be invalidated as “too old” by then.
In my case, I set it to run every hour (at minute 20), so a stale cache may be a problem.

https://github.com/NixOS/nix/issues/4007#issuecomment-692025931

This sounds great. May I ask if there is a good place to start with an example you have shared or would recommend?

I am also using “nixos-rebuild --target-host”

The issue is that, in order to apply the system config, you either have to log in as root or use remote sudo. However, there is no support for supplying a password when using remote sudo, so you need passwordless sudo on the target. Not ideal. I’m looking for a solution that supports better authentication.
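
To make the trade-off concrete, this is roughly what passwordless sudo on the target amounts to. It is a sketch only: the `deploy` user name and the key are placeholders, and the trusted-users line is an assumption about what is usually needed so that user can copy unsigned store paths.

```nix
{
  # Hypothetical dedicated deployment user on the target.
  users.users.deploy = {
    isNormalUser = true;
    openssh.authorizedKeys.keys = [
      "ssh-ed25519 AAAA...placeholder deploy@workstation"
    ];
  };

  # Passwordless sudo for that user only, rather than for all of wheel.
  security.sudo.extraRules = [
    {
      users = [ "deploy" ];
      commands = [
        { command = "ALL"; options = [ "NOPASSWD" ]; }
      ];
    }
  ];

  # Assumption: lets the deploy user import the closure pushed by nixos-rebuild.
  nix.settings.trusted-users = [ "deploy" ];
}
```

In effect the deploy key becomes equivalent to a root key, which is the concern raised above.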

Agreed. How do you feel about public key only ssh authentication as root on the target? That is one way to get around the sudo problem. But then you are logging in as root… But then I guess that is what sudo does anyway… Curious what you think.

I use nixos-rebuild --target-host and disable all authentication methods except for pubkey on all my machines, and log in as root on the remote host.
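
A minimal sketch of that setup, assuming a NixOS release recent enough to have `services.openssh.settings`; the key shown is a placeholder.

```nix
{
  services.openssh = {
    enable = true;
    settings = {
      # Public keys only: no passwords, no keyboard-interactive prompts.
      PasswordAuthentication = false;
      KbdInteractiveAuthentication = false;
      # Root may log in, but only with a key.
      PermitRootLogin = "prohibit-password";
    };
  };

  # Placeholder key; replace with the key used for deployments.
  users.users.root.openssh.authorizedKeys.keys = [
    "ssh-ed25519 AAAA...placeholder admin@workstation"
  ];
}
```

With this on the target, `nixos-rebuild switch --target-host root@<host>` needs no sudo step at all.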

This page on the wiki mentions that you can set NIX_SSHOPTS="-o RequestTTY=force" to use a password when using remote sudo, but I’m not sure how up to date that information is.