Best NixOS Deployment tool (for my situation)

As shown by awesome-nix, NixOS is plagued with a large number of deployment tools.

When I first started converting my home network to NixOS (several years ago), I used deploy-rs, which worked great…until it didn’t. I may be an idiot, but when I had trouble with configurations, particularly any trouble that made the server inaccessible, deploy-rs left me frustrated: although I had to get into the remote system to figure out what was wrong, I had to fix it on the deployment side.

Since then, I just do everything manually. All the systems share a Nix flake in a local git repo, and to update a remote machine, I run ssh hostname 'cd /etc/nixos && sudo git pull && sudo nixos-rebuild switch'. This way, I can make changes from any machine in the network, but the changes are always editable from the remote machine.
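For concreteness, the whole loop looks roughly like this (the repo path and hostname are just stand-ins for my setup):

# push the shared flake from whatever machine I'm on (after committing)
git -C ~/nixos-config push origin main

# then pull and switch on the target
ssh hostname 'cd /etc/nixos && sudo git pull && sudo nixos-rebuild switch'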

I’m wondering if it’s worth trying another tool, or if I should just stick to what I have.

With that in mind, I guess my criteria are:

  1. Must be able to recover from the target machine.
  2. Must not make configuration more complicated than it needs to be.

comin looked interesting, but it feels a little opinionated and doesn’t seem to put the git repo in /etc/nixos, so I don’t know if it meets criterion 1. A pull model might also be a little too automatic for my updating tastes.

Clan looks very intriguing, but leaves me torn between “this does everything I’m looking for” and “this adds layers of abstraction that overcomplicate an already complicated thing.”

Or perhaps I should give deploy-rs another whirl?

Any feedback is greatly appreciated 🙂


Not exactly what you are looking for, but “Dead man switch for nixos-rebuild switch / boot” might be helpful.


That’s just like my experience with deploy-rs! I thought it was great that I could put most deploy details in my flake configs, but I also ran into unrecoverable network errors, and I think some issues with deployments not “sticking.”

nixos-rebuild turns out to be perfectly capable of remote deploys with the --target-host argument, e.g., --target-host ssh://hostname --use-remote-sudo --use-substitutes. So I have a build machine that builds each host locally; then I run nix copy --to ssh://hostname -s --no-check-sigs <nix-store-path> to prep the target, and finally nixos-rebuild to deploy.
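Roughly, the per-host sequence looks like this (hostname and the flake attribute are placeholders):

# build the target's system closure locally on the build machine
nix build .#nixosConfigurations.hostname.config.system.build.toplevel
path=$(readlink -f result)

# pre-copy the closure, letting the target substitute what it can
nix copy --to ssh://hostname -s --no-check-sigs "$path"

# then activate it remotely
nixos-rebuild switch --flake .#hostname --target-host ssh://hostname --use-remote-sudo --use-substitutes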

As @withakay mentioned, you might try my switch-fix util. It’s a way to recover from bad network setups: it saves the current running system profile, and on boot it reverts back to that system unless you get in there and cancel before a delay.
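The core idea is simple enough to sketch in a few lines of shell; this is just the concept, simplified to an in-session revert rather than the boot-time one switch-fix actually does:

# remember the running system, then switch
current=$(readlink -f /run/current-system)
sudo nixos-rebuild switch

# if nobody confirms within the delay, revert to the saved system
echo 'Press Enter within 60s to keep the new generation'
if ! read -r -t 60; then
  sudo "$current/bin/switch-to-configuration" switch
fi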


I’m curious what advantages you gain here over just ssh-ing into the machine. In my case, the machine I’m almost always interacting with also serves as the local Nix cache for other machines, so it generally has more packages installed than any other system.

If you’re editing the config on your local machine, it’s certainly a lot faster to use --target-host than having to manually ssh in and git pull, but if you’re already on the remote system troubleshooting, does it not make more sense to just work on the config on the remote system?

I may be taking things too far, but the way my update script works right now is that it tags the current commit after a successful nixos-rebuild switch, so that you always know the difference between each of your generations.
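Sketched out, with the flake path and generation-number parsing simplified, the idea is roughly:

# deploy, then tag the commit with the resulting generation number
sudo nixos-rebuild switch --flake . || exit 1
gen=$(readlink /nix/var/nix/profiles/system | grep -o '[0-9]\+')
git tag "generation-$gen"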

You don’t need a clone of the repo on the machine, since you can just do nixos-rebuild switch --flake <repo-url>#<config> --target-host <host>.
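For example, with placeholder names filled in:

nixos-rebuild switch --flake github:youruser/nixos-config#hostname --target-host ssh://hostname --use-remote-sudo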


Some of my deploy targets are tiny, anemic VMs in terms of processor and RAM; they might not be able to build a system at all, and certainly not very fast.

Also, I have a monorepo of systems, and in cases where I deploy to systems that have other users, I prefer to keep that repo to myself, even with secrets protected via sops-nix.


I should say, I do ssh into the build machine; it sounds like you do something similar if the one you’re interacting with is a cache for other systems, acting as a remote builder. That’s actually why I wrote switch-fix in the first place: the build machine is bare metal without KVM (the remote display and input kind, not the hypervisor), so messing up network configs on that machine is pretty costly.

Ah, gotcha. This is the same reason I don’t use cachix or similar tools. I’ve got a lot of agenix-encrypted secrets that I don’t want leaving my local network, even if they are encrypted.

I’m the only user on my machines, but I totally get why you’d want to hide the system config from non-power users.

Yeah, so it started out as my Jellyfin (movie) server, but it has since taken on a lot more roles, and since it has the longest uptime of any machine on the network, I figured: why not make it a local cache?


I think what I’ve gleaned from this is that what I’m really looking for is the ability to look at two generations and tell the complete difference between them. Tools like nvd and dix are cool, but they don’t show you the full picture the way tagged commits can, so I guess I’ll stick with my custom deploy script, like I’m sure many people do.
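For anyone curious, assuming tags named after generation numbers as above (122 and 123 are made-up examples), the comparison I’m after looks something like this:

# config-level diff between two deploys
git diff generation-122 generation-123

# package-level diff between the same two generations
nvd diff /nix/var/nix/profiles/system-122-link /nix/var/nix/profiles/system-123-link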

Thanks, everyone! 🙂


comin is able to do this since it can fetch from several remotes. You could have a configuration such as:

services.comin.remotes = [
  {
    name = "origin";
    url = "https://gitlab.com/your/private-infra.git";
  }
  {
    name = "local";
    url = "/etc/nixos";
    poller.period = 2;
  }
];

For local testing, I recommend using the comin testing branch feature in order to avoid Git desynchronization between your machines.
