Morph: nix-based deployment tool

adamt · October 27, 2018, 10:39am

Morph is a tool for updating NixOS hosts. It doesn’t know anything about provisioning servers, it doesn’t understand cloud, and it certainly isn’t perfect.

It is, however, fairly straightforward to use, keeps no state, and uses a very NixOps-inspired format for defining hosts. It supports HTTP and command-based health checks, to ensure that server n is up before it continues updating server n+1. It also supports basic secret management, implemented by scp'ing files to the remote servers to keep them out of the Nix store.
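To make the feature list concrete, here is a minimal sketch of a single host with one HTTP check, one command check and one secret. It is modeled on the examples in the morph repository. The option names (deployment.healthChecks, deployment.secrets) may not match every morph version, and the host name, paths and checks are made up for illustration:

```nix
# Hypothetical morph network file (e.g. network.nix). Option names follow the
# examples in morph's repository and may differ between morph versions.
let
  # A real deployment would pin nixpkgs here (e.g. with builtins.fetchTarball);
  # <nixpkgs> just keeps the sketch short.
  pkgs = import <nixpkgs> {};
in
{
  network = {
    inherit pkgs;
    description = "example hosts";
  };

  "web01.example.com" = { config, pkgs, ... }: {
    # Boot loader, file systems and other hardware settings omitted.
    services.nginx.enable = true;

    # Checks that must pass on this host before morph moves on to the next one.
    deployment.healthChecks = {
      http = [{
        scheme = "http";
        port = 80;
        path = "/";
        description = "nginx answers on port 80";
      }];
      cmd = [{
        cmd = [ "systemctl" "is-active" "nginx.service" ];
        description = "the nginx unit is active";
      }];
    };

    # Uploaded over SSH at deploy time, so the file never enters the Nix store.
    deployment.secrets."nginx-key" = {
      source = "./secrets/nginx.key";
      destination = "/var/lib/secrets/nginx.key";
      owner.user = "nginx";
      owner.group = "nginx";
      permissions = "0400";
    };
  };
}
```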

We have been developing this incarnation of the tool for seven months and have been using it daily for more than six months to manage our growing fleet of about 45 servers.

Maybe a bit of background is in order: when we started using NixOS about a year ago, we looked into existing tools for managing a fleet of NixOS servers. We didn't want to deal with state in a database, so we ended up writing our own deployment tool, morph, to scratch our own itch.
All our servers are hosted in our own data centers, and provisioning is done by loading an unattended NixOS installer; this results in a "blank" NixOS host with nothing more than sshd running, which can then be managed by morph. It's a bit more involved than that, but that's the gist of it anyway. This also means that we don't have any plans to add provisioning support directly to morph; we are working on a separate tool for that part.

Without further ado: DBCDK/morph on GitHub (NixOS deployment tool)

Best regards,
adamt

luke-clifton · November 3, 2018, 5:41am

This is pretty cool. It's similar to (but better than) what I do.

My solution is built on nixos-rebuild which can deploy to remote machines using the --target-host flag.

I have a repository with a bunch of different configuration.nix files, each named after the host it is supposed to be deployed to, and a script that essentially loops through the list, calling NIXOS_CONFIG="$host.nix" nixos-rebuild $command --target-host $host. I have a hacked-on health-check mechanism, and I can specify particular hosts to build rather than all of them. The script also pins all the hosts to the same channel. I use git-crypt for secrets, and the script copies them across.

It includes a bunch of utility functions, like scraping each host's hardware.nix, upgrading the pinned channel, and so on.

The good part of nixos-rebuild is that it is well documented, and you get rollback/test/switch for free.

My primary gripe with using nixos-rebuild is that each host's configuration is completely independent of the others. I'd like to create modules that understand the network as a whole, for example automatically adding WireGuard peers for all the machines in the network to the VPN server, or adding extraHosts entries for all the machines. Would something like that be possible with morph?

aanderse · November 3, 2018, 11:25am

What issues did you have with NixOps?

adamt · November 3, 2018, 12:02pm

Hi Luke,

Currently I think the answer is no: morph can't do anything significantly smarter than what your own solution already achieves (except for defining multiple hosts in the same file).
But with both your solution and morph, it's possible to define common things in a separate file and then import that file for all relevant hosts. That's what we do for many, many, many options. I'm sorry we can't share our current deployment repository, but it's riddled with secrets, and we haven't yet found a way of dealing with that which we like.
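For illustration, a minimal sketch of that shared-file pattern in a morph network file. The common module is inlined for brevity, its settings are arbitrary examples, and the host names are hypothetical; the network attribute and hardware settings are omitted:

```nix
# Hypothetical morph network file: one shared module imported by every host.
# In a real repository "common" would usually live in its own file
# (e.g. ./common.nix) and be listed in imports by path instead.
let
  common = { config, pkgs, ... }: {
    services.openssh.enable = true;
    time.timeZone = "UTC";
    environment.systemPackages = [ pkgs.htop ];
  };
in
{
  "host1.example.com" = { ... }: {
    imports = [ common ];
    networking.hostName = "host1";
  };

  "host2.example.com" = { ... }: {
    imports = [ common ];
    networking.hostName = "host2";
  };
}
```

Because imports takes modules as well as paths, moving common into its own file later only changes how it is referenced, not the hosts themselves.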

However, this is going to change, since ordering is starting to matter for our deployments (e.g. updating static IP leases on servers X before deploying servers Y, upgrading Kubernetes control planes before the kubelets, …), so I guess we'll have to add an additional layer for defining clusters of machines and their ordering, somehow.
A side effect of this will probably be the definition of named groups (probably with their own variables), so that each host could also have rules that depend on other groups of servers, which would probably solve your problem as well.

Thanks for your interest, and if you have any suggestions, feel free to open a GitHub issue with your ideas.

/ adamt

adamt · November 3, 2018, 12:08pm

Hi aanderse,

Our original gripe was the shared state. I know this is starting to change with the pluggable backends, but I doubt it will make us drop morph anytime soon, since I believe the team is really happy with our current solution.

Can NixOps do health checks on changed systems? The only thing I found when I looked last week was something specific to certain providers (GCE and Azure, I believe), using provider-specific APIs.

/ adamt

aanderse · November 3, 2018, 12:36pm

Yeah, the shared state is far from desirable, but so far it hasn't outweighed the benefits my team sees in NixOps.

I've been using NixOS at home for a year or so, and recently brought it to my team (of 3) at work as a solution for managing a growing fleet of 150+ servers. Since neither of my teammates knew anything about NixOS, we started really simple: we basically copy-pasted the configuration.nix and hardware-configuration.nix from each server onto a master server and deploy from there. As the team has grown more comfortable, I've introduced a few abstractions via functions or imports, but the project is still in its very early stages. We've only deployed 1 server to production, though we have about 10 on our testing network.

We currently use a shared account on the NixOps server, which isn't horrible so far but definitely leaves room for improvement. I believe I read about someone whose NixOps setup made all configuration files group-writable for all NixOps admins, with the actual nixops command run as the proper user via sudo. That definitely sounds better, though obviously still not ideal.

I don't worry about state too much, because the NixOps server itself runs on a VM that is backed up nightly. Before we run any production updates, we take VM snapshots of both the NixOps server and the target server. With only 1 production server so far this is pretty easy, but as we migrate more production servers to NixOps, you can see how these snapshots will become tedious (though no more tedious than our current situation, which requires them as well). I'd really like to familiarize myself with the NixOps code base and try my hand at a VMware backend, as being able to call nixops snapshot would save us loads of time when it comes to patching.

I like what you've done with Morph and it looks really cool, but I had to go with NixOps rather than a roll-your-own solution to get buy-in from my teammates, since they had zero NixOS experience prior to this project.

I'd love to hear about other people's experiences with NixOps or roll-your-own solutions they've come up with: what was a challenge, what went well, etc.

luke-clifton · November 4, 2018, 2:33am

Indeed, this is what I currently do. But it involves configuring a bunch of stuff in separate files that I'd rather keep in the main configuration.nix. For the WireGuard example, I have something like this (a trimmed-down version; apologies if something is missing):

# lib/wireguard.nix
{ config, lib, ... }:
let
  nodes = {
    alpha = {
      peerInfo = {
        allowedIPs = ["192.168.65.3/32"];
        publicKey = "....";
      };
      ips = ["192.168.65.3/24"];
    };
    beta = {
      peerInfo = {
        publicKey = "....";
        allowedIPs = ["192.168.65.0/24"];
      };
      ips = ["192.168.65.1/24"];
    };
  };
  # This host's own entry, looked up by its hostname.
  self = nodes."${config.networking.hostName}";
  # Every other node becomes a WireGuard peer.
  peers = lib.filterAttrs (k: v: k != config.networking.hostName) nodes;
  # Lines like "192.168.65.3 alpha.vpn.local": first address of each node, prefix length stripped.
  extraHosts = lib.mapAttrsToList (k: v: "${lib.head (lib.splitString "/" (lib.head v.ips))} ${k}.vpn.local") nodes;

in {
  networking = {
    wireguard.interfaces.wg0 = {
      ips = self.ips;
      listenPort = 43642;
      privateKeyFile = "/secrets/wg.private";
      # The peers option expects a list, so convert the attribute set.
      peers = lib.mapAttrsToList (k: v: v.peerInfo) peers;
    };
    extraHosts = lib.concatStringsSep "\n" extraHosts;
  };
}

Hosts that wish to be in the wireguard network have to

a) Add lib/wireguard.nix to their imports list in their configuration file
b) Add the relevant details to the top of the lib/wireguard.nix file.

I dislike this because of having to repeat myself too often. If I decide to change the hostname, I have to remember to change it here as well. It just feels brittle.

Is this similar to what you do?

I would have thought that in your method, you could do something recursive to reference all the other configurations. Maybe something like this for your network file in morph?

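let
  pkgs = import <nixpkgs> {};
  lib = pkgs.lib;
  # First address of a host's wg0 interface, with the "/prefix" stripped.
  vpnIP = conf: lib.head (lib.splitString "/" (lib.head conf.networking.wireguard.interfaces.wg0.ips));
  # Refers to `hosts` below; this works because let bindings are recursive and lazy.
  vpnHosts = lib.concatStringsSep "\n" (lib.mapAttrsToList (host: conf: "${vpnIP conf} ${host}") hosts);
  hosts = {
    server1 = {
      networking.wireguard.interfaces.wg0.ips = [ "1.1.1.1/24" ];
      networking.extraHosts = vpnHosts;
    };
    server2 = {
      networking.wireguard.interfaces.wg0.ips = [ "1.2.3.4/24" ];
      networking.extraHosts = vpnHosts;
    };
  };
in
  hosts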

which I find MUCH more palatable. Especially once you abstract out the vpnHosts stuff into a separate file and even more so if you can make it a module such that enabling wireguard automatically adds vpnHosts.
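A sketch of that module idea, assuming the deployment tool passes a nodes argument (every host's evaluated configuration) to modules, as NixOps does; whether a given morph version does the same is an assumption to check. The vpn.enable and vpn.address options are hypothetical, and WireGuard keys and peers are left out to keep the focus on the extraHosts part:

```nix
# Hypothetical module: every host that sets vpn.enable = true gets a wg0
# address and an /etc/hosts entry for each VPN member.
# ASSUMPTION: `nodes` is supplied to modules (as NixOps does); vpn.* is made up.
{ config, lib, nodes, ... }:
let
  cfg = config.vpn;
  # All hosts in the network that have opted in to the VPN.
  members = lib.filterAttrs (_: n: n.config.vpn.enable) nodes;
in
{
  options.vpn = {
    enable = lib.mkEnableOption "membership in the WireGuard VPN";
    address = lib.mkOption {
      type = lib.types.str;
      example = "192.168.65.3";
      description = "This host's address inside the VPN.";
    };
  };

  config = lib.mkIf cfg.enable {
    # Private key and peers omitted in this sketch.
    networking.wireguard.interfaces.wg0.ips = [ "${cfg.address}/24" ];
    networking.extraHosts = lib.concatStringsSep "\n"
      (lib.mapAttrsToList (name: n: "${n.config.vpn.address} ${name}.vpn.local") members);
  };
}
```

Each host would then only import the module and set vpn.enable and vpn.address; renaming a host changes its attribute name in one place, and every other member's /etc/hosts follows.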

I will definitely take a closer look at morph, and maybe even see if I can migrate to it. I’ll have to find the time first.

Thanks for sharing.

adamt · November 6, 2018, 1:09pm

No, or at least I don't think so.

We have a JSON file for each host, with host-specific details like hostname, IP, …, basically everything, to avoid duplicating things around. We then use them in multiple network expressions (e.g. the DHCP server reads all of the host JSON files for static reservations, while the Kubernetes modules select one subset for kubelets and another for masters, etc.).

I think you could do the same to alleviate some of your issues. But probably what helps us most is being able to create multiple hosts (dynamically, based on the static host files) in the same file, in exactly the same manner as NixOps does. Currently our network evaluator is just a stripped-down version of the one from NixOps, with some extra features added.
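A minimal sketch of how per-host JSON files could drive both ideas in one network file: hosts are generated from the files, and a DHCP server reads all of them for static reservations. The hosts/*.json layout and its fields, the dhcp01 host, the eth0 interface and the /24 prefix are assumptions for illustration; services.dhcpd4.machines is the NixOS ISC dhcpd option of that era:

```nix
# Hypothetical morph network file. Assumes one JSON file per host in ./hosts,
# e.g. hosts/web01.json: { "ip": "10.0.0.11", "mac": "52:54:00:12:34:56" }
let
  pkgs = import <nixpkgs> {};
  lib = pkgs.lib;

  hostDir = ./hosts;

  # Every regular *.json file in hostDir.
  jsonFiles = lib.filterAttrs
    (file: type: type == "regular" && lib.hasSuffix ".json" file)
    (builtins.readDir hostDir);

  # { web01 = { ip = ...; mac = ...; }; ... }, keyed by file name minus ".json".
  hosts = lib.mapAttrs'
    (file: _: lib.nameValuePair
      (lib.removeSuffix ".json" file)
      (builtins.fromJSON (builtins.readFile (hostDir + "/${file}"))))
    jsonFiles;

  # Turn one host record into a NixOS module.
  mkHost = name: h: { ... }: {
    networking.hostName = name;
    # "eth0" and the /24 prefix are placeholders for this sketch.
    networking.interfaces.eth0.ipv4.addresses = [ { address = h.ip; prefixLength = 24; } ];
  };
in
{
  network = {
    inherit pkgs;
    description = "hosts generated from hosts/*.json";
  };

  # The DHCP server reads *all* host records for its static reservations.
  dhcp01 = { ... }: {
    services.dhcpd4.enable = true;
    services.dhcpd4.machines = lib.mapAttrsToList (name: h: {
      hostName = name;
      ipAddress = h.ip;
      ethernetAddress = h.mac;
    }) hosts;
  };
}
# One network entry per JSON file, generated rather than written by hand.
// lib.mapAttrs mkHost hosts
```

Since // is right-biased, a hosts/dhcp01.json file would replace the hand-written dhcp01 entry; keeping hand-written and generated names disjoint avoids that surprise.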

You're a brave man :P. Feel free to open issues on GitHub with your questions, and we'll try to answer them there. I might even get my coworker (who is much cleverer than I am) to help answer there as well.
It's also a good time to ask for specific features, since we've begun planning the next iteration of morph.

adamt · November 6, 2018, 1:11pm

This is the exact scenario we wanted to avoid.

This sounds broken in so many ways, considering NixOS's generation concept. I doubt it's really necessary, but it wouldn't be (easily) possible for us anyway, since we deploy to physical hardware.

aanderse · November 6, 2018, 1:30pm

On our Debian systems we always take snapshots before updates; that is what everyone is used to doing. With more time and comfort we may reconsider this, though given that nixops itself has both backup and restore options (for backends that support them), I think there are scenarios where a nixops rollback might not work as smoothly as a VM snapshot rollback.

colemickens · November 16, 2018, 12:06am

Can anyone speak to the similarities and differences with krops? Thanks.

adamt · November 16, 2018, 10:00am

Thanks for the link. Krops seems to do something interesting with secrets that I’ll look more into.

From a cursory glance, it seems that krops copies sources to the target and builds and switches the target host from the target host itself, while morph does all building locally, pushes binaries to the target host, and only performs the switch on the target itself.
The main difference is that the krops method (unless I misunderstood it) requires internet access on the target host (or at least access to a local Nix cache containing all the needed files).

There are of course other differences, but this might be a determining factor.