I’m experimenting with NixOps on DigitalOcean, and my NixOps-managed machines keep becoming unreachable.
To start with, I stand up the machine with Terraform and run nixos-infect through DigitalOcean’s user data. It takes a while, but it does work.
runcmd:
  - curl https://raw.githubusercontent.com/elitak/nixos-infect/master/nixos-infect | PROVIDER=digitalocean NIX_CHANNEL=nixos-21.05 bash 2>&1 | tee /tmp/infect.log
Once that has finished, I try to deploy to the machine with NixOps. My deployment expression looks like this:
{
  matrix =
    { config, pkgs, modulesPath, lib, ... }:
    {
      deployment.targetHost = "matrix.luminescent-dreams.com";
      services.openssh.enable = true;
      users.users.root.openssh.authorizedKeys.keys = [
        "ssh-rsa ..."
      ];
      imports = [ (modulesPath + "/profiles/qemu-guest.nix") ];
      boot.loader.grub.device = "/dev/vda";
      boot.initrd.kernelModules = [ "nvme" ];
      fileSystems."/" = { device = "/dev/vda1"; fsType = "ext4"; };
    };
}
I run this with nixops deploy. It runs for a while, but then I see the following output and the host becomes permanently unreachable.
matrix> stopping the following units: audit.service, kmod-static-nodes.service, mount-pstore.service, network-addresses-eth0.service, network-local-commands.service, network-setup.service, nix-daemon.service, nix-daemon.socket, nscd.service, resolvconf.service, systemd-journal-catalog-update.service, systemd-modules-load.service, systemd-sysctl.service, systemd-timesyncd.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-trigger.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-udevd.service, systemd-update-done.service
matrix> error: Traceback (most recent call last):
  File "/nix/store/diswxhqf2hwylsxpvdz6wmr5z1qyj299-nixops-1.7/lib/python2.7/site-packages/nixops/deployment.py", line 743, in worker
    raise Exception("unable to activate new configuration (exit code {})".format(res))
Exception: unable to activate new configuration (exit code 255)
So far as I can tell, nixos-infect hard-codes the machine’s IP address in the networking configuration it generates, and my deployment contains no networking settings at all. Do I need to copy that generated configuration into the configuration I deploy every time? Or is there something else involved here that I’m missing?
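To make the "copy it in" option concrete, this is roughly what I have in mind. It is only a sketch: I am assuming that nixos-infect wrote its static network settings to /etc/nixos/networking.nix on the droplet, and that I have copied that file next to my deployment expression as ./networking.nix. The file name and location are my assumptions, and I have not confirmed that this fixes the activation failure.

```nix
{
  matrix =
    { config, pkgs, modulesPath, lib, ... }:
    {
      deployment.targetHost = "matrix.luminescent-dreams.com";
      services.openssh.enable = true;
      users.users.root.openssh.authorizedKeys.keys = [
        "ssh-rsa ..."
      ];
      imports = [
        (modulesPath + "/profiles/qemu-guest.nix")
        # Assumption: a local copy of the networking file that nixos-infect
        # generated on the droplet, so the deployed system keeps the same
        # static addresses instead of dropping them on activation.
        ./networking.nix
      ];
      boot.loader.grub.device = "/dev/vda";
      boot.initrd.kernelModules = [ "nvme" ];
      fileSystems."/" = { device = "/dev/vda1"; fsType = "ext4"; };
    };
}
```

The downside of this approach is that the copied file is specific to one droplet, so it would have to be fetched again whenever the machine is recreated.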