NixOps deployment makes machines unreachable

I’m experimenting with NixOps on DigitalOcean, and my NixOps-managed machines keep becoming unreachable.

To start with, I’m standing up the machine with Terraform and running nixos-infect through DigitalOcean’s user data. It takes a while, but it does work.

runcmd:
  - curl https://raw.githubusercontent.com/elitak/nixos-infect/master/nixos-infect | PROVIDER=digitalocean NIX_CHANNEL=nixos-21.05 bash 2>&1 | tee /tmp/infect.log

So, after this is done, I try to deploy to the machine with nixops. My deployment script looks like this:

{
  matrix =
    { config, pkgs, modulesPath, lib, ... }:
    {
      deployment.targetHost = "matrix.luminescent-dreams.com";

      services.openssh.enable = true;
      users.users.root.openssh.authorizedKeys.keys = [
        "ssh-rsa ..."
      ];

      imports = [ (modulesPath + "/profiles/qemu-guest.nix") ];
      boot.loader.grub.device = "/dev/vda";
      boot.initrd.kernelModules = [ "nvme" ];
      fileSystems."/" = { device = "/dev/vda1"; fsType = "ext4"; };
    };
}

So, I run this with nixops deploy, and it runs for a while, but then I see this block and the host becomes permanently unreachable.

matrix> stopping the following units: audit.service, kmod-static-nodes.service, mount-pstore.service, network-addresses-eth0.service, network-local-commands.service, network-setup.service, nix-daemon.service, nix-daemon.socket, nscd.service, resolvconf.service, systemd-journal-catalog-update.service, systemd-modules-load.service, systemd-sysctl.service, systemd-timesyncd.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-trigger.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-udevd.service, systemd-update-done.service
matrix> error: Traceback (most recent call last):
  File "/nix/store/diswxhqf2hwylsxpvdz6wmr5z1qyj299-nixops-1.7/lib/python2.7/site-packages/nixops/deployment.py", line 743, in worker
    raise Exception("unable to activate new configuration (exit code {})".format(res))
Exception: unable to activate new configuration (exit code 255)

So far as I can tell, nixos-infect hard-codes the IP address of the machine in the networking code. Do I need to copy that into the configuration that I deploy every time? Or is there something else involved here that I’m missing?
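
For reference, as far as I know nixos-infect on DigitalOcean writes those static addresses into a generated /etc/nixos/networking.nix on the host, and the generated configuration.nix imports it. The log above shows network-addresses-eth0.service and network-setup.service being stopped, which would fit a deployment that replaces the network setup without those addresses. A minimal sketch of carrying the file along, assuming it has been copied from the droplet and saved next to the deployment file (the local path ./networking.nix is an assumption, not something from the original setup):

```nix
{
  matrix =
    { config, pkgs, modulesPath, lib, ... }:
    {
      deployment.targetHost = "matrix.luminescent-dreams.com";

      imports = [
        (modulesPath + "/profiles/qemu-guest.nix")
        # Assumption: a copy of /etc/nixos/networking.nix taken from the
        # droplet after nixos-infect finished. It holds the static IP,
        # gateway and nameserver settings that nixos-infect generated.
        ./networking.nix
      ];

      # ... remaining options (openssh, root keys, boot loader, fileSystems)
      # stay exactly as in the deployment above ...
    };
}
```

Since the addresses are specific to each droplet, the file would need to be copied again whenever the machine is recreated with a different IP.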


what is a floating IP?

Normally, when you create a server in the cloud, a random IP address is assigned. So if you destroy and recreate the same infrastructure, you would normally get different public IPs. Floating IPs let you reuse the same public IP. Basically, you lease it.

Hetzner seems to charge extra for this but DigitalOcean only charges if you lease but don’t currently use it.
https://docs.hetzner.com/cloud/floating-ips/faq/

ah, a static ip address assignment.

Also, I severely mistyped. When I was originally writing this, I had a floating IP. I removed it in order to remove a possible complication, but I’m still getting the same behavior. I’ll correct what I wrote above to reflect it.

not a solution, but i’ve never liked nixos-infect. it’s a hack. a beautiful, beautiful hack, but a hack nonetheless.

Have you considered using a full NixOS VPS provider such as vpsFree, or another provider that offers bare-metal NixOS images out of the box?

however, it’s probably not the issue here (perhaps), but if you remove the ‘hacks’ so that only pure nix remains, you’ve got a better chance of getting to the root cause.

rather than give your hard-earned cash to digital ocean, who don’t give a monkey’s about nix, give your money to where the nix rebels lurk, so they can eat and do more commits. :-)

I haven’t ruled it out, but it’s not really what I want to do right now. Since DigitalOcean does seem to allow custom images, that may just be the next step that I take.

email sent to digital ocean.

Dear Digital Ocean…

Please supply NixOS images for your VMs.

Cheers,
Arch Linux User.