Nixops deployments make machines unreachable

savannidgerinel · September 26, 2021, 2:54am

I’m experimenting with Nixops here on Digital Ocean, and I keep having troubles with my Nixops machines becoming unreachable.

To start with, I’m standing up the machine with Terraform, and I run a nixos-infect using Digital Ocean’s userdata. It takes a while, but it does work.

runcmd:
  - curl https://raw.githubusercontent.com/elitak/nixos-infect/master/nixos-infect | PROVIDER=digitalocean NIX_CHANNEL=nixos-21.05 bash 2>&1 | tee /tmp/infect.log

So, after this is done, I try to deploy to the machine with nixops. My deployment script looks like this:

{
  matrix =
    { config, pkgs, modulesPath, lib, ... }:
    {
      deployment.targetHost = "matrix.luminescent-dreams.com";

      services.openssh.enable = true;
      users.users.root.openssh.authorizedKeys.keys = [
        "ssh-rsa ..."
      ];

      imports = [ (modulesPath + "/profiles/qemu-guest.nix") ];
      boot.loader.grub.device = "/dev/vda";
      boot.initrd.kernelModules = [ "nvme" ];
      fileSystems."/" = { device = "/dev/vda1"; fsType = "ext4"; };
    };
}

So, I run this with nixops deploy, and it runs for a while, but then I see this block and the host becomes permanently unreachable.

matrix> stopping the following units: audit.service, kmod-static-nodes.service, mount-pstore.service, network-addresses-eth0.service, network-local-commands.service, network-setup.service, nix-daemon.service, nix-daemon.socket, nscd.service, resolvconf.service, systemd-journal-catalog-update.service, systemd-modules-load.service, systemd-sysctl.service, systemd-timesyncd.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-trigger.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-udevd.service, systemd-update-done.service
matrix> error: Traceback (most recent call last):
  File "/nix/store/diswxhqf2hwylsxpvdz6wmr5z1qyj299-nixops-1.7/lib/python2.7/site-packages/nixops/deployment.py", line 743, in worker
    raise Exception("unable to activate new configuration (exit code {})".format(res))
Exception: unable to activate new configuration (exit code 255)

So far as I can tell, nixos-infect hard-codes the IP address of the machine in the networking code. Do I need to copy that into the configuration that I deploy every time? Or is there something else involved here that I’m missing?
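
For illustration, here is a minimal sketch of what carrying that network configuration into the deployment could look like. It assumes nixos-infect wrote the static settings to /etc/nixos/networking.nix on the droplet (it generates such a file for some providers, DigitalOcean among them), and that the file has been copied next to the deployment expression as ./networking.nix. The local filename is an assumption, and the file's contents are specific to each machine, so it has to be copied again whenever the droplet is recreated.

```nix
{
  matrix =
    { config, pkgs, modulesPath, lib, ... }:
    {
      deployment.targetHost = "matrix.luminescent-dreams.com";

      services.openssh.enable = true;
      users.users.root.openssh.authorizedKeys.keys = [
        "ssh-rsa ..."
      ];

      imports = [
        (modulesPath + "/profiles/qemu-guest.nix")
        # Assumption: a copy of /etc/nixos/networking.nix taken from the
        # infected host. It carries the static address, gateway, and
        # nameservers that the original deployment above leaves out.
        ./networking.nix
      ];

      boot.loader.grub.device = "/dev/vda";
      boot.initrd.kernelModules = [ "nvme" ];
      fileSystems."/" = { device = "/dev/vda1"; fsType = "ext4"; };
    };
}
```

Without something like this, the activated configuration no longer describes eth0 the way the running system does. That would be consistent with network-addresses-eth0.service and network-setup.service showing up in the list of stopped units just before the connection drops.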

nixinator · September 26, 2021, 7:54pm

what is a floating IP?

ilkecan · September 26, 2021, 8:30pm

Normally, when you create a server in the cloud a random IP address is assigned. So if you destroy and recreate the same infrastructure you would normally get different public IPs. But floating IPs allow you to reuse the same public IP. Basically you lease it.

Hetzner seems to charge extra for this but DigitalOcean only charges if you lease but don’t currently use it.
https://docs.hetzner.com/cloud/floating-ips/faq/

nixinator · September 26, 2021, 8:37pm

ah, a static ip address assignment.

savannidgerinel · September 26, 2021, 8:47pm

Also, I severely mistyped. When I was originally writing this, I had a floating IP. I removed it in order to remove a possible complication, but I’m still getting the same behavior. I’ll correct what I wrote above to reflect it.

nixinator · September 27, 2021, 8:03am

not a solution, but i’ve never liked nixos-infect, it’s a hack. a beautiful beautiful hack, but a hack nonetheless.

Have you considered using a full nixos vps provider such as vpsfree, or other providers that provide out-of-the-box bare metal nixos images?

however, it’s probably not the issue here (perhaps), but if you remove ‘hacks’ and only pure nix remains, you’ve got a better chance of getting to the root cause.

rather than give your hard-earned cash to digital ocean, who don’t give a monkey’s about nix, give your money to where the nix rebels lurk, so they can eat and do more commits. :-).

savannidgerinel · September 27, 2021, 12:45pm

I haven’t ruled it out, but it’s not really what I want to do right now. Since DigitalOcean does seem to allow custom images, that may just be the next step that I take.

nixinator · September 27, 2021, 1:37pm

email sent to digital ocean.

Dear Digital Ocean…

Please supply nixos images for your VMs

Cheers,
Arch Linux User.

savannidgerinel · November 10, 2021, 1:52pm

I don’t remember who now, but somebody pointed me to a DigitalOcean image builder already present in nixpkgs, and to where to find the UI for uploading an image. So, I made an image with

{ pkgs ? import <nixpkgs> {} }:
let config = {
  imports = [ <nixpkgs/nixos/modules/virtualisation/digital-ocean-image.nix> ];
};
in (pkgs.nixos config).digitalOceanImage

And that is so far pretty successful, but only if I use morph instead of nixops. And then my machines seem to go into emergency mode when I try to attach a volume.

So, some success, but I’m certainly not where I want to be yet.
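
On the emergency-mode symptom: nothing above confirms the cause, but one common reason a systemd machine drops into emergency mode is a filesystem entry whose device is not present at boot. If the volume is declared through fileSystems, a sketch like the following lets boot continue when the device is missing. The mount point and volume name are hypothetical, and the by-id path assumes DigitalOcean's usual naming for attached volumes.

```nix
{ ... }:
{
  # Hypothetical mount point and volume name, for illustration only.
  fileSystems."/mnt/data" = {
    device = "/dev/disk/by-id/scsi-0DO_Volume_example-volume";
    fsType = "ext4";
    # "nofail" tells systemd not to treat a missing device as a boot failure.
    options = [ "nofail" ];
  };
}
```

If the machine still ends up in emergency mode with this in place, the volume entry is probably not the culprit, and the journal from the failed boot is the next place to look.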