For those who want to hear a bad story about NixOps self-deploy.
I had deployed a Hetzner machine from an EC2 machine, then transferred the expressions and state to that Hetzner machine and continued to deploy things from there. So, effectively, it was a NixOps self-deploy.
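For context, the handover was roughly along these lines (just a sketch; the deployment name hetzner, the host name, and the file names are made up, and nixops export/import is one way to move the state):

# on the old EC2 machine: dump the deployment state to JSON (deployment name is hypothetical)
nixops export -d hetzner > hetzner-state.json
# copy the expressions and the state dump over (host name is hypothetical)
scp -r ./expressions hetzner-state.json root@hetzner-box:
# on the Hetzner machine: re-create the statefile from the dump
nixops import < hetzner-state.json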
Then, one day, a newcomer ran nixops deploy from their dev machine. HOLY SH…! You may guess what happened. It launched the deployment.hetzner.partitions script (no harm meant, @aszlig, I think it is a nice tool), which wiped the drives (which held RAID for /boot and LVM for the rest). The problem was that the deploy was done from a local machine that didn't have the statefile (it was stored on the Hetzner machine). The result: a fresh NixOS and wiped drives!
Here is the script:
deployment.hetzner.partitions = ''
  clearpart --all --initlabel --drives=sda,sdb
  part raid.1 --size 256 --ondisk=sda
  part raid.2 --size 256 --ondisk=sdb
  raid /boot --level=1 --device=md0 --fstype=ext3 --label=boot raid.1 raid.2
  part swap --size 4000 --label=swap --ondisk=sda
  part / --size 30000 --fstype=ext4 --label=root --ondisk=sda
'';
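In hindsight, a cheap sanity check before any deploy would have caught this: make sure the local statefile actually knows about the existing machines instead of planning to create them from scratch (a sketch; the deployment name is hypothetical):

# does this machine's statefile know the deployment at all?
nixops list
# do the existing Hetzner machines show up as "Up", rather than as machines yet to be created?
nixops info -d hetzner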
We had the NixOps expressions in VCS, but the thing we learned is that we should put the state files into VCS too! So this is rule number one if you are ever going to self-deploy NixOps with a non-SSH backend. For SSH backends it shouldn't be that big a problem to reconstruct the statefile, but for the Hetzner one it was.
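A minimal sketch of what that could look like, assuming you export the state to JSON next to the expressions (file and deployment names are made up):

# snapshot the current NixOps state into the repository
nixops export -d hetzner > state/hetzner.json
git add state/hetzner.json
git commit -m "snapshot nixops state"

You could also commit the raw ~/.nixops/deployments.nixops SQLite file, but a JSON export is easier to diff and review.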
There was a happy end eventually. It looks like nixops deploy wiped the drives in the same way it did during the initial formatting, so only filesystem metadata was lost. LVM had stored all its extent information in a metadata block about 2 MB into the disk, and we used that to vgcfgrestore the LVM. Then some statefile DB mangling to update the Hetzner account names. So lesson two: periodically back up your LVM configuration with vgcfgbackup!
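For completeness, the commands involved look roughly like this (the VG name vg0 and the paths are assumptions):

# take a textual backup of the volume group metadata and store it somewhere off the machine
vgcfgbackup -f /root/lvm-backup-vg0.conf vg0
# after a disaster, the extent layout can be restored from such a backup
vgcfgrestore -f /root/lvm-backup-vg0.conf vg0

LVM also keeps automatic copies under /etc/lvm/backup and /etc/lvm/archive; the point is to get them off the machine before the drives get wiped.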