For those who want to hear a bad story about NixOps self-deploy.
I had deployed a Hetzner machine from an EC2 machine, then transferred the expressions and state to that Hetzner machine and continued to deploy things from there. So, effectively, it was a NixOps self-deploy.
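For context, the handover was roughly along these lines (just a sketch; the deployment name hetzner, the host name, and the file names are made up, and nixops export/import is one way to move the state):

# on the old EC2 machine: dump the deployment state to JSON (deployment name is hypothetical)
nixops export -d hetzner > hetzner-state.json
# copy the expressions and the state dump over (host name is hypothetical)
scp -r ./expressions hetzner-state.json root@hetzner-box:
# on the Hetzner machine: re-create the statefile from the dump
nixops import < hetzner-state.json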
Then, one day, a newcomer ran nixops deploy from their dev machine. HOLY SH…! You may guess what happened. It launched the deployment.hetzner.partitions script (no harm meant, @aszlig, I think it is a nice tool), which wiped the drives (which held RAID for /boot and LVM for the rest). The problem was that the deploy was done from a local machine that didn't have the statefile (it was stored on the Hetzner machine). The result: a fresh NixOS and wiped drives!
Here is the script:
deployment.hetzner.partitions = ''
  clearpart --all --initlabel --drives=sda,sdb
  part raid.1 --size 256 --ondisk=sda
  part raid.2 --size 256 --ondisk=sdb
  raid /boot --level=1 --device=md0 --fstype=ext3 --label=boot raid.1 raid.2
  part swap --size 4000 --label=swap --ondisk=sda
  part / --size 30000 --fstype=ext4 --label=root --ondisk=sda
'';
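In hindsight, a cheap sanity check before any deploy would have caught this: make sure the local statefile actually knows about the existing machines instead of planning to create them from scratch (a sketch; the deployment name is hypothetical):

# does this machine's statefile know the deployment at all?
nixops list
# do the existing Hetzner machines show up as "Up", rather than as machines yet to be created?
nixops info -d hetzner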
We had the NixOps expressions in VCS, but the thing we learned is that we should put the state files into VCS too! So this is rule number one if you are ever going to self-deploy NixOps with a non-SSH backend. For SSH backends it shouldn't be that big a problem to reconstruct the statefile, but for the Hetzner one it was.
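A minimal sketch of what that could look like, assuming you export the state to JSON next to the expressions (file and deployment names are made up):

# snapshot the current NixOps state into the repository
nixops export -d hetzner > state/hetzner.json
git add state/hetzner.json
git commit -m "snapshot nixops state"

You could also commit the raw ~/.nixops/deployments.nixops SQLite file, but a JSON export is easier to diff and review.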
There was a happy end eventually. It looks like nixops deploy wiped the drives in the same way it did during the initial formatting, so only filesystem metadata was lost. LVM had stored all its extent information in a metadata block about 2 MB into the disk, and we used that to vgcfgrestore the LVM. Then some statefile DB mangling to update the Hetzner account names. So lesson two: periodically back up your LVM configuration with vgcfgbackup!
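For completeness, the commands involved look roughly like this (the VG name vg0 and the paths are assumptions):

# take a textual backup of the volume group metadata and store it somewhere off the machine
vgcfgbackup -f /root/lvm-backup-vg0.conf vg0
# after a disaster, the extent layout can be restored from such a backup
vgcfgrestore -f /root/lvm-backup-vg0.conf vg0

LVM also keeps automatic copies under /etc/lvm/backup and /etc/lvm/archive; the point is to get them off the machine before the drives get wiped.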