I’m self-hosting a NixOS server with several services. It’s a simple setup: a Hetzner VPS, configured with a Flake.
Generally it works very well, but twice recently something broke after an upgrade. My process is to manually test everything important after upgrading and roll back if necessary, but this means downtime between the upgrade and the rollback.
What strategies do people use to make sure upgrades (or other configuration changes) work before applying them to production systems?
I was thinking it would be nice to run my system in a VM or some kind of container and test it there. I know there is the nixos-build-vms command. Before hacking together a solution around it, I’d like to get some advice. Considerations:
- CPU architecture (the server is aarch64, my laptop is x86_64, and some problems may be architecture-specific),
- stateful data (like databases),
- networking (e.g. obtaining Let’s Encrypt certificates requires public access to the server; there’s also a WireGuard VPN to my home server and laptop).
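To make the VM idea concrete, here is a minimal sketch of what it could look like as a flake check. It assumes nixpkgs' `pkgs.testers.runNixOSTest`. The host name `myserver`, `./configuration.nix`, the nixpkgs branch, and the nginx probe are all hypothetical placeholders. This sketch only partly addresses the considerations above. The check is defined for aarch64-linux, so it needs an aarch64 builder, or emulation on the laptop, which is slow. Test VMs also run without internet access, so ACME and WireGuard would probably need test-only overrides.

```nix
# flake.nix (sketch): "myserver", ./configuration.nix and the nginx check
# are hypothetical placeholders, not a tested setup.
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";

  outputs = { self, nixpkgs }:
    let
      system = "aarch64-linux"; # the server's architecture
      pkgs = nixpkgs.legacyPackages.${system};
    in {
      # The production configuration, deployed as usual.
      nixosConfigurations.myserver = nixpkgs.lib.nixosSystem {
        inherit system;
        modules = [ ./configuration.nix ];
      };

      # Boots the same modules in a QEMU VM and probes one service.
      # `nix flake check` on an aarch64 machine builds, and thereby runs, it.
      checks.${system}.smoke = pkgs.testers.runNixOSTest {
        name = "myserver-smoke";
        nodes.server = { pkgs, ... }: {
          imports = [ ./configuration.nix ];
          # curl is only added inside the test VM.
          environment.systemPackages = [ pkgs.curl ];
        };
        testScript = ''
          server.wait_for_unit("multi-user.target")
          server.wait_for_unit("nginx.service")
          server.succeed("curl --fail http://localhost/")
        '';
      };
    };
}
```

The design choice here is that the test reuses the production modules, so a failing check would block the same upgrade that would otherwise break the server. Stateful data such as databases would still start empty in the VM.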