NixOS as homelab hypervisor

👋 Hey fellow Nixers!

I am seeking some wisdom for using NixOS as a homelab hypervisor and would appreciate any thoughts and ideas to help me figure out the right path forward.

I have two servers that I use for my homelab; one is more powerful than the other. Both currently run Proxmox and are clustered, with most VMs scheduled on the chunkier one. I am running two OPNsense VMs (one on each server) in an HA setup with automatic failover and all that sort of fun. I have quite a few VMs (some of which are NixOS-based) and LXC containers (some of which are also NixOS-based).

As much as I appreciate Debian, there is no denying that NixOS, when used well, is far superior thanks to its declarative management and its ability to roll back quickly and safely. So I have been considering switching from Proxmox to NixOS as a hypervisor (instead of upgrading to the latest major release of Proxmox), but I am struggling to find quite the right set of tools to get what I actually need. Hopefully someone can point me in the right direction.

Essentially, what I would like is the ability to run an arbitrary number of virtual machines of different types across two nodes. Ideally it would be possible to migrate VMs between the servers if I need to take one of them down for maintenance.

I would appreciate some degree of overview, monitoring and reporting for the running VMs: CPU and memory usage, I/O delays, etc. (I can build some of this myself on top of Prometheus or something similar, but something usable out of the box would be preferable). I would like to run VMs with different guest OSes (Windows, FreeBSD, Linux). I also require PCIe passthrough to attach Ethernet cards to the OPNsense VMs.
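For reference, the host side of PCIe passthrough on NixOS usually comes down to enabling the IOMMU and letting vfio-pci claim the NICs. A minimal sketch, assuming an Intel CPU; the vendor:device ID `8086:1521` is a hypothetical placeholder (the real IDs come from `lspci -nn`):

```nix
{ ... }:
{
  boot.kernelParams = [
    # Enable the IOMMU (AMD platforms typically enable it by default).
    "intel_iommu=on"
    # Passthrough mode for devices that stay with the host.
    "iommu=pt"
    # Hypothetical ID: bind these NICs to vfio-pci instead of their usual driver.
    "vfio-pci.ids=8086:1521"
  ];

  # Load the VFIO modules early so vfio-pci can claim the cards at boot.
  boot.initrd.kernelModules = [ "vfio_pci" "vfio" "vfio_iommu_type1" ];
}
```

If the regular network driver still grabs a card first, a modprobe soft dependency on vfio-pci may be needed as well.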

There will be two different categories of things that I plan to run in my homelab:

  • “production” services, such as a file share, proxy, DNS, etc.;
  • “development” services, which will be all over the place: e.g. today I want to run and explore the Nomad orchestrator, tomorrow Kubernetes, and next week microVMs and Firecracker. The point here is to experiment and to set up and tear down the infrastructure declaratively without affecting my “production” workloads.

I have been looking at OpenNebula, OpenStack and Mist.io, but as far as I understand, none of them integrates well with NixOS. On the Nix side, I have been looking at microvm.nix, but so far it only supports NixOS/Linux guests, not other Unix-like systems such as FreeBSD or Windows, and it doesn’t solve the observability and monitoring part either.

I am sure many of you have some fancy homelab setups and can suggest ideas on how I can achieve the above. Many thanks in advance!


I don’t know if this will help you, but I have simply been using libvirtd and the many NixOS options available for it.

But my use case is kept quite simple on purpose.

There are many client apps that could fit your needs (e.g. virt-manager, virter, kcli). But I suspect you might be looking for more.
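As a point of reference, the NixOS side of a plain libvirtd host is small. A minimal sketch; the user name `alice` is hypothetical and assumed to be declared elsewhere in the configuration:

```nix
{ pkgs, ... }:
{
  # libvirtd with QEMU/KVM as the backend.
  virtualisation.libvirtd = {
    enable = true;
    # Optional software TPM, e.g. for Windows 11 guests.
    qemu.swtpm.enable = true;
  };

  # Hypothetical user allowed to manage VMs without root.
  users.users.alice.extraGroups = [ "libvirtd" ];

  # GUI client for creating and inspecting VMs.
  environment.systemPackages = [ pkgs.virt-manager ];
}
```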

Thank you. Yes, libvirtd is one of the things I am considering, but I would like a bit more functionality out of the box. I can use Terraform for some declarative management too, which is nice, but I wanted to see whether something more advanced exists.

The other option I have considered is Cloud Hypervisor, which is also built on top of KVM and provides an API for managing it out of the box. But it’s the same as libvirtd: just a foundation, leaving a lot of things I would have to solve myself later.

NixOS has little established practice in this area so far.

Maybe you could give KubeVirt a try; it’s based on Kubernetes.

I’ve recently switched my homelab to microvm.nix and it works brilliantly as a VM management abstraction - once you understand how it works and have deployed a few VMs with it.

It doesn’t really need to handle that part: every microvm runs as a systemd unit, so you can use prometheus-community/systemd_exporter (a Prometheus exporter for systemd unit metrics) to extract metrics from them.
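For anyone who hasn’t seen microvm.nix, a VM is declared on the host roughly like this. A sketch only, assuming the host imports `microvm.nixosModules.host` from the microvm.nix flake; the VM name, sizes and hypervisor choice are illustrative:

```nix
{ ... }:
{
  # Each entry here is run as a microvm@<name>.service unit on the host.
  microvm.vms.caddy = {
    config = {
      # Guest-side microvm.nix options (example values).
      microvm.hypervisor = "cloud-hypervisor";
      microvm.vcpu = 2;
      microvm.mem = 1024; # megabytes

      # Ordinary NixOS configuration for the guest.
      services.caddy.enable = true;
      system.stateVersion = "23.05";
    };
  };
}
```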


Thanks. I actually looked at microvm.nix and it looks fine for simple cases, but the moment you want to do something more, it gets in the way (at least for me).

Metrics-wise, I don’t think the systemd exporter comes close to what I would like to have: things like I/O stats, disk usage, and memory plus ballooning obtained from the hypervisor itself. systemd only has so much visibility into what is going on inside any particular unit.

I guess what I am wishing for is a Proxmox- or Unraid-like experience that can be controlled declaratively through Nix (or something similar)…

Minus ballooning, systemd supports all of this as of its latest release:

● microvm@caddy.service - MicroVM 'caddy'
     Loaded: loaded (/etc/systemd/system/microvm@.service; static)
    Drop-In: /nix/store/d13kv2s912iy9j1m74xjrsp47fpsgf8l-system-units/microvm@caddy.service.d
             └─overrides.conf
     Active: active (running) since Thu 2023-10-26 02:26:25 UTC; 7h ago
   Main PID: 1317 (cloud-hyperviso)
         IP: 0B in, 0B out
         IO: 9.9M read, 0B written
      Tasks: 14 (limit: 38350)
     Memory: 283.3M
        CPU: 3min 11.686s
     CGroup: /system.slice/system-microvm.slice/microvm@caddy.service
             └─1317 /nix/store/vl6m6cjhyzzlsz7jc60faxkjpyr8697s-cloud-hypervisor-32.1/bin/cloud-hypervisor

There seems to be a bug with IP accounting because the VM uses a bridge instead of a regular Ethernet interface (the traffic is still obtainable via node-exporter).
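The IP line in that output only appears when systemd’s IP accounting is enabled for the unit (and IO accounting may be off by default, depending on the systemd configuration). A sketch of switching both on for every microvm instance, assuming the `microvm@` template unit name shown in the status output; `IPAccounting` and `IOAccounting` are standard systemd resource-control settings:

```nix
{ ... }:
{
  # Merged into the microvm@.service template, so every VM instance
  # gets per-unit IP and IO counters that systemd_exporter can read.
  systemd.services."microvm@".serviceConfig = {
    IPAccounting = true;
    IOAccounting = true;
  };
}
```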

…and the exporter is just as easy as:

  services.prometheus.exporters.systemd = {
    enable = true;
    extraFlags = [
      "--systemd.collector.enable-ip-accounting"
      "--systemd.collector.enable-restart-count"
    ];
  };
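The exporter only exposes the metrics; Prometheus still has to scrape them. A sketch assuming Prometheus runs on the same host; the job name is arbitrary:

```nix
{ config, ... }:
{
  services.prometheus = {
    enable = true;
    scrapeConfigs = [
      {
        job_name = "systemd";
        static_configs = [
          {
            # Reuse whatever port the systemd exporter module is configured with.
            targets = [
              "localhost:${toString config.services.prometheus.exporters.systemd.port}"
            ];
          }
        ];
      }
    ];
  };
}
```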

Does the abstraction get in the way at the beginning? Sure - but I think I’m currently living your dream.


Thanks again. I will certainly give microvm.nix another proper go and might actually hit you up with a few questions, if you don’t mind, mostly about taking snapshots, backing up, and migrating VMs across hosts.
