Hello all!
I thought I would write a quick update, for anyone who is watching this thread or anyone else who might stumble across it, explaining what I have discovered over the past few weeks and where I am thinking of going with this effort. I will split this post into 4 sections:
- Deployment methods
- Networking
- Disk configuration
- Next steps
It is my intention to write this in a wiki/blog style, without worrying about polish, just to start adding some structure to what I have discovered.
Deployment Methods
I have been experimenting with various methods of deployment. Below are three methods that I have tried, with (1) and (3) seeming the most promising and practical to me.
(1)
As mentioned previously, I have made a flake which uses the qemu package to let you build a nixosConfigurations output as a virtual machine that you can build and run with nix build. The key advantage of this approach is that it requires nothing beyond Nix with flakes enabled, meaning you can easily iterate on a NixOS configuration without sacrificing any existing machines or polluting your system environment. I used this to successfully develop a new NixOS config on an Arch Linux system, which made the process of finally switching back to NixOS painless and stress free.
This approach is great for getting a functional VM up and running quickly, and you can even pass extra QEMU options to the executable by setting the QEMU_OPTS environment variable. I find it particularly useful for specifying options such as -nographic when I want to connect tooling to the VM over SSH rather than running a GUI within the VM.
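As a concrete example, the run script picks up QEMU_OPTS from the environment, so a headless boot looks something like this (the script name depends on your hostname):

QEMU_OPTS="-nographic" ./result/bin/run-<hostname>-vm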
However, I have found it imposes some frustrating limitations once you try to do anything more advanced. For example, it provides a virtualisation.qemu.options list where you can pass native QEMU options as strings, but networking options have to go in virtualisation.qemu.networkingOptions instead, for some reason, and you need lib.mkForce to override the default userspace networking the VM tries to force. You can also choose to bypass raw QEMU options and use the virtualisation option set, but this further obfuscates things in my opinion; given that the only real documentation at present is the QEMU manual and wiki pages, it only adds frustration when troubleshooting.
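To illustrate the split, here is a rough sketch of what that looks like in a module; the option values here are placeholders rather than a tested config, but the option names and the mkForce trick are the ones described above:

{ lib, ... }: {
  virtualisation.vmVariant = {
    # generic QEMU flags can be passed as plain strings here
    virtualisation.qemu.options = [ "-nographic" ];
    # ...but networking flags must go in networkingOptions, and the
    # default userspace networking has to be forced out of the way
    virtualisation.qemu.networkingOptions = lib.mkForce [
      "-netdev tap,id=net0,ifname=tap0,script=no,downscript=no"
      "-device virtio-net-pci,netdev=net0"
    ];
  };
}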
When I’m running this, I normally use a one-liner (insert environment variables after the &&):
nix build && ./result/bin/run-<hostname>-vm
(2)
Another method I have tried is the nixos-rebuild command with its build-vm and build-image subcommands. These also seem to work okay, but they require a NixOS installation to run (unless anyone knows of a package, similar to nixos-install-tools, which would add the command to a nix shell). The use case I see for this is testing your current host's NixOS configuration in a more ad hoc manner.
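For reference, on a NixOS host the invocation is something along these lines (the flake path and hostname are placeholders):

nixos-rebuild build-vm --flake .#<hostname>
./result/bin/run-<hostname>-vm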
(3)
I have also been trying to work with QEMU directly. This offers the best control over the VM, as option (1) does impose some design constraints that are not idiomatic for QEMU. The downside of this approach is that it takes a lot more work to get a working installation, with the standard approach being to attach a -cdrom installer ISO for the first boot and perform the installation as you would on a normal NixOS system. This forgoes a lot of the benefits of option (1), as you now have to install the QEMU package on your system - although you no longer need Nix installed, so I guess that balances out - and cannot simply build the VM.
For those who enjoy a bit more tinkering, however, it is not impossible to recreate the same kind of setup, which mounts the host's Nix store closure into the VM. Again, the function in nixpkgs provides a good starting point for some reverse engineering, and you can also cat the run script produced by option (1) to see the final QEMU command it generates at the end of the file. This approach feels well suited to more advanced users.
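As a very rough sketch of the idea (flags abbreviated and untested - inspect the generated run script for the real thing; the <kernel>, <initrd> and <toplevel> paths are placeholders for store paths), the store mount boils down to passing the host's /nix/store through as a 9p share and booting the system's kernel directly:

qemu-system-x86_64 \
  -kernel <kernel> -initrd <initrd> \
  -append "init=<toplevel>/init console=ttyS0" \
  -virtfs local,path=/nix/store,security_model=none,mount_tag=nix-store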
Networking
I have been playing around with both userspace and tap networking, although my primary focus has been on tap networking. QEMU already provides excellent documentation on networking, which has given me a reasonable understanding of how this works; I’ve only had to consult the LLMs a few times to troubleshoot some minor interface permission problems with tap devices. If you don’t understand the difference between userspace and tap networking, I would highly recommend reading that page.
Tap
So far the best options I have found for tap devices are:
"-netdev tap,id=net0,ifname=tap0,script=no,downscript=no"
"-device virtio-net-pci,netdev=net0" # specify the device type
Most of this is pretty self-explanatory from the docs, but I have included script=no,downscript=no to prevent QEMU from trying to control the tap0 device, which I create manually on the host, and specified the -device as virtio-net-pci as this is a higher performance network interface card emulation than the QEMU default.
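For completeness, creating the tap device on the host looks roughly like this (requires root; the interface name and address are just examples):

ip tuntap add dev tap0 mode tap user "$USER"  # create tap0 owned by your user
ip addr add 10.0.0.1/24 dev tap0              # give the host an address on the link
ip link set tap0 up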
Userspace Port Forwarding
In userspace networking, I have found that the easiest way to expose ports for things like SSH access or webpages is to use the virtualisation option set. Note that these need to be wrapped in virtualisation.vmVariant so that NixOS only applies them when building the VM variant of your configuration. It seems like a weird design choice to me but it is what it is I guess.
virtualisation.vmVariant = {
  virtualisation.forwardPorts = [
    { from = "host"; host.port = 2222; guest.port = 22; } # forward host port 2222 to the guest's sshd on 22
  ];
};
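With forwarding like the above in place, you can reach the guest's sshd from the host (assuming a suitable user account exists in the guest):

ssh -p 2222 <user>@localhost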
The primary use case I see for this is if you want a quick isolated VM for lab work, whether that be on a server or locally. I imagine this is most useful on a machine where you have a browser available to look at dashboards etc. and I have used it for this myself quite happily. It’s very nice being able to rapidly try out new platforms and not worry at all about polluting my main system, no matter what distro I am running on the hypervisor.
Disk Configuration
This one has been both the simplest and the hardest for me to wrap my head around. The good news is that the QEMU docs are once again very good at explaining how to mount a host directory into a VM, which is great for development work: you can edit the code on your host machine while running it with tooling installed in the VM.
For a production system, however, it makes more sense to me to work with disk volumes directly on the host. This means that a volume manager becomes very enticing, even for a local workstation. I have been experimenting with ZFS ZVOLs in this regard.
Host FS Mounts (9P)
I have worked with the 9p protocol for mounting host filesystems into the VM. This has resulted in a very satisfying development experience when working with a VM virtualising my entire docker stack. However, performance has been the biggest problem with this approach, and I believe there are better-performing alternatives out there (virtiofs, for example).
I think the most important thing to consider is how you specify the directory on the host system that you want to mount into the VM. The cleanest way to do this in my opinion is to specify the host directory as an environment variable when you run the script rather than hard coding it into the build. This just gives you more flexibility and allows you to cleanly separate the data from the codebase.
This requires 3 things to be configured:
- QEMU options
- Filesystem mount
- 9P kernel parameters
QEMU options:
"-virtfs local,mount_tag=share-mount,path=$SHARE_DIR,security_model=mapped-xattr"
path=$SHARE_DIR indicates that the path to mount from the host is specified by the $SHARE_DIR environment variable at runtime. You would add this to your command as such:
nix build && SHARE_DIR=/path/to/host/dir ./result/bin/run-<hostname>-vm
Filesystem mount:
fileSystems."/mnt" = {
  fsType = "9p";
  device = "share-mount";
  options = [ "trans=virtio" "version=9p2000.L" ];
};
Note that device = "share-mount" matches mount_tag=share-mount in the QEMU options.
9P kernel modules:
One correction to my earlier notes: the CONFIG_* flags are kernel build-time options, not boot parameters, so listing them in boot.kernelParams has no effect. What you actually need at runtime are the corresponding kernel modules (several of which are already built in or pulled in by the NixOS QEMU guest defaults):
boot.kernelModules = [
  "9p"
  "9pnet_virtio"
  "virtio_pci"
];
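If you want to test the share before committing it to your config, the equivalent manual mount inside the guest is (run as root; the tag and options match the fileSystems entry above):

mount -t 9p -o trans=virtio,version=9p2000.L share-mount /mnt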
ZVOLs
By default, the VM will attempt to run from a QCOW2 file. This introduces a lot of read/write overhead for a system (especially in production), so for more performance it is beneficial to make it use a block device directly. This can be accomplished for a data mount within the VM, or by running the whole VM from a partition. The latter is what I have been working on, and it has been the hardest part of the disk side of things.
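For reference, creating a ZVOL to back the VM looks something like this (the pool name, size and properties are placeholders; -s makes it a sparse volume):

zfs create -s -V 20G -o volblocksize=16k <pool>/<name>
ls /dev/zvol/<pool>/<name>   # the block device appears here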
This is where I’ve run into the most problems with option (1) in the deployment methods. The default NixOS script absolutely tries its hardest to make life miserable for you, as it will autogenerate a QCOW2 file by default if one doesn’t exist already. As discussed before, this is great if you want to get up and running quickly, but is a nightmare for more advanced usage.
Installing a VM on a ZVOL
There is an option to disable creation of the QCOW2 file by setting virtualisation.diskImage = null; however, this forces the VM onto an ephemeral filesystem and has thwarted all of my efforts to make it use a ZVOL instead. The best workaround I have found is to copy the contents of the QCOW2 file onto the ZVOL, and then replace the QCOW2 file with a symlink to the ZVOL. This throws a warning about writing directly to a raw disk every time I start the VM, but it seems to work.
To do this, you first need to build and run the VM once:
nix build                       # the ./result symlink appears after building
./result/bin/run-<hostname>-vm
(Or as a one-liner: nix build && ./result/bin/run-<hostname>-vm.)
This will create a QCOW2 file. Next:
qemu-img convert -f qcow2 -O raw <hostname>.qcow2 raw.disk          # extract the filesystem to raw format
dd if=raw.disk of=/dev/zvol/<pool>/<name> bs=4M status=progress    # copy the raw image onto the zvol
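One thing to watch out for: dd will happily truncate the image if the ZVOL is smaller than the raw disk, so it is worth comparing sizes first (same paths as above):

qemu-img info raw.disk                         # virtual/actual size of the image
blockdev --getsize64 /dev/zvol/<pool>/<name>   # size of the zvol in bytes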
Then, you just need to clean up and symlink the device
rm <hostname>.qcow2
rm raw.disk
ln -s /dev/zvol/<pool>/<name> <hostname>.qcow2
Mounting a ZVOL into a VM
I have also had success mounting a ZVOL into the VM. This requires extra QEMU options to expose the device:
"-drive file=/dev/zvol/<pool>/<name>,format=raw,if=none,id=zfs0,cache=writeback"
"-device virtio-blk-pci,drive=zfs0" # add the device inside the VM
You will need to format the ZVOL first - I believe this may also work for a dataset - but once you have, you can grab its UUID from the host using lsblk -f and use that to mount it within the VM:
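On the host, that step looks roughly like this (destructive to the ZVOL's contents; paths are placeholders):

mkfs.ext4 /dev/zvol/<pool>/<name>   # format the zvol (wipes it!)
lsblk -f /dev/zvol/<pool>/<name>    # note the UUID for the fileSystems config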
# mount the zvol into the VM
fileSystems."/srv" = {
device = "/dev/disk/by-uuid/c761c692-7dbc-47a8-98ed-cd45a2d79c37";
fsType = "ext4";
};
This may be useful if you only care about performance/storage of the data in your VM, or if you want to have separate data volumes from your VM’s system volume.
Next steps
As far as I see it, there are two challenges to continue this project:
- I have only mapped out a fraction of what can be done. I am not confident enough to call this “complete” at this stage, and there is definitely a lot more to consider for production hardening.
- Writing a wiki page which restructures these findings into a way that is clear and accessible to a new audience, while containing enough detail and further reading references for advanced users. Ideally, this should flow well for any reader and reduce barriers to entry - benefiting all users.
Most of what I have learned so far has been put together by reading various QEMU docs, inspecting the inner workings of the result of the flake, and asking LLMs about the nixpkgs function. If you want to learn for yourself or contribute, this is where I would recommend beginning.
I’m going to continue plodding along with this in any case, because I find it quite interesting. At the very least, I find this process useful for documenting what I am discovering; I hope it is useful to others and might spark some discussion from other enthusiastic QEMU users.