`runNixOSTest` is faster when interactive?

I am writing a module for a Ruby on Rails app which requires various supporting services (postgres, redis, postfix…).

So I wrote a little integration test which seems to work ok, except that it’s about 5-10x slower when running “unattended” vs running it with .driverInteractive.

The build seems to be roughly equally fast, it’s the VM execution itself which differs.

It does not seem to be a networking issue, because when comparing the timestamps on the kernel logs during boot, everything seems to be slower, not just network related lines. For instance, the database migrations run in 2.5s vs 37s. Even before systemd kicks in, I can see a massive difference.

In interactive mode, the VM starts my service in ~40s, in non-interactive, it takes 4-6 minutes to reach the same stage:

testserver # [   41.566722] systemd[1]: Startup finished in 1.653s (kernel) + 39.910s (userspace) = 41.563s.

vs

testserver # [  355.145035] systemd[1]: Startup finished in 15.221s (kernel) + 5min 39.905s (userspace) = 5min 55.126s.

I don’t think my actual config is relevant since the VM booting is exhibiting the problem, but some pointers:

  • I am using flakes
  • lix 2.93.3
  • I use nixpkgs-unstable
  • I run the following commands (on the same machine, I’ve repeated execution several times alternating between the two modes and consistently found the same results):
nix -L build  --extra-experimental-features flake-self-attrs .#serverTests.driverInteractive
./result/bin/nixos-test-driver --interactive

and on the other side

nix -L build  --extra-experimental-features flake-self-attrs .#serverTests

and the output is defined as:

serverTests = pkgs.testers.runNixOSTest ./nix/alaveteli-server-test.nix;

where the test file defines a single node with my service. There is no “low-level” config that I’m aware of in my code.

In the end, my non interactive tests fail because of timeouts. In interactive mode, start_all() and test_script() actually succeed.

Is this difference in VM speed expected? How would I go about debugging this problem?

I assume you’re not using a remote builder, otherwise it’s easy to explain the difference in runtime.

Theory: when running interactively the VM disk image is stored (it’s in your cwd) on a faster disk than where the sandboxed Nix build is (should be /tmp).

The difference is that the interactive runs outside of the Nix sandbox, the non interactive runs inside the Nix sandbox.

It is probably slowed down by the various sandbox things.

You can probably get some help diagnosing this delay by using systemd-analyze time or systemd-analyze critical-chain, but I recommend trying running the VM tests on some beefy machine and ensuring that the build directory sits on RAM or NVMe (--build-dir OTOH).

Also double check that you are not running your VM tests under nested virtualization or QEMU TCG emulation, and that you are actually using KVM acceleration. Either of these can also explain such large differences.
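A quick way to check KVM access on the host could look like the sketch below. The function name `kvm_status` is made up for illustration, and it only tests access for whichever user runs it, which may not be the user that runs sandboxed builds.

```shell
# Report whether a device node (default: /dev/kvm) exists and is
# readable and writable by the user running this function.
# "kvm_status" is a hypothetical helper name, not part of any tool.
kvm_status() {
    dev="${1:-/dev/kvm}"
    if [ ! -e "$dev" ]; then
        echo "missing"   # no such node: KVM module not loaded or unsupported
    elif [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "ok"        # QEMU started by this user can open the device
    else
        echo "denied"    # node exists but this user lacks read/write access
    fi
}
```

Calling `kvm_status` with no argument checks /dev/kvm. To check on behalf of a build user, the same test would have to run as that user, for example `sudo -u nixbld1 test -w /dev/kvm` (assuming a build user named nixbld1 exists on your install).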

It’s not anymore.

Sorry I took a while to get back to this. I tried to investigate and learn/understand (I did! thanks for the pointers).

So I forgot one key element in my problem description:
I am running Debian.

And on Debian:

ls -l /dev/kvm
crw-rw----+ root kvm 0 B Sun Sep 14 01:34:56 2025 /dev/kvm

In other words, nixbld cannot access it from inside the sandbox, which is reflected in the warning I initially missed (this is emitted when the VM starts):

qemu-system-x86_64: Could not access KVM kernel module: Permission denied
qemu-system-x86_64: failed to initialize kvm: Permission denied
qemu-system-x86_64: falling back to tcg

So a sudo chmod 666 /dev/kvm fixes that, and things run 10-12x faster. It doesn’t feel like a very clean fix, though.
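A possibly less permissive alternative is to add the build users to the kvm group instead of opening the device to everyone. The sketch below only prints the commands for review rather than running them. It assumes the build users are listed as members of a group called nixbld (as on a typical multi-user install) and that the daemon passes supplementary groups on to builders. Neither assumption is verified here, and `kvm_group_cmds` is a made-up name.

```shell
# Read one group line in getent format on stdin, e.g.
#   nixbld:x:30000:nixbld1,nixbld2
# and print (not run) one usermod command per listed member.
# "kvm_group_cmds" is a hypothetical helper name.
kvm_group_cmds() {
    cut -d: -f4 | tr ',' '\n' | while read -r user; do
        if [ -n "$user" ]; then
            echo "usermod -aG kvm $user"
        fi
    done
}
```

Usage would be `getent group nixbld | kvm_group_cmds`, then running the printed commands as root after checking them. The daemon probably needs a restart before new group memberships take effect.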

In the past, warnings were emitted about this and then removed, but I’m not entirely sure why.

I feel like some warning could be emitted by [ln]ix to warn about this (or should this be done in the nixpkgs qemu package as above?), say if /dev/kvm is not world read/writable. While not critical, it would be one of those little life improvements that saves people like me a few hours of head banging :face_with_head_bandage:
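Until such a warning exists, the fallback can at least be spotted after the fact by searching a saved build log for the QEMU message quoted above. This is a minimal sketch: the function name `uses_tcg_fallback` and the hint text are made up, and it only matches the exact "falling back to tcg" wording from the log shown here.

```shell
# Read a build log on stdin. Succeed and print a hint if QEMU reported
# falling back to TCG emulation; fail silently otherwise.
# "uses_tcg_fallback" is a hypothetical helper name.
uses_tcg_fallback() {
    if grep -q 'falling back to tcg'; then
        echo "KVM unavailable: QEMU fell back to TCG emulation"
        return 0
    fi
    return 1
}
```

For example, save the output of the non-interactive build to a file (say build.log) and run `uses_tcg_fallback < build.log`.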

While digging, I found a little bug in the nixos test runner, hopefully my fix makes sense, although it wasn’t the real perf problem :slight_smile: : NixOS/nixpkgs pull request #442793 by laurentS, “nixosTestRunner: cache store path length to avoid useless calls”.
