Remote Building Experience massively degraded since ~2-3 months

My remote building experience has massively degraded since around October-November 2021, so much so that the number of pull requests I review has shrunk considerably.

And I always wonder if I’m the only one experiencing these issues, because I don’t see many other people complaining about this situation. Consequently, I don’t see much attention given to the following issues.

I’ve tried multiple combinations of nix versions and build machines for a while now and all of them yield frustrating results.

Any non-trivial build basically needs a helper function like this, so that builds are retried until they succeed.

succeed () {
	# Re-run the given command until it exits successfully.
	while true; do
		"$@" && break
	done
}
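
The helper above loops forever if a build can never succeed. A minimal sketch of a bounded variant follows; the name `retry`, the attempt limit, and the `RETRY_DELAY` variable are assumptions made for illustration, not part of the original setup.

```shell
# Hypothetical bounded retry helper (POSIX sh).
# Usage: retry MAX_ATTEMPTS command [args...]
# RETRY_DELAY (seconds between attempts, default 5) is an assumed knob.
retry() {
	retry_max=$1
	shift
	retry_n=1
	while :; do
		# Run the command; stop as soon as it succeeds.
		"$@" && return 0
		echo "attempt $retry_n/$retry_max failed: $*" >&2
		# Give up once the attempt limit is reached.
		[ "$retry_n" -ge "$retry_max" ] && return 1
		retry_n=$((retry_n + 1))
		sleep "${RETRY_DELAY:-5}"
	done
}
```

Something like `retry 5 nix-build -A hello` would then give up after five failed attempts instead of spinning indefinitely. It does not help with builds that hang rather than fail, since the command never returns in that case.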

The situation has become unworkable for me. Last night I queued builds for my server closures, only to find that they had been stuck for over 8 hours doing nothing, without finishing a single server closure. :cry:

Building things locally works just fine. On all of these machines.

I have so much build capacity that I can’t put to good use; it’s very frustrating. This is very much a high-level rant, because I’m trying to gauge how many others are affected by these kinds of problems.

My remote builder config looks like this:

  nix.buildMachines = [ {
    hostName = "remoteserver";
    sshUser = "ssh://hexa";
    sshKey = "/home/hexa/.ssh/id_remotebuild";
    systems = [
    maxJobs = 32;
    speedFactor = 4;
    supportedFeatures = [ "big-parallel" "kvm" "nixos-test" "benchmark" ];
    mandatoryFeatures = [ ];
  } {
    hostName = "homeserver";
    sshUser = "ssh://root";
    systems = [
    maxJobs = 4;
    speedFactor = 4;
    supportedFeatures = [ "kvm" "nixos-test" ];
    mandatoryFeatures = [ ];
  } {
    hostName = "aarch64-builder";
    sshUser = "ssh://root";
    sshKey = "/home/hexa/.ssh/id_ed25519";
    system = "aarch64-linux";
    maxJobs = 1;
    speedFactor = 3;
    supportedFeatures = [ "big-parallel" ];
    mandatoryFeatures = [ ];
  } ];

  nix.distributedBuilds = true;
  nix.extraOptions = ''
    builders-use-substitutes = true
  '';

My local nix.conf looks like this; the remote builders don’t have any special config.

# WARNING: this file is generated from the nix.* options in
# your NixOS configuration, typically
# /etc/nixos/configuration.nix.  Do not edit it!
build-users-group = nixbld
max-jobs = 4
cores = 0
sandbox = true
extra-sandbox-paths = 
substituters =
trusted-substituters = 
trusted-public-keys =
auto-optimise-store = true
require-sigs = true
trusted-users = root hexa
allowed-users = *

system-features = nixos-test benchmark big-parallel kvm
sandbox-fallback = false

keep-outputs = true
keep-derivations = true

builders-use-substitutes = true

One issue I discovered yesterday is that I had stale ControlMaster sockets in /root/.ssh/. In the verbose log, nix showed "connecting to ssh://builder" and nothing else, and it never timed out :man_facepalming:
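
For anyone wanting to check for the same problem, here is a rough sketch that lists control sockets whose master no longer answers. It assumes the sockets sit directly in the given directory (that depends on your ControlPath setting), that GNU `timeout` is available, and the function name `stale_sockets` is made up for illustration.

```shell
# Sketch: print ControlMaster sockets in a directory that fail "ssh -O check".
# Assumptions: sockets live directly in the directory (depends on ControlPath),
# and GNU coreutils "timeout" is installed.
stale_sockets() {
	dir=${1:-"$HOME/.ssh"}
	find "$dir" -maxdepth 1 -type s 2>/dev/null | while IFS= read -r sock; do
		# "-O check" asks the master behind the socket whether it is alive.
		# ssh requires a host argument; "placeholder" is arbitrary because
		# the socket path is given explicitly.
		if ! timeout 5 ssh -o ControlPath="$sock" -O check placeholder \
			</dev/null >/dev/null 2>&1; then
			printf '%s\n' "$sock"
		fi
	done
}
```

Running `stale_sockets /root/.ssh` only prints candidates; review the list before removing anything by hand.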

I’m on nixUnstable right now on all machines and remote builds are working for me again.


Today I saw many stuck processes (like ssh connections to remote builders) under systemctl status nix-daemon.service that even persisted through restarts of that unit. That meant I was running out of build slots :confused:

Waiting for locks or build slots…

There are apparently many little paper cuts that make nix usage unnecessarily fragile these days.


I’ve been getting this issue, and I don’t even see the "waiting for locks" message without --verbose, so I was just sat wondering why the cache was frozen.


Another dumb issue I ran into was setting services.navidrome.settings.MediaPath = /tank/music. Because that is an unquoted path literal rather than a quoted string, nix copied my music database into the nix store, with no output whatsoever to hint at what it was doing.

Finally strace -e open,openat on the busy nix-daemon process did the trick.

I consider that part of this issue, because it is one of the reasons nix may remain silent.