Hydra fails to start local builds

Some time ago my Hydra instance started failing every local build. I don’t see any changes in my Hydra configuration preceding the date of the first failure. I experimented a bit with the configuration, but still no luck.

My Hydra runs on an x86_64-linux machine and has one aarch64-darwin and one aarch64-linux remote builder. Both successfully run queued jobs, but every local x86_64-linux build fails with this error:

error: failed to start SSH connection to 'ssh://localhost'

journalctl:

Jul 19 19:53:17 hydra hydra-queue-runner[2569086]: performing step ‘/nix/store/zbcxg8xhq2l10vb7d1ydzi85j65bwzq1-system-path.drv’ 1 times on ‘ssh://localhost’ (needed by build 5622 and 0 others)
Jul 19 19:53:17 hydra hydra-queue-runner[2569086]: performing step ‘/nix/store/s29xqs7gzdbi1sl6zjwfacvqaf8k0czh-linux-6.6.40-modules-shrunk.drv’ 1 times on ‘ssh://localhost’ (needed by build 5622 and 0 others)
Jul 19 19:53:17 hydra hydra-queue-runner[2569086]: SSH stdout first line:
Jul 19 19:53:17 hydra hydra-queue-runner[2569086]: will disable machine ‘ssh://localhost’ for 1643s
Jul 19 19:53:17 hydra hydra-queue-runner[2569086]: SSH stdout first line:
Jul 19 19:53:17 hydra hydra-queue-runner[2569086]: possibly transient failure building ‘/nix/store/s29xqs7gzdbi1sl6zjwfacvqaf8k0czh-linux-6.6.40-modules-shrunk.drv’ on ‘ssh://localhost’: error: failed to start SSH connection to 'ssh://localhost'
Jul 19 19:53:17 hydra hydra-queue-runner[2569086]: will retry ‘/nix/store/s29xqs7gzdbi1sl6zjwfacvqaf8k0czh-linux-6.6.40-modules-shrunk.drv’ after 64s
Jul 19 19:53:17 hydra hydra-queue-runner[2569086]: possibly transient failure building ‘/nix/store/zbcxg8xhq2l10vb7d1ydzi85j65bwzq1-system-path.drv’ on ‘ssh://localhost’: error: failed to start SSH connection to 'ssh://localhost'

I doubt it should be trying to connect to the local SSH service at all, since Hydra itself runs on the same host. Am I right?

I’ve set up the local builder exactly as recommended in the wiki:

    nix = {
        distributedBuilds = true;

        buildMachines = [
          {
            hostName = "localhost";
            system = "x86_64-linux";
            supportedFeatures = [ "kvm" "nixos-test" "big-parallel" "benchmark" ];
            maxJobs = 18;
          }

          # remote builders go here
        ];
    };

And it worked well until something happened. Is my configuration wrong? Did Hydra change the way it connects to the local builder without this being mentioned in the wiki? I even checked fail2ban to make sure it hadn’t suddenly banned the local client; that is not the case.

I tried running the command from “reproduce locally” as the hydra-queue-runner user, and it builds the drv successfully. So it has something to do with how Hydra handles local builds.

I don’t know if you already fixed this, but since I ran into this post while fixing this myself…

Something seems to have changed so that build machines now default to SSH. To fix this, add protocol = null to the localhost entry in buildMachines. That way, Hydra won’t try to connect over SSH any more.

So in your example:

    nix = {
        distributedBuilds = true;

        buildMachines = [
          {
            hostName = "localhost";
            protocol = null;
            system = "x86_64-linux";
            supportedFeatures = [ "kvm" "nixos-test" "big-parallel" "benchmark" ];
            maxJobs = 18;
          }

          # remote builders go here
        ];
    };
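
To illustrate why this helps, here is a rough sketch of how I understand the machine URI to be assembled from protocol, sshUser and hostName. The machineUri helper is hypothetical and only a simplified model written for this post, not the actual NixOS module code, and the "ssh" default is my assumption based on the behaviour seen in the logs above.

```nix
let
  # Hypothetical helper (not part of nixpkgs): models how a build machine
  # entry could be turned into the URI that the queue runner sees.
  # Assumption: protocol defaults to "ssh" when it is not set.
  machineUri = { hostName, protocol ? "ssh", sshUser ? null }:
    (if protocol != null then "${protocol}://" else "")
    + (if sshUser != null then "${sshUser}@" else "")
    + hostName;
in
{
  # No protocol given: the default applies and the entry becomes an SSH URI.
  withDefault = machineUri { hostName = "localhost"; };

  # protocol = null: no scheme is prepended, leaving plain "localhost".
  withNull = machineUri { hostName = "localhost"; protocol = null; };
}
```

Under this model, withDefault evaluates to "ssh://localhost", which matches the machine name in the journalctl output, while withNull evaluates to plain "localhost", which Hydra appears to treat as the local machine rather than an SSH target.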