Hi All,
I’ve been seeing some rather strange behavior with my remote builder: when building any package that requires the kvm feature, every open ssh connection to the remote builder is terminated.
Here’s what I’ve been doing so far:
I use the NixOS 22.11 minimal ISO image to create a bootable USB and boot up the system. Once booted, I run the script below, which installs NixOS to the local disk and reboots the system when done.
#!/usr/bin/env bash
# Install NixOS onto a 2013 Mac mini host
set -o errexit
set -o nounset
set -o pipefail
# Needs root privileges to run
if [[ "$EUID" -ne 0 ]]; then
  echo "Must run as root" >&2
  exit 1
fi
disk="/dev/sda"
echo "Partitioning ${disk}..."
parted -s "${disk}" -- mklabel gpt
parted -s -a optimal "${disk}" -- mkpart ESP fat32 0% 512MiB
parted -s -a optimal "${disk}" -- mkpart primary 512MiB -0
parted "${disk}" -- set 1 boot on
echo "Waiting until the partitions are available in /dev..."
systemctl restart systemd-udev-trigger.service
until [[ -e "${disk}1" && -e "${disk}2" ]]; do sleep 1; done
echo "Creating filesystems on $disk..."
mkfs.fat -F 32 -n boot "${disk}1"
mkfs.ext4 -L nixos "${disk}2"
echo "Waiting until the filesystems are available in /dev..."
systemctl restart systemd-udev-trigger.service
until [[ -e /dev/disk/by-label/boot && -e /dev/disk/by-label/nixos ]]; do sleep 1; done
echo "Mounting filesystems..."
mount /dev/disk/by-label/nixos /mnt
mkdir -p /mnt/boot
mount /dev/disk/by-label/boot /mnt/boot
echo "Generating NixOS configuration (/mnt/etc/nixos/*.nix) ..."
nixos-generate-config --root /mnt
mv /mnt/etc/nixos/configuration.nix /mnt/etc/nixos/default-configuration.nix
echo "Writing custom NixOS configuration to /mnt/etc/nixos/ ..."
cat <<EOF >/mnt/etc/nixos/configuration.nix
{ pkgs, ...}:
{
imports = [
./hardware-configuration.nix
./default-configuration.nix
];
environment.systemPackages = with pkgs; [
coreutils
htop
less
vim
which
];
i18n.defaultLocale = "en_US.UTF-8";
# Needed for Broadcom drivers
nixpkgs.config.allowUnfree = true;
nix = {
gc = {
automatic = true;
dates = "weekly";
options = "--max-freed $((64 * 1024 ** 3))";
};
optimise = {
automatic = true;
dates = [ "weekly" ];
};
settings.trusted-users = [ "@nixbld" "@wheel" ];
};
security.sudo.wheelNeedsPassword = false;
services.openssh = {
enable = true;
permitRootLogin = "no";
passwordAuthentication = false;
hostKeys = [{
path = "/data/etc/ssh/ssh_host_ed25519_key";
type = "ed25519";
}];
};
time.timeZone = "UTC";
users = {
mutableUsers = false;
users = {
anand = {
isNormalUser = true;
createHome = true;
extraGroups = [ "nixbld" "sudo" "wheel" ];
group = "users";
uid = 1000;
home = "/home/anand";
useDefaultShell = true;
openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMozgKcmC5KdPFteZey9Ov45/inEfg/PCdSaZKd582tb" ];
};
root.hashedPassword = "\$6\$Y07SFMN5XPG6fJgw\$FGGBCduL4Bdg55vGoHAnQgDl5MqVqoIzgWlQoMXXrpm.2nmhTgivPJMNEpPclh064or/eM8.6GruCnFttZvPW0";
};
};
}
EOF
echo "Installing NixOS to /mnt ..."
nixos-install -I "nixos-config=/mnt/etc/nixos/configuration.nix" --no-root-passwd
echo "Installation succeeded! Rebooting ..."
reboot
Once the system reboots, I am able to log in over ssh using the user-account created by the script without any issue.
However, when I start running a build that uses the Macmini as the remote builder, something really odd happens. The ssh connection terminates, seemingly at random. What’s even stranger is that it’s not just the remote-build ssh connection that terminates… EVERY open ssh connection to the remote builder is terminated, including those that have nothing to do with the remote build! E.g. I have two terminal windows open: in one I’m connected over ssh to the remote builder running some program (htop, or journalctl -f), while in the other I run a nix-build command that uses the remote builder to build some package.
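For context, the client picks a builder based on the features listed in its machines file (typically /etc/nix/machines). Below is a minimal sketch for listing what each builder advertises, assuming the standard whitespace-separated machines format in which the sixth field holds the supported features. The `builder_features` helper and the sample entry are hypothetical and only for illustration; they are not copied from my actual setup.

```shell
# Hypothetical helper: print "<builder URI> <supported features>" for each
# entry in a Nix machines file (field 1 = URI, field 6 = supported features).
# Comment lines and blank lines are skipped; "-" is printed when the entry
# has no sixth field.
builder_features() {
  awk '!/^[ \t]*(#|$)/ { print $1, ($6 == "" ? "-" : $6) }' "$@"
}

# Example with a made-up entry; the real file is usually /etc/nix/machines.
printf '%s\n' 'ssh://macmini1 x86_64-linux - 4 1 kvm,big-parallel' | builder_features
```

If `kvm` does not show up for a builder, derivations that require it should not be dispatched to that builder at all.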
The following is a NixOS system closure I’m trying to build using the remote builder.
$ cat default.nix
let
pkgs = import ./nix { system = "x86_64-linux"; };
in
{
macmini1 = pkgs.nixos [({ lib, pkgs, ... }: {
imports = [
./nix/modules/remote-builder.nix
./nix/modules/ssh.nix
./nix/modules/users.nix
];
boot.loader.grub.device = "/dev/disk/by-label/boot";
fileSystems = {
"/" = { fsType = "ext4"; device = "/dev/disk/by-label/nixos";};
"/boot" = { fsType = "vfat"; device = "/dev/disk/by-label/boot"; };
};
networking.hostName = "macmini1";
swapDevices = [ ];
})];
}
$ nix-build -A macmini1
these 3 derivations will be built:
/nix/store/86f9riy4fhbx91xkkyrqka547yy7m0r8-nixos-boot-disk.drv
/nix/store/yhp44j5bgpcyv5qh5yvdbxjsknrvz9cm-run-nixos-vm.drv
/nix/store/77rq8smg20nr1j03scz1gm132l68vnvq-nixos-vm.drv
building '/nix/store/86f9riy4fhbx91xkkyrqka547yy7m0r8-nixos-boot-disk.drv' on 'ssh://macmini1'...
copying 0 paths...
Connection to 192.168.1.120 closed by remote host.
error: unexpected end-of-file
error: builder for '/nix/store/86f9riy4fhbx91xkkyrqka547yy7m0r8-nixos-boot-disk.drv' failed with exit code 1;
last 1 log lines:
> Connection to 192.168.1.120 closed by remote host.
For full logs, run 'nix log /nix/store/86f9riy4fhbx91xkkyrqka547yy7m0r8-nixos-boot-disk.drv'.
error: 1 dependencies of derivation '/nix/store/yhp44j5bgpcyv5qh5yvdbxjsknrvz9cm-run-nixos-vm.drv' failed to build
error: 1 dependencies of derivation '/nix/store/77rq8smg20nr1j03scz1gm132l68vnvq-nixos-vm.drv' failed to build
Of course, this makes it rather hard to get logs from a remote builder, and I am forced to log into the remote builder at the console to check the logs. This is what I see…
Jan 03 01:00:20 macmini1 sshd[3122]: Accepted publickey for anand from 192.168.1.201 port 55039 ssh2: ED25519 SHA256:j7ZZi4D7e8N+5oxe0VlEvGwksNuI1Ihq6yuwlmgljPA
Jan 03 01:00:20 macmini1 sshd[3122]: pam_unix(sshd:session): session opened for user anand(uid=1000) by (uid=0)
Jan 03 01:00:20 macmini1 systemd[1]: Starting User Runtime Directory /run/user/1000...
Jan 03 01:00:20 macmini1 systemd-logind[874]: New session 19 of user anand.
Jan 03 01:00:20 macmini1 systemd[1]: Finished User Runtime Directory /run/user/1000.
Jan 03 01:00:20 macmini1 systemd[1]: Starting User Manager for UID 1000...
Jan 03 01:00:20 macmini1 systemd[3125]: pam_unix(systemd-user:session): session opened for user anand(uid=1000) by (uid=0)
Jan 03 01:00:20 macmini1 systemd[3125]: Queued start job for default target Main User Target.
Jan 03 01:00:20 macmini1 systemd[3125]: Created slice User Application Slice.
Jan 03 01:00:20 macmini1 systemd[3125]: Reached target Paths.
Jan 03 01:00:20 macmini1 systemd[3125]: Reached target Timers.
Jan 03 01:00:20 macmini1 systemd[3125]: Starting D-Bus User Message Bus Socket...
Jan 03 01:00:20 macmini1 systemd[3125]: Listening on D-Bus User Message Bus Socket.
Jan 03 01:00:20 macmini1 systemd[3125]: Reached target Sockets.
Jan 03 01:00:20 macmini1 systemd[3125]: Reached target Basic System.
Jan 03 01:00:20 macmini1 systemd[1]: Started User Manager for UID 1000.
Jan 03 01:00:20 macmini1 systemd[3125]: Starting Run user-specific NixOS activation...
Jan 03 01:00:20 macmini1 systemd[1]: Started Session 19 of User anand.
Jan 03 01:00:20 macmini1 systemd[3125]: Finished Run user-specific NixOS activation.
Jan 03 01:00:20 macmini1 systemd[3125]: Reached target Main User Target.
Jan 03 01:00:20 macmini1 systemd[3125]: Startup finished in 95ms.
Jan 03 01:08:41 macmini1 sshd[3198]: Accepted publickey for anand from 192.168.1.201 port 55053 ssh2: ED25519 SHA256:j7ZZi4D7e8N+5oxe0VlEvGwksNuI1Ihq6yuwlmgljPA
Jan 03 01:08:41 macmini1 sshd[3198]: pam_unix(sshd:session): session opened for user anand(uid=1000) by (uid=0)
Jan 03 01:08:41 macmini1 systemd-logind[874]: New session 21 of user anand.
Jan 03 01:08:41 macmini1 systemd[1]: Started Session 21 of User anand.
Jan 03 01:08:41 macmini1 nix-daemon[2752]: accepted connection from pid 3201, user anand (trusted)
Jan 03 01:08:42 macmini1 sshd[3198]: pam_unix(sshd:session): session closed for user anand
Jan 03 01:08:42 macmini1 sshd[3122]: pam_unix(sshd:session): session closed for user anand
Jan 03 01:08:42 macmini1 systemd[1]: session-21.scope: Deactivated successfully.
Jan 03 01:08:42 macmini1 nix-daemon[3203]: unexpected Nix daemon error: error: writing to file: Broken pipe
Jan 03 01:08:42 macmini1 systemd[1]: user@1000.service: Main process exited, code=killed, status=9/KILL
Jan 03 01:08:42 macmini1 systemd[1]: user@1000.service: Failed with result 'signal'.
Jan 03 01:08:42 macmini1 systemd[1]: session-19.scope: Deactivated successfully.
Jan 03 01:08:42 macmini1 systemd-logind[874]: Session 21 logged out. Waiting for processes to exit.
Jan 03 01:08:42 macmini1 systemd[1]: Stopping User Runtime Directory /run/user/1000...
Jan 03 01:08:42 macmini1 systemd-logind[874]: Session 19 logged out. Waiting for processes to exit.
Jan 03 01:08:42 macmini1 systemd-logind[874]: Removed session 21.
Jan 03 01:08:42 macmini1 systemd-logind[874]: Removed session 19.
Jan 03 01:08:42 macmini1 systemd[1]: run-user-1000.mount: Deactivated successfully.
Jan 03 01:08:42 macmini1 systemd[1]: user-runtime-dir@1000.service: Deactivated successfully.
Jan 03 01:08:42 macmini1 systemd[1]: Stopped User Runtime Directory /run/user/1000.
I’ve tried digging into it some more with debug logs enabled for ssh, and haven’t been able to find a smoking gun yet. I have experienced this behavior on NixOS 21.11, 22.05 and now 22.11 on multiple machines. I’m hoping I’m not the only one seeing this.
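To cut the journal down to just the teardown events, something like the following could be used. It is only a sketch: the `session_teardown_events` helper name is made up, and the patterns are taken from the messages in the log excerpt above, so they may need adjusting on other systemd versions.

```shell
# Hypothetical helper: keep only the journal lines about sessions ending or
# processes being killed. Reads the files given as arguments, or stdin.
session_teardown_events() {
  grep -E 'session closed for user|code=killed|Removed session' "$@"
}

# Example usage on the builder (journalctl -b shows the current boot):
#   journalctl -b --no-pager | session_teardown_events
```

Applied to the excerpt above, this keeps the two "session closed" lines, the `user@1000.service` kill, and the two "Removed session" lines, all of which carry the same timestamp.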
It seems like this only affects builds that need the kvm feature, so maybe that is a hint.
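Since the kvm feature depends on the builder having a usable KVM device, a quick sanity check on the builder might look like the sketch below. It assumes the device node is /dev/kvm and that "usable" means a character device the current user can read and write; the `kvm_status` helper is a made-up name.

```shell
# Hypothetical helper: report whether a KVM device node is usable by the
# current user. Defaults to /dev/kvm; the optional argument allows checking
# a different path.
kvm_status() {
  dev="${1:-/dev/kvm}"
  if [ -c "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]; then
    echo "kvm: usable ($dev)"
  else
    echo "kvm: not usable ($dev)"
  fi
}

kvm_status
```

Note that this only checks access for whoever runs it, so the result for an interactive login is not necessarily the same as for the Nix build users.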
Happy to post any additional details as required. Any/all help is appreciated.
Thanks