This is a duplicate of the issue I posted on the Colmena GitHub, but I thought someone here might have an idea why this is happening.
I have a NixOS server on Linode with the entire installation on one partition, using GRUB as the boot loader. The Nix version is 2.24.11. This is the hardware configuration:
# Do not modify this file! It was generated by ‘nixos-generate-config’
# and may be overwritten by future invocations. Please make changes
# to /etc/nixos/configuration.nix instead.
{ config, lib, pkgs, modulesPath, ... }:

{
  imports =
    [ (modulesPath + "/profiles/qemu-guest.nix")
    ];

  boot.initrd.availableKernelModules = [ "virtio_pci" "virtio_scsi" "ahci" "sd_mod" ];
  boot.initrd.kernelModules = [ ];
  boot.kernelModules = [ ];
  boot.extraModulePackages = [ ];

  boot = {
    kernelParams = [ "console=ttyS0,19200n8" ];
    loader = {
      grub = {
        forceInstall = true;
        extraConfig = ''
          serial --speed=19200 --unit=0 --word=8 --parity=0 --stop=1;
          terminal_input serial;
          terminal_output serial
        '';
        device = "nodev";
      };
      timeout = 10;
    };
  };

  fileSystems."/" =
    { device = "/dev/sda";
      fsType = "ext4";
    };

  swapDevices =
    [ { device = "/dev/sdb"; }
    ];

  # Enables DHCP on each ethernet and wireless interface. In case of scripted networking
  # (the default) this is the recommended approach. When using systemd-networkd it's
  # still possible to use this option, but it's recommended to use it in conjunction
  # with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
  networking.useDHCP = lib.mkDefault true;
  # networking.interfaces.enp0s5.useDHCP = lib.mkDefault true;

  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
}
I started deploying with Colmena on NixOS 24.05, and this week I updated the channel to 24.11. Since then, after deploying with colmena apply switch, the deployment hangs either at "activating system profile" or while starting the systemd services I configured. After a long wait (2 hours) I abort the deployment with Ctrl-C. When I then reboot the server, it no longer boots, and the boot loader seems to be corrupted. Over the last few generations the kernel has been updated from 6.6.63 to 6.6.72, but kernel updates did not cause any problems before…
During boot, I am shown this message:
kbd_mode: KDSKBMODE: Inappropriate ioctl for device
starting device mapper and LVM...
File descriptor 8 (/dev/console) leaked on lvm invocation. Parent PID 1: /nix/sh
File descriptor 9 (/dev/console) leaked on lvm invocation. Parent PID 1: /nix/sh
checking /dev/sda...
fsck (busybox 1.36.1)
[fsck.ext4 (1) -- /mnt-root/] fsck.ext4 -a /dev/sda
fsck.ext4: Bad magic number in super-block while trying to open /dev/sda
/dev/sda:
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
or
e2fsck -b 32768 <device>
/dev/sda contains a swap file system labelled 'linode-swap'
fsck on /dev/sda failed.
An error occurred in stage 1 of the boot process, which must mount the
root filesystem on `/mnt-root' and then start stage 2. Press one
of the following keys:
r) to reboot immediately
*) to ignore the error and continue
After reinstalling the bootloader with the following commands:
for i in dev proc sys; do mount --rbind /$i /mnt/$i; done
chroot /mnt /nix/var/nix/profiles/system/bin/switch-to-configuration boot --install-bootloader
and then rebooting, everything seems to work fine.
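The recovery steps above can be collected into a small script. This is a minimal sketch, not the exact session I ran: it assumes a rescue environment in which the NixOS root filesystem still has to be mounted at /mnt first, and ROOT_DEV is a hypothetical parameter that must be checked (for example with blkid) before running, because the disk names may not be the ones the configuration expects. The helper names run and reinstall_bootloader and the DRY_RUN switch are also hypothetical; with DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Hypothetical helper for reinstalling the boot loader from a rescue system.
# Assumption: the NixOS root filesystem is NOT yet mounted at /mnt.
set -eu

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: block device holding the NixOS root filesystem (verify it first!).
reinstall_bootloader() {
  root_dev="$1"
  # Mount the installed system's root filesystem.
  run mount "$root_dev" /mnt
  # Bind-mount the pseudo filesystems the activation script needs.
  for i in dev proc sys; do
    run mount --rbind "/$i" "/mnt/$i"
  done
  # Re-activate the current system profile and reinstall GRUB.
  run chroot /mnt /nix/var/nix/profiles/system/bin/switch-to-configuration boot --install-bootloader
}
```

For example, after sourcing the script, DRY_RUN=1 reinstall_bootloader /dev/sdb prints the five commands without executing them, which makes it easy to double-check the device before doing it for real.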
Since this might be Colmena-related, I will try deploying with deploy-rs to see if that makes a difference.
Is there a way that I can set the server up again so that deployments function like they used to?