After moving to NixOS 20.11 and from Linux 5.10 to Linux 5.15, I started to experience many problems that I think can be summarized with:
systemd[1]: NetworkManager.service: State 'stop-sigterm' timed out. Killing.
systemd[1]: NetworkManager.service: Killing process 1953 (NetworkManager) with signal SIGKILL.
systemd[1]: NetworkManager.service: Processes still around after SIGKILL. Ignoring.
I see this happening when I switch off as well. The shutdown process never completes.
All the commands I run with sudo hang, and systemd tells me that the system is degraded.
I haven’t tried reverting to the previous kernel yet. If you have suggestions or want me to try something else, let me know! Thanks
Doesn’t NixOS detect a kernel change and warn the user that they should ‘reboot’ or do a kool kids ‘kexec’ to get the new kernel? If it doesn’t, it should :-).
It could certainly be detected whether a reboot is necessary. In fact, this logic is already implemented as part of the system.autoUpgrade.allowReboot option:
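As a rough illustration of that kind of check (a sketch, not the actual NixOS module code): it assumes that /run/booted-system points at the generation that was booted and /run/current-system at the one currently activated, and compares where their kernel, initrd and kernel-modules links resolve. The function name `kernel_changed` is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: report whether the booted kernel differs from the activated one.
# Assumption: each system generation exposes "kernel", "initrd" and
# "kernel-modules" symlinks, as NixOS generations do.

# kernel_changed BOOTED ACTIVE
# Returns 0 (true) if any of the three links resolves to a different path.
kernel_changed() {
    local booted="$1" active="$2" part
    for part in kernel initrd kernel-modules; do
        if [ "$(readlink -f "$booted/$part")" != "$(readlink -f "$active/$part")" ]; then
            return 0
        fi
    done
    return 1
}

# Example use on a NixOS machine:
#   if kernel_changed /run/booted-system /run/current-system; then
#       echo "kernel changed since boot, a reboot is recommended"
#   fi
```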
kexec is not really useful in this context. It also requires terminating all userspace processes to replace the running kernel and might cause issues with hardware devices that do not reinitialize correctly.
Are there any other log messages? The ones you posted only state that NetworkManager was killed after a timeout, which is only marginally useful. The output of this would be a good start:
With -1 it does not tell me much, so I used -2 and got this:
$ journalctl -b -2 -u NetworkManager -f
-- Journal begins at Wed 2021-09-01 10:29:25 CEST. --
Dec 06 17:24:01 systemd[1]: Stopping Network Manager...
Dec 06 17:25:31 systemd[1]: NetworkManager.service: State 'stop-sigterm' timed out. Killing.
Dec 06 17:25:31 systemd[1]: NetworkManager.service: Killing process 1953 (NetworkManager) with signal SIGKILL.
Dec 06 17:27:01 systemd[1]: NetworkManager.service: Processes still around after SIGKILL. Ignoring.
Dec 06 17:28:32 systemd[1]: NetworkManager.service: State 'final-sigterm' timed out. Killing.
Dec 06 17:28:32 systemd[1]: NetworkManager.service: Killing process 1953 (NetworkManager) with signal SIGKILL.
Dec 06 17:30:02 systemd[1]: NetworkManager.service: Processes still around after final SIGKILL. Entering failed mode.
Dec 06 17:30:02 systemd[1]: NetworkManager.service: Failed with result 'timeout'.
Dec 06 17:30:02 systemd[1]: NetworkManager.service: Unit process 1953 (NetworkManager) remains running after unit stopped.
I also get a good number of these, and I am not sure what they mean:
Dec 07 10:26:53 huge kernel: INFO: task wpa_supplicant:2168 blocked for more than 122 seconds.
Dec 07 10:26:53 huge kernel: Tainted: P O 5.15.4 #1-NixOS
Dec 07 10:26:53 huge kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 07 10:26:53 huge kernel: task:wpa_supplicant state:D stack: 0 pid: 2168 ppid: 1 flags:0x00000004
Dec 07 10:26:53 huge kernel: Call Trace:
Dec 07 10:26:53 huge kernel: <TASK>
Dec 07 10:26:53 huge kernel: __schedule+0x2dd/0x1480
Dec 07 10:26:53 huge kernel: ? ieee80211_iterate_active_interfaces_atomic+0xd/0x20 [mac80211]
Dec 07 10:26:53 huge kernel: ? __nla_validate_parse+0x5f/0xc00
Dec 07 10:26:53 huge kernel: schedule+0x44/0xa0
Dec 07 10:26:53 huge kernel: schedule_preempt_disabled+0xa/0x10
Dec 07 10:26:53 huge kernel: __mutex_lock.constprop.0+0x258/0x480
Dec 07 10:26:53 huge kernel: nl80211_pre_doit+0x16/0x150 [cfg80211]
Dec 07 10:26:53 huge kernel: genl_family_rcv_msg_doit+0xd2/0x150
Dec 07 10:26:53 huge kernel: genl_rcv_msg+0xde/0x1d0
Dec 07 10:26:53 huge kernel: ? nl80211_flush_pmksa+0xe0/0xe0 [cfg80211]
Dec 07 10:26:53 huge kernel: ? genl_get_cmd+0xd0/0xd0
Dec 07 10:26:53 huge kernel: netlink_rcv_skb+0x50/0xf0
Dec 07 10:26:53 huge kernel: genl_rcv+0x24/0x40
Dec 07 10:26:53 huge kernel: netlink_unicast+0x201/0x2c0
Dec 07 10:26:53 huge kernel: netlink_sendmsg+0x22e/0x470
Dec 07 10:26:53 huge kernel: sock_sendmsg+0x5e/0x60
Dec 07 10:26:53 huge kernel: ____sys_sendmsg+0x1f7/0x230
Dec 07 10:26:53 huge kernel: ? sendmsg_copy_msghdr+0x7c/0xa0
Dec 07 10:26:53 huge kernel: ___sys_sendmsg+0x75/0xb0
Dec 07 10:26:53 huge kernel: ? __switch_to_asm+0x42/0x70
Dec 07 10:26:53 huge kernel: ? finish_task_switch.isra.0+0xa7/0x280
Dec 07 10:26:53 huge kernel: ? memcg_slab_free_hook+0xc7/0x180
Dec 07 10:26:53 huge kernel: ? __dentry_kill+0x132/0x170
Dec 07 10:26:53 huge kernel: ? __fput+0xf7/0x240
Dec 07 10:26:53 huge kernel: __sys_sendmsg+0x59/0xa0
Dec 07 10:26:53 huge kernel: do_syscall_64+0x3b/0x90
Dec 07 10:26:53 huge kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
You see wpa_supplicant here, but I saw similar traces for a lot of different applications, for example weechat. So I am not sure the problem is tied to any one application.
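To get an overview of which programs are affected, the hung-task reports can be counted per task name. The following is a minimal sketch that assumes the kernel messages have the same "INFO: task NAME:PID blocked for more than N seconds" shape as in the trace above; the helper name `hung_tasks` is invented for this example. Note that `state:D` in the trace means uninterruptible sleep, which would be consistent with SIGKILL having no effect on the process.

```shell
#!/usr/bin/env bash
# Sketch only: count kernel hung-task reports per task name.
# Reads log lines on stdin and prints "COUNT NAME", one line per task name.
hung_tasks() {
    # Keep only the task name from each "blocked for more than" report,
    # then count duplicates and strip the padding that uniq -c adds.
    sed -n 's/.*INFO: task \(.*\):[0-9][0-9]* blocked for more than [0-9][0-9]* seconds.*/\1/p' \
        | sort | uniq -c | sed 's/^ *//'
}

# Example use (kernel messages of the current boot):
#   journalctl -k -b | hung_tasks
```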