Notebook crashes when disconnecting from Wifi

I noticed that my system was very unstable with NixOS 23.11, and I wasn’t able to find out why.
It just randomly got stuck and was very hard to kill.
I reverted to 23.05 and everything was stable again.
Out of habit I updated my flake on 23.05 and had the same problems again.
The most obvious change was that the kernel got updated to 6.1.65, the same version as in NixOS 23.11.

If someone has some tips on how to debug this problem I would be very glad.
I haven’t seen anything obvious in journalctl or dmesg so far but maybe someone knows better what to look for.

Edit: Looks like kernel 6.6 doesn’t work with Nvidia at the moment.
I’m trying to see whether my system is stable with 6.5, but I would still prefer to have a working 6.1.
Edit2:
So far it looks like kernel 6.5 works much better, but I will have to test a bit longer.

https://www.reddit.com/r/NixOS/comments/17lcp1j/nixos_update_on_unstable_stopping_on_nvidia_driver/

I tried to use the latest kernel 6.6, but that fails with an Nvidia build error; logs below.

building the system configuration...
error: builder for '/nix/store/g9kb044rwk68k63v95f4skdy4br6jrqx-nvidia-x11-535.86.05-6.6.1.drv' failed with exit code 2;
       last 10 log lines:
       >  1371 |     .prime_fd_to_handle     = drm_gem_prime_fd_to_handle,
       >       |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~
       > make[4]: *** [/nix/store/mq94rrv6zl0n7kz5k83nhjr7ajmw7nhs-linux-6.6.1-dev/lib/modules/6.6.1/source/scripts/Makefile.build:243: /build/kernel/nvidia-drm/nvidia-drm-drv.o] Error 1
       > make[4]: *** Waiting for unfinished jobs....
       > make[3]: *** [/nix/store/mq94rrv6zl0n7kz5k83nhjr7ajmw7nhs-linux-6.6.1-dev/lib/modules/6.6.1/source/Makefile:1913: /build/kernel] Error 2
       > make[2]: *** [/nix/store/mq94rrv6zl0n7kz5k83nhjr7ajmw7nhs-linux-6.6.1-dev/lib/modules/6.6.1/source/Makefile:234: __sub-make] Error 2
       > make[2]: Leaving directory '/nix/store/mq94rrv6zl0n7kz5k83nhjr7ajmw7nhs-linux-6.6.1-dev/lib/modules/6.6.1/build'
       > make[1]: *** [Makefile:234: __sub-make] Error 2
       > make[1]: Leaving directory '/nix/store/mq94rrv6zl0n7kz5k83nhjr7ajmw7nhs-linux-6.6.1-dev/lib/modules/6.6.1/source'
       > make: *** [Makefile:82: modules] Error 2
       For full logs, run 'nix log /nix/store/g9kb044rwk68k63v95f4skdy4br6jrqx-nvidia-x11-535.86.05-6.6.1.drv'.
error: 1 dependencies of derivation '/nix/store/9q1q5dyqna68r5cballvym266schglig-etc.drv' failed to build
error: 1 dependencies of derivation '/nix/store/a4i50qwsqp6qd6jncp6f7zqrb80fk96p-linux-6.6.1-modules.drv' failed to build
error: 1 dependencies of derivation '/nix/store/6zif8wk1w2k03iw2z688nf0yg0ss4zfl-system-path.drv' failed to build
error: 1 dependencies of derivation '/nix/store/h3ys8rlqwjigny9fc9vpk0divr055vrw-nixos-system-gwyn-23.05.20231119.0c5678d.drv' failed to build

Which was the last “good” kernel version on the 23.05 branch? Just to get the span.

Is your underlying issue perhaps the ext4 data corruption bug tracked in NixOS/nixpkgs issue #273375 (“Tracking Linux stable ext4 data corruption bug in kernel < 6.1.66”)? It doesn’t seem to have fully landed outside nixos-small just yet, but the fix would be updating to 6.1.66: https://nixpk.gs/pr-tracker.html?pr=272872

6.1.62 was working fine.

Maybe, but I don’t really know.
The system just becomes unresponsive very quickly.
At first I thought it might be the usual desktop/graphics problems, but it also happened when I hadn’t logged in.

The system just becomes unresponsive very quickly

The question is, I’d say, if this is caused by the kernel at all. You could try and set your boot.kernelPackages to the last good version, update the rest, and see the result.
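A minimal sketch of what pinning the kernel could look like, assuming 6.1.62 (reported above as working) is the target. The `argsOverride` pattern is the one commonly used to build a specific point release; the hash is deliberately a placeholder, and since no binary cache will have this exact build, the kernel gets compiled locally.

```nix
{ pkgs, lib, ... }:
{
  # Build the 6.1 series kernel from the 6.1.62 tarball instead of whatever
  # 6.1.x the current nixpkgs revision ships.
  boot.kernelPackages = pkgs.linuxPackagesFor (pkgs.linux_6_1.override {
    argsOverride = rec {
      version = "6.1.62";
      modDirVersion = version;
      src = pkgs.fetchurl {
        url = "mirror://kernel/linux/kernel/v6.x/linux-${version}.tar.xz";
        # Placeholder, not the real hash: the first build is expected to fail
        # and print the actual hash, which then replaces this line.
        hash = lib.fakeHash;
      };
    };
  });
}
```

If the problem persists with the old kernel and an otherwise updated system, the kernel is probably not the culprit.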

I’ve switched to kernel 6.5 for the moment and it has been stable for a few hours.
However, I haven’t had the time yet to fully test it with reboots, undocking, etc.

Okay, it happened on 6.5 as well, and also on the latest 6.1.67.
I can reproduce the problem as follows:

  1. Start the notebook connected to ethernet.
  2. Enable and connect Wifi.
  3. Try to disable Wifi.
  4. After a few seconds the notebook locks up.

I can press the power button, but the notebook never really powers off; it hangs while stopping the network interfaces.

I found this in the logs: INFO: task kworker/11:0:71 blocked for more than 122 seconds.

Dec 14 20:24:55 gwyn picom[3817]: Xlib: ignoring invalid extension event 146
Dec 14 20:24:55 gwyn picom[3817]: Xlib: ignoring invalid extension event 146
Dec 14 20:24:55 gwyn kernel: wlp59s0: deauthenticating from 20:e5:2a:6a:08:46 by local choice (Reason: 3=DEAUTH_LEAVING)
Dec 14 20:24:55 gwyn wpa_supplicant[5442]: wlp59s0: CTRL-EVENT-DISCONNECTED bssid=20:e5:2a:6a:08:46 reason=3 locally_generated=1
Dec 14 20:24:55 gwyn kernel: iwlwifi 0000:3b:00.0: RF_KILL bit toggled to disable radio.
Dec 14 20:24:55 gwyn kernel: iwlwifi 0000:3b:00.0: reporting RF_KILL (radio disabled)
Dec 14 20:25:58 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:25:58 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:25:58 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:25:58 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:26:03 gwyn systemd-timesyncd[1601]: Contacted time server 10.7.89.1:123 (10.7.89.1).
Dec 14 20:26:03 gwyn systemd-timesyncd[1601]: Initial clock synchronization to Thu 2023-12-14 20:26:03.972188 CET.
Dec 14 20:26:05 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:26:05 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:26:06 gwyn picom[3817]: Xlib: ignoring invalid extension event 146
Dec 14 20:26:06 gwyn picom[3817]: Xlib: ignoring invalid extension event 146
Dec 14 20:26:08 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:26:08 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:26:08 gwyn picom[3817]: Xlib: ignoring invalid extension event 146
Dec 14 20:26:08 gwyn picom[3817]: Xlib: ignoring invalid extension event 146
Dec 14 20:26:09 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:26:09 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:26:14 gwyn picom[3817]: Xlib: ignoring invalid extension event 146
Dec 14 20:26:14 gwyn picom[3817]: Xlib: ignoring invalid extension event 146
Dec 14 20:26:16 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:26:16 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:26:16 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:26:16 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:26:46 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:26:46 gwyn picom[3817]: Xlib: ignoring invalid extension event 161
Dec 14 20:27:32 gwyn kernel: INFO: task kworker/11:0:71 blocked for more than 122 seconds.
Dec 14 20:27:32 gwyn kernel:       Tainted: P     U     O       6.1.67 #1-NixOS
Dec 14 20:27:32 gwyn kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 14 20:27:32 gwyn kernel: task:kworker/11:0    state:D stack:0     pid:71    ppid:2      flags:0x00004000
Dec 14 20:27:32 gwyn kernel: Workqueue: events_power_efficient reg_check_chans_work [cfg80211]
Dec 14 20:27:32 gwyn kernel: Call Trace:
Dec 14 20:27:32 gwyn kernel:  <TASK>
Dec 14 20:27:32 gwyn kernel:  __schedule+0x31d/0x1240
Dec 14 20:27:32 gwyn kernel:  ? update_load_avg+0x7e/0x780
Dec 14 20:27:32 gwyn kernel:  schedule+0x5a/0xd0
Dec 14 20:27:32 gwyn kernel:  schedule_preempt_disabled+0x11/0x20
Dec 14 20:27:32 gwyn kernel:  __mutex_lock.constprop.0+0x399/0x700
Dec 14 20:27:32 gwyn kernel:  ? psi_task_switch+0xd2/0x230
Dec 14 20:27:32 gwyn kernel:  reg_check_chans_work+0x2d/0x5e0 [cfg80211]
Dec 14 20:27:32 gwyn kernel:  ? __schedule+0x325/0x1240
Dec 14 20:27:32 gwyn kernel:  ? add_timer_on+0xed/0x130
Dec 14 20:27:32 gwyn kernel:  process_one_work+0x1c4/0x380
Dec 14 20:27:32 gwyn kernel:  worker_thread+0x4d/0x380
Dec 14 20:27:32 gwyn kernel:  ? rescuer_thread+0x3a0/0x3a0
Dec 14 20:27:32 gwyn kernel:  kthread+0xd7/0x100
Dec 14 20:27:32 gwyn kernel:  ? kthread_complete_and_exit+0x20/0x20
Dec 14 20:27:32 gwyn kernel:  ret_from_fork+0x1f/0x30
Dec 14 20:27:32 gwyn kernel:  </TASK>
Dec 14 20:27:32 gwyn kernel: INFO: task kworker/0:2:89 blocked for more than 122 seconds.
Dec 14 20:27:32 gwyn kernel:       Tainted: P     U     O       6.1.67 #1-NixOS
Dec 14 20:27:32 gwyn kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 14 20:27:32 gwyn kernel: task:kworker/0:2     state:D stack:0     pid:89    ppid:2      flags:0x00004000

Definitely sounds like a kernel/wireless driver bug. If you can find out what your networking hardware is, you can probably trawl the internet/LKML and figure out what the issue is, and whether (and in which kernel) it was fixed. From there it is probably easier to decide whether to switch to a different kernel.

If you want to just try 6.6, try switching to a kernel from unstable. Nvidia have likely fixed that mismatch by now, but the driver version which does so has not been backported to NixOS stable.
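A rough sketch of taking only the kernel (and with it the Nvidia driver) from unstable while the rest of the system stays on stable. `pkgsUnstable` is a hypothetical name: it is assumed to be a nixos-unstable package set, imported with `config.allowUnfree = true`, that the flake passes to this module via `specialArgs`.

```nix
{ config, pkgsUnstable, ... }:
{
  # pkgsUnstable is an assumption (see above), not something NixOS provides
  # by default. Kernel 6.6 from unstable instead of the stable channel:
  boot.kernelPackages = pkgsUnstable.linuxPackages_6_6;

  # Take the Nvidia driver from the same kernel package set so that the
  # module is built against this kernel with the newer driver version.
  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.stable;
}
```

Whether the driver version in unstable actually builds against 6.6 on a given day is something to check rather than assume.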

I found a mention of this issue by someone else, and even by Linus, on the mailing list, but it should be fixed in newer kernels.
For the moment I have disabled my Wifi card in the BIOS, so at least I have a working system.

https://lkml.org/lkml/2020/6/6/51
https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git/commit/?id=79ea1e12c0b8540100e89b32afb9f0e6503fad35
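For machines where the card cannot be switched off in the BIOS, a software-side equivalent might be to keep the driver from loading at all. This is only a sketch and untested against this particular hang; it assumes the card is driven by `iwlwifi`, as the log above suggests.

```nix
{ ... }:
{
  # Assumption: the Wifi card uses the iwlwifi driver (as seen in the log).
  # Blacklisting it should prevent the module from being loaded automatically
  # at boot, so the wireless interface never appears.
  boot.blacklistedKernelModules = [ "iwlwifi" ];
}
```

Removing the line and rebuilding brings the card back once a fixed kernel is in use.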

I had something similar. I upgraded from 23.05 straight to 24.05 because I had pointed my nixos channel at nixos-unstable and nixpkgs-unstable. The system kernel updated to 6.1.63 and then to 6.1.64. After that the system started to run jerkily and freeze briefly. I tried the latest kernels: 6.6.5, 6.6.6 and then 6.6.7. The situation improved a bit, but when shutting down, rfkill tried to stop wpa_supplicant and NetworkManager, even though these services are commented out in my configuration.nix because I use Gnome. This is a rather weak device, so instead of customizing the video driver I decided to change the kernel.
Setting boot.kernelPackages = pkgs.linuxPackages-rt_latest; solved the problem.

I don’t know what changed but it is working again on kernel 6.1.77.