Kernel panic, system freezes, no BSOD

I’ve been experiencing sporadic kernel panics on my laptop for the last couple of months. I have no idea what could be causing them or how to even start debugging this issue. It happens often enough to be annoying, but not often enough for me to be able to consistently reproduce it.

I think that it’s a kernel panic, because the CAPS LOCK key indicator starts flashing, and the system becomes completely unresponsive. I can’t switch to a VT, sysrq keys don’t do anything (I have verified that they are enabled and working under normal conditions) and even the power button doesn’t do anything unless I force a hardware reset by holding it for like 30 seconds. Unfortunately, I don’t get the Black Screen Of Death during the kernel panic (the laptop monitor and the external screen just freeze on the last frame of my desktop environment) and there is seemingly nothing suspicious in journalctl after a reboot (except that the logs just cut off at a random point, of course).

The first few times, it happened during a nixos-rebuild switch. However, this doesn’t seem to be a problem with any specific generation/configuration, because after that crash I was able to do a nixos-rebuild boot + manual reboot into that configuration with no problems. And just now it happened randomly during normal light use (watching YouTube).

How do you even debug something like this? After the last time it happened, I decided to set boot.kernel.sysctl."kernel.panic_on_oops" = 1 and boot.crashDump.enable = true in hopes that I might be able to catch the problem at an earlier stage or to display some kind of error instead of just freezing the system, but no luck so far.
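
For reference, the two settings mentioned above would look roughly like this in configuration.nix. This is only a sketch: the reservedMemory line is an assumption added for illustration (I believe "128M" matches the module default), not something from my actual config.

```nix
{ ... }:
{
  # Escalate any kernel oops to a full panic instead of letting the
  # system limp along in an undefined state.
  boot.kernel.sysctl."kernel.panic_on_oops" = 1;

  # Load a crash kernel via kexec so that a panic boots into it
  # instead of just hanging.
  boot.crashDump.enable = true;

  # Assumption: memory reserved for the crash kernel. Adjust this if
  # the crash kernel fails to start.
  boot.crashDump.reservedMemory = "128M";
}
```

As far as I can tell, enabling boot.crashDump adds extra kernel config options, so the first rebuild after turning it on may compile the kernel locally.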

I’ve seen this post about debugging kernel panics, but I wasn’t able to find any NixOS options or packages searching for kdump and I don’t know nearly enough about this topic to be able to adapt this guide from Debian to NixOS.

I think (?) that the new drm_panic kernel feature is supposed to help with such issues (it’s supposed to show kernel panic messages even without the fb console), but I have no idea which kernel version it is available in or how to enable it.
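
From what I could gather so far, drm_panic appears to have been merged for Linux 6.10 behind the DRM_PANIC config option, and it only works with GPU drivers that implement the panic hook, so it may well not help with the proprietary nvidia driver. A minimal sketch of how one might try enabling it on NixOS, assuming linuxPackages_latest is at least 6.10 in your nixpkgs revision and that the option isn't already enabled in the default kernel config (the patch name is just a hypothetical label):

```nix
{ pkgs, ... }:
{
  # Assumption: the latest kernel in nixpkgs is new enough to have DRM_PANIC.
  boot.kernelPackages = pkgs.linuxPackages_latest;

  # A "patch" with no actual patch file, used only to add a config option.
  boot.kernelPatches = [
    {
      name = "enable-drm-panic"; # hypothetical label, any name works
      patch = null;
      extraConfig = ''
        DRM_PANIC y
      '';
    }
  ];
}
```

Changing the kernel config this way means the kernel gets built locally, so expect a long first rebuild.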

Any suggestions/recommendations?

If you have an Nvidia GPU, this might be related to:

In which case you have to roll back the drivers to an earlier version until Nvidia fixes the issues.

Thanks, I am indeed using the proprietary nvidia drivers, so I might try downgrading them.

However, I’d still prefer to have a way to see the kernel panic message or do a coredump of the kernel for debugging. Changing random things about my system until the crashes stop is a suboptimal solution, given that I can’t even verify that the problem is caused by the nvidia drivers.

The solution isn’t as random when the issues that you’ve described (random freezes, caps lock flashing, …) are too close to the current problem to be a coincidence. Knowing that you use the proprietary drivers only confirms that.

As to how you can debug things further, I honestly have no idea. I’ve been running the 550.78 drivers with hardware.nvidia.powerManagement.finegrained = true; under X11 and I didn’t have any problems anymore, so I stopped messing with it. Some users reported that rolling back the drivers works, so you’re free to try that as well.
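
In case it helps, this is roughly the shape of that setup. It is a sketch under assumptions, not my exact config: pinning exactly 550.78 would need a custom driver definition with version-specific hashes, which I'm not reproducing here, so the example selects the "production" driver branch from nixpkgs as a stand-in for rolling back (which version that resolves to depends on your nixpkgs revision).

```nix
{ config, ... }:
{
  services.xserver.videoDrivers = [ "nvidia" ];

  hardware.nvidia = {
    # Assumption: use the older "production" branch instead of "latest"
    # as a simple way to roll back the driver.
    package = config.boot.kernelPackages.nvidiaPackages.production;

    # Fine-grained power management. As far as I know, this also needs
    # PRIME offload mode to be configured (not shown here).
    powerManagement.finegrained = true;
  };
}
```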

The script that would probably help you collect more information is also missing from NixOS, which can make debugging harder.

That being said, if you’d still like to help, you should probably follow this thread on the Nvidia forum:

The solution isn’t as random when the issues that you’ve described (random freezes, caps lock flashing, …) are too close to the current problem to be a coincidence.

I’ve been having this issue for a long time and just happened to decide to post it after the latest crash. If I am reading the nixpkgs git history correctly, linuxPackages.nvidiaPackages.latest got updated to the 550 driver line around March 3rd this year and I am ~90% sure that my first kernel panic predates that (I think it was in early February).

Also, a “random freeze” and a flashing caps lock key are just the symptoms of a kernel panic (AFAIK). Additionally, my experience doesn’t perfectly match what other users are describing. Supposedly, these people are getting kernel panics quite frequently/consistently and primarily under some load (installing packages). In my case, the crashes are relatively rare (it’s been happening for months now, and I’ve only now been annoyed enough to make this thread) and have happened even during relatively idle operation.

Finally, even if this particular kernel panic was caused by the nvidia drivers, the fact that the kernel panic message isn’t displayed for some reason would still be an issue that I’d like to fix. I’m not necessarily blaming NixOS for this, but I know for a fact that you can get kernel panics to display properly (you can see the BlackSOD photos provided by people in the nvidia forums and I think I vaguely remember it working on my laptop back when I was daily driving Arch).

I just shared what I know. Good luck in all cases and I hope you solve your issues.