Kernel panic, system freezes, no BSOD

The solution isn’t as random when the issues that you’ve described (random freezes, caps lock flashing, …) are too close to the current problem to be a coincidence.

I’ve been having this issue for a long time and just happened to decide to post it after the latest crash. If I am reading the nixpkgs git history correctly, linuxPackages.nvidiaPackages.latest got updated to the 550 driver line around March 3rd this year and I am ~90% sure that my first kernel panic predates that (I think it was in early February).

Also, a “random freeze” and a flashing caps lock key are just the symptoms of a kernel panic (AFAIK). Additionally, my experience doesn’t perfectly match what other users are describing. Supposedly, these people are getting kernel panics quite frequently/consistently and primarily under some load (installing packages). In my case, the crashes are relatively rare (it’s been happening for months now, and I’ve only now been annoyed enough to make this thread) and have happened even during relatively idle operation.

Finally, even if this particular kernel panic was caused by the nvidia drivers, the fact that the kernel panic message isn’t displayed for some reason would still be an issue that I’d like to fix. I’m not necessarily blaming NixOS for this, but I know for a fact that you can get kernel panics to display properly (you can see the BlackSOD photos provided by people in the nvidia forums and I think I vaguely remember it working on my laptop back when I was daily driving Arch).

I just shared what I know. Good luck in all cases and I hope you solve your issues.

any updates on this???

The problem stopped happening for me after I pinned the nvidia drivers to version 535.154.05, so it was indeed the same bug in the latest proprietary nvidia drivers.

According to this thread on the nvidia forums, this bug is unlikely to be fixed in the near future, because it is only present in the proprietary version of the drivers and starting with version 560 nvidia made the open source version the default and are seemingly ignoring any bug reports regarding the proprietary version.

So for now, your options are to pin a slightly older version of the driver like this

hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.mkDriver {
  version = "535.154.05";
  sha256_64bit = "sha256-fpUGXKprgt6SYRDxSCemGXLrEsIA6GOinp+0eGbqqJg=";
  sha256_aarch64 = "sha256-G0/GiObf/BZMkzzET8HQjdIcvCSqB1uhsinro2HLK9k=";
  openSha256 = "sha256-wvRdHguGLxS0mR06P5Qi++pDJBCF8pJ8hr4T8O6TJIo=";
  settingsSha256 = "sha256-9wqoDEWY4I7weWW05F4igj1Gj9wjHsREFMztfEmqm10=";
  persistencedSha256 = "sha256-d0Q3Lk80JqkS1B54Mahu2yY/WocOqFFbZVBh+ToGhaE=";
};

or to switch to the open source drivers (hardware.nvidia.open = true;). Keep in mind that I haven’t personally tested the open version of the driver, and there might be some performance/feature-parity issues with the open source version compared to the proprietary version.
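For reference, the second option could look roughly like the module below. This is a minimal sketch, not a tested configuration: it assumes the driver is enabled through services.xserver.videoDrivers and that the GPU is recent enough for the open kernel modules (as far as I know, they only support Turing and newer cards).

```nix
{ ... }:
{
  # Assumption: the NVIDIA driver is selected the usual way.
  services.xserver.videoDrivers = [ "nvidia" ];

  # Use NVIDIA's open kernel modules instead of the proprietary ones.
  # The userspace parts of the driver stay proprietary.
  hardware.nvidia.open = true;
}
```

Don't combine this with the pinned 535.154.05 package above unless you have checked that the open modules build for that version.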

i tried your snippet but the build is failing :cry:

make[3]: *** [/nix/store/hlhp0pprnyq96npl2k1b5z9iy2xc4c9q-linux-6.10-dev/lib/modules/6.10.0/source/Makefile:1934: /build/NVIDIA-Linux-x86_64-535.154.05/kernel] Error 2
make[2]: *** [/nix/store/hlhp0pprnyq96npl2k1b5z9iy2xc4c9q-linux-6.10-dev/lib/modules/6.10.0/source/Makefile:240: __sub-make] Error 2
make[2]: Leaving directory '/nix/store/hlhp0pprnyq96npl2k1b5z9iy2xc4c9q-linux-6.10-dev/lib/modules/6.10.0/build'
make[1]: *** [Makefile:240: __sub-make] Error 2
make[1]: Leaving directory '/nix/store/hlhp0pprnyq96npl2k1b5z9iy2xc4c9q-linux-6.10-dev/lib/modules/6.10.0/source'
make: *** [Makefile:82: modules] Error 2

The 6.10 kernel will not work. You must use an earlier version like boot.kernelPackages = pkgs.linuxPackages_6_9;

For an explanation, see Unable to build nix due to nvidia drivers due or kernel 6.10(?) - #5 by TLATER
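Putting the two pieces from this thread together, a configuration that pins both the kernel and the driver might look like the sketch below. The version and hashes are copied unchanged from the snippet earlier in the thread; the module wrapper and the services.xserver.videoDrivers line are assumptions about the rest of your config. Also note that pkgs.linuxPackages_6_9 only exists in nixpkgs revisions that still ship that kernel, so on a newer channel you may need a different older kernel attribute.

```nix
{ config, pkgs, ... }:
{
  # Kernel 6.10 fails to build the 535.154.05 module, so stay on an older kernel.
  boot.kernelPackages = pkgs.linuxPackages_6_9;

  # Assumption: the NVIDIA driver is selected the usual way.
  services.xserver.videoDrivers = [ "nvidia" ];

  # Pin the proprietary driver to 535.154.05 (hashes as posted above).
  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.mkDriver {
    version = "535.154.05";
    sha256_64bit = "sha256-fpUGXKprgt6SYRDxSCemGXLrEsIA6GOinp+0eGbqqJg=";
    sha256_aarch64 = "sha256-G0/GiObf/BZMkzzET8HQjdIcvCSqB1uhsinro2HLK9k=";
    openSha256 = "sha256-wvRdHguGLxS0mR06P5Qi++pDJBCF8pJ8hr4T8O6TJIo=";
    settingsSha256 = "sha256-9wqoDEWY4I7weWW05F4igj1Gj9wjHsREFMztfEmqm10=";
    persistencedSha256 = "sha256-d0Q3Lk80JqkS1B54Mahu2yY/WocOqFFbZVBh+ToGhaE=";
  };
}
```

Because the driver package is taken from config.boot.kernelPackages, it is rebuilt against whichever kernel you select on the line above.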

It’s worth noting that this isn’t fully open source and will probably not solve the issue according to the Nvidia forum. The fully open source driver is NVK and using it will probably solve this, but you won’t be able to use CUDA anymore.

Note: Under 555.58.02, it hasn’t been happening to me as much, lately, but it still occurs nonetheless.

I am almost certain that nvidia-open in the forums and hardware.nvidia.open = true in NixOS refer to the same thing (the open source kernel space driver, not the userspace drivers/NVK). It makes sense that switching to the open source version of the kernel modules would fix the issue, because kernel panics, well… happen in the kernel, not in the userspace (although they certainly can be caused by something in the userspace).

While I haven’t personally verified this, there are people in the abovementioned nvidia forum thread who report running nvidia-open driver versions 550 and 555 for weeks without any freezes. There are some feature parity/performance issues in nvidia-open, but they are relatively minor compared to NVK (in my experience) and you will eventually be forced to switch to nvidia-open anyway, since nvidia is no longer planning to develop the proprietary version of the kernel space drivers.

I’m aware. I just wanted to make the distinction that they’re only open kernel modules because calling it open source drivers would give the impression that it’s all open source, while it’s not.

It’s definitely better and I rarely have it now compared to before, but it still happens. Unless that’s related to something else, though.

If it’s more stable, I don’t really care about a slight drop in performance. NVK was good when I tried it, but the only thing keeping me from switching is CUDA.

Indeed, so let’s hope that things turn out for the better going forward. Else, my next GPU definitely won’t be an Nvidia :upside_down_face:

using nvidia-open changes nothing for me, still facing a kernel panic after just about every boot,

instantly, after gdm login, after 5 min. same thing over and over.

couldnt find anything anywhere either. really frustrated.
sometimes i feel like switching back to arch and fk nvidia and code in peace but i dont wanna leave after having spent months on my nixos-config.
really in a tough spot here.

You could try running the open-source nouveau drivers, instead. Just remove the Nvidia configuration and they should be enabled by default.

do they work properly on wayland? i just wanna play some games man

This bug is not NixOS-specific. You would almost certainly have exactly the same issues on arch.

As I mentioned previously in the thread, pinning drivers to version 535.154.05 should fix the issue (assuming your problem is indeed the same as everybody else’s).

i was unable to pin it to 535.154.05, could you share your complete config please?

Did you try to pin the kernel to an earlier version as well?

i tried but it gave some vague error i couldnt fix so i dropped it for a while,
today however i couldnt get anything done because the stuff is acting up too much

Which GPU are you using, exactly?

i have an RTX 3050 Mobile on an Asus Tuf A17 with R7-4800H and 16G

Do you mean the build error with the 6.10 kernel? Are you perhaps setting boot.kernelPackages = pkgs.linuxPackages_latest or something similar in your config?

Afaik, boot.kernelPackages should default to pkgs.linuxPackages, which is an alias for pkgs.linuxKernel.packageAliases.linux_default which should be kernel version 6.6, not 6.10.

If that isn’t the issue you are experiencing, please provide more information beyond “some vague error i couldnt fix”.

Your GPU is fairly recent, so it shouldn’t be a problem. What’s your current hardware configuration?

Also, can you collect more information about this? Perhaps run nvidia-bug-report.sh or get the logs from the previous boot with journalctl -b -1

i did set the kernel to _6_9 as you suggested, it just got built but i was unable to login to gnome -_-