NixOS freezing intermittently after update to 24.05

Summary

I’m hoping someone can help me. I’ve exhausted my problem-solving ability and need fresh ideas to diagnose and ideally solve these issues. I’ve listed a variety of issues below that all started after the upgrade to 24.05. Before you assume it’s the NVIDIA drivers, please read all the issues and note that I’ve already tried switching and reconfiguring video drivers. I could, of course, be missing something.

Device

I’ve been running NixOS on my Razer 14" laptop for about two years, and this is the first time I’ve had issues I couldn’t resolve with basic changes to my Nix configuration.

Please see my configuration on GitHub: sum-rock/just-sum-nix. This concerns the host named “razer”.

OS: NixOS 24.05.20240719.0c53b6b (Uakari) x86_64
Host: Razer PI411
Kernel: 6.6.41
Uptime: 20 mins
Packages: 2276 (nix-system), 414 (nix-user)
Shell: fish 3.7.1
Resolution: 2560x1440
DE: GNOME 46.2 (Wayland)
Theme: shell-Teal-Dark [GTK2/3]
Icons: Adwaita [GTK2/3]
Terminal: tmux
CPU: AMD Ryzen 9 5900HX with Radeon Graphics (16) @ 4.890GHz
GPU: NVIDIA GeForce RTX 3080 Mobile / Max-Q 8GB/16GB
GPU: AMD ATI Radeon Vega Series / Radeon Vega Mobile Series
Memory: 3413MiB / 15395MiB

Issue

There is a range of issues and I’m not sure if they’re related or not. They all started after the upgrade to 24.05. Issues are as follows.

  • Intermittent freezing:
    • It’s very hard for me to find a pattern to when this happens. It may (or may not) be more likely when I switch between GNOME desktops. I’ve also left a terminal process running, walked away, and come back to a frozen screen.
    • When the computer freezes, I cannot access TTY, the mouse doesn’t move, keyboard doesn’t respond, and my only option is a forced shutdown.
    • I’ve combed through the journalctl logs for the time around when this happens and cannot find anything out of the ordinary. The logs just stop when the freeze happens.
  • Cannot play local video on VLC:
    • When I attempt to play videos in VLC there is audio but no video.
    • I’ve played with the video codec settings and everything there looks normal.
  • Random sleep and occasional problems waking from sleep
    • My favorite issue is when my computer just decides it’s time to sleep. I think this is most common if my computer has gone to sleep and then woken up successfully. If I’m still on the same session I started at the last boot, I don’t think the random sleeping is an issue.
    • I’ve also had the classic refusing-to-wake-from-sleep issue. Not much else to say on this one.
  • Inability to shut down or restart
    • This one is the most odd to me. When I attempt a restart or a shutdown from the GNOME desktop I’ll get the normal terminal screen with the kernel shutdown process but at some point it will hang. I have not seen a pattern to when it hangs but I’m not confident about that lack of pattern. When it hangs, my only option is a forced shutdown.
  • (probably not related) Orphaned pointers in storage drive
    • I’m pretty sure this is just a result of constantly having to do forced shutdowns. At one point I couldn’t boot the system because there were issues with the file system. I had to boot from a recovery drive and run fsck to repair the main partition.

Attempted Solutions

  • By this point you’re probably thinking NVIDIA drivers are the issue. That’s what I’ve assumed too, although the terminal screen hanging on shutdown doesn’t seem to fit my understanding of typical NVIDIA problems. I’ve stripped the config down to the bare basics and switched back and forth between the production and stable versions of the NVIDIA driver. Nothing I’ve tried seems to make a difference.
  • I’ve regenerated my `hardware-configuration.nix` file to make sure nothing in it is stale from an older version. No change here.
  • I’ve been stalking journalctl logs and have not come up with anything obvious. I don’t even know where to look anymore. I cannot see where this issue is originating from.
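
For readers following along, switching NVIDIA driver branches on NixOS usually comes down to a single option. The fragment below is a minimal sketch and is not copied from the linked repository. It assumes a module that receives `config`, that unfree packages are already allowed, and that NixOS 24.05 is in use (other releases may need further options).

```nix
{ config, ... }:
{
  # Assumption: the proprietary driver is used for the discrete GPU.
  services.xserver.videoDrivers = [ "nvidia" ];

  hardware.nvidia = {
    # Kernel modesetting is generally needed for Wayland sessions.
    modesetting.enable = true;

    # Pick one driver branch. The package set is tied to the configured
    # kernel, so the module is rebuilt whenever the kernel changes.
    package = config.boot.kernelPackages.nvidiaPackages.production;
    # package = config.boot.kernelPackages.nvidiaPackages.stable;
  };
}
```

Swapping the commented line in and rebuilding is all a branch switch takes, so the two branches can be compared under otherwise identical conditions.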

Can you keep a terminal open with dmesg -w and another with watch -n 1 free -h to catch any potential kernel logs/OOM scenarios?

Absolutely. Doing that now. Good thoughts, although I’d be shocked if I’m OOMing with 16G. It may take a bit to get an event to log. The issue’s not very reproducible.


Okay. Well. I tried to capture the output of dmesg -w by running dmesg -w > /Documents/logs.txt and checking the file after a forced shutdown following a freeze. I’m not seeing anything notable. This could be because the file isn’t capturing output at the moment of the freeze, though.

I have started to see consistent output when I try to shut down or restart. I no longer have any successful shutdown or restart attempts from GNOME or the terminal. I’m not totally sure how to get this in text form, but here is a picture. I’m still very, very stumped on this one.

Those are the last few lines of a kernel panic, I’m pretty sure. Any chance your caps lock is flashing?

That’d be why you need to force it off, as well, and could very well cause file system issues.

This could be any number of things, of course. I’ve had a faulty (internal) keyboard wire cause random panics before. For now, share a full journalctl --boot -1, and maybe try down/upgrading the kernel.


Wonderful. I can work with that. I’ll try and pin an older kernel. Haven’t done that in NixOS yet but I’m sure I can figure something out. My caps lock is not flashing, no.

Here is a pastebin link for the journal logs on my last boot.

Also, I really, really appreciate your help and you lending me some of your knowledge. Thank you.

Nothing obvious from that. A log from a boot that ended in a freeze might help a bit more, but

Jul 31 18:40:00 razer systemd-coredump[3340]: Process 3264 (.nextcloud-wrap) of user 1000 dumped core.

Why are you running a wrapped Nextcloud binary? Does Nextcloud even have binaries?

Pinning a kernel is pretty easy on NixOS: set the boot.kernelPackages option.

Consider also trying a newer kernel, since not all bugfixes get backported: boot.kernelPackages = pkgs.linuxPackagesFor pkgs.linux_latest;


Nextcloud wrapped is the nextcloud desktop client. Runs as a home manager service.


Hmmm, any recommendations here for the latest kernel? I can move to something other than latest, but I’m not sure whether I’ve done something stupid here.

building the system configuration...
error: builder for '/nix/store/q9q986hhv6m1mh0b1h0cfmpmwnz04fv1-nvidia-x11-550.78-6.10.1.drv' failed with exit code 2;
       last 10 log lines:
       > /build/NVIDIA-Linux-x86_64-550.78/kernel/common/inc/nv-linux.h: In function 'nv_vmap':
       > /build/NVIDIA-Linux-x86_64-550.78/kernel/common/inc/nv-linux.h:674:51: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
       >   674 |         NV_MEMDBG_ADD(ptr, page_count * PAGE_SIZE);
       >       |                                                   ^
       > make[3]: *** [/nix/store/w5wdn5mhhsibv38h88g51giqg6khfrsp-linux-6.10.1-dev/lib/modules/6.10.1/source/Makefile:1934: /build/NVIDIA-Linux-x86_64-550.78/kernel] Error 2
       > make[2]: *** [/nix/store/w5wdn5mhhsibv38h88g51giqg6khfrsp-linux-6.10.1-dev/lib/modules/6.10.1/source/Makefile:240: __sub-make] Error 2
       > make[2]: Leaving directory '/nix/store/w5wdn5mhhsibv38h88g51giqg6khfrsp-linux-6.10.1-dev/lib/modules/6.10.1/build'
       > make[1]: *** [Makefile:240: __sub-make] Error 2
       > make[1]: Leaving directory '/nix/store/w5wdn5mhhsibv38h88g51giqg6khfrsp-linux-6.10.1-dev/lib/modules/6.10.1/source'
       > make: *** [Makefile:85: modules] Error 2
       For full logs, run 'nix log /nix/store/q9q986hhv6m1mh0b1h0cfmpmwnz04fv1-nvidia-x11-550.78-6.10.1.drv'.
error: 1 dependencies of derivation '/nix/store/p59jx8lmxsaykn85bzdjf5p64mapi31r-etc.drv' failed to build
error: 1 dependencies of derivation '/nix/store/ivpqli53grdjylsm1ybhcp81b09xsxss-linux-6.10.1-modules.drv' failed to build
error: 1 dependencies of derivation '/nix/store/w6pwd2f5blxx12clx0dgjpc7z0ygqm4b-system-path.drv' failed to build
error: 1 dependencies of derivation '/nix/store/4x6mcf5v1gyr472ngb5f64f5s67wp6c5-nixos-system-razer-24.05.20240727.8c50662.drv' failed to build

I’d still try pkgs.linux_6_9, or linux_6_8 failing that. I’m sure there are people running 6.10 on NVIDIA, but I don’t know what would be required for that.
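
A minimal sketch of that fallback, assuming a module that receives `pkgs` and a nixpkgs revision that still ships these kernel series (older series are dropped from nixpkgs once they reach end of life):

```nix
{ pkgs, ... }:
{
  # Pin the 6.9 series instead of linux_latest
  # (which resolved to 6.10.1 in the failed build above).
  boot.kernelPackages = pkgs.linuxPackagesFor pkgs.linux_6_9;

  # Fallback if 6.9 does not build or does not help:
  # boot.kernelPackages = pkgs.linuxPackagesFor pkgs.linux_6_8;
}
```

Out-of-tree modules such as the NVIDIA driver are rebuilt against whichever kernel is selected here, which is why the failure above shows up in the nvidia-x11 derivation rather than in the kernel itself. After changing the option, `sudo nixos-rebuild boot` followed by a reboot puts the new kernel into use, and `uname -r` confirms which one is running.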


I got 6_8 to build, no issue (yet). I’ll play around with this and see if I’m still getting freezing or random sleeps. I’m actually surprised at how much easier it is to pin a kernel in NixOS compared to Ubuntu-based distros. I suppose I shouldn’t be surprised at these things at this point.


I want to chime in here and say that after my latest nixos-unstable bump (9f4128e00b0ae8ec65918efeba59db998750ead6 2024-07-03), I have been experiencing relatively frequent crashes as well. Sometimes these happen while I’m using my system, and other times they happen after waking up from sleep.

And yes, my Caps Lock is flashing.

I will be updating within the next few days to see if it resolves the issue. I’m using boot.kernelPackages = pkgs.linuxPackages_latest; so the kernel should definitely bump from 6.9.7 to something newer.


Your issue is unlikely to be related to this one but try to revert the latest kernel bump and see if that fixes it. If yes, this is a kernel bug that was backported and the kernel devs would likely want to know about that.


This is because you’re already using latest: your issue most likely hasn’t been fixed upstream yet, and upstream might not know about it.


I am pretty new to NixOS; I use it mostly for running some Docker containers.

When I first set it up, it ran for 50+ days without any issues, but for the last few weeks it has frozen every 2-3 days. Caps Lock and Num Lock still work, but nothing else does.
I also have to force a reboot every time this happens.

We’ll need way more context to help. Start a new thread, state your hardware, and share at minimum journalctl --boot -1 and dmesg (in code blocks) after rebooting after one of these freezes. It’d be extra helpful if you could pin down a specific nixpkgs commit, which you could do by bisecting.

But please start a different thread, this one’s ancient and almost certainly unrelated.
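
On the bisecting suggestion above: bisecting amounts to rebuilding the system against a sequence of nixpkgs commits and noting which ones freeze. On a flake-based setup (an assumption; nothing in this thread says which setup is in use), pinning one candidate commit might look like the sketch below. The revision is the one quoted earlier in this thread and serves only as an example, and `myhost` and `./configuration.nix` are placeholder names.

```nix
{
  # Assumption: a flake-based configuration. Substitute each commit
  # under test for the example revision below.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/9f4128e00b0ae8ec65918efeba59db998750ead6";

  outputs = { self, nixpkgs }: {
    # "myhost" and ./configuration.nix are hypothetical names.
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./configuration.nix ];
    };
  };
}
```

Each step is then a rebuild and a period of normal use, halving the range between the last known-good and first known-bad commit until a single nixpkgs change remains.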

A bit late, but this seems related to the thread “Kernel panic - how to retrieve logs?”

The issue still occurs with 6.11 and any NVIDIA driver above 53x.