I have been having an issue where my laptop (NixOS 23.11) has been randomly restarting and freezing. I have not been able to find the cause of the restarts through the systemd logs. The freezing seems to lock the system entirely. This has been difficult to diagnose since it appears to be independent of my config and DE, and it is seemingly random. It has not happened while gaming and mainly happens when using my web browser or neovim (I don't know why or how these are correlated; it is just something I noticed). Here is a list of everything I have tried.
Original system:
hyprland
running Nvidia with the open kernel module in prime sync mode
Memtester: 200 cycles with no issue
hw-probe: no critical failures. (I lost the link to the results)
checked dmesg to see if I had this: Ryzen - ArchWiki (section 4.1) error.
reinstalled NixOS using the gnome NixOS 23.11 image.
same problems occur with me making no changes (aside from my home directory being intact and a couple nix-shells).
unable to run benchmarks due to apparently (idk wth is going on here) broken shared libraries.
Current system:
gnome
nouveau driver
I’ve been working on this for a while and am running out of ideas so any help is appreciated.
Hardware: ASUS ROG G15 (2021)
AMD Ryzen 9 5900hs
16 GB of ram
RTX 3070 mobile
1 TB ssd storage
Did it start happening after some particular action? like updating from 23.05 to 23.11 ?
can you find an older generation that does not have the problem?
something interesting is happening. I managed to reproduce the restart issue and was about to post the logs, then my system rebooted again and is stuck in systemd’s emergency mode.
A day of troubleshooting later and a massive detour figuring out how to fix btrfs (and making a backup). I now have a log for you. Dec 22 12:15:47 nixos logs before crashing - Pastebin.com
The BIOS is most likely not up to date (updating it might be my next approach. I figured it was unlikely to be the cause, but you never know)
In the second log, there is an error related to a disk of yours:
Error mounting /dev/sda2 at /run/media/dragonblade316/lfs: wrong fs type, bad option, bad superblock on /dev/sda2, missing codepage or helper program, or other error
Additionally, there are a ton of errors pertaining to thumbnails? I’d investigate that. Perhaps some corrupted on-disk state/cache. Try to reproduce this with an empty home directory (i.e. log in as root, mv your home dir to a temporary name, log in).
This smells a bit like disk corruption?
Also, what exactly is this “freeze”? Can you still switch TTYs?
Could you repro the issue and then press the magic sysrq + s a couple times before rebooting and show the log again?
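In case sysrq + s appears to do nothing, it may be worth checking that the sync command is permitted at all. A minimal sketch, assuming the standard Linux bitmask for kernel.sysrq (1 enables every function, 16 enables sync); the function names here are made up for illustration:

```python
from pathlib import Path

# Standard location of the sysrq setting on Linux.
SYSRQ_PATH = Path("/proc/sys/kernel/sysrq")

# Values from the kernel's sysrq bitmask (assumed here, check your kernel docs):
SYSRQ_ENABLE_ALL = 1      # a value of exactly 1 enables every sysrq function
SYSRQ_ENABLE_SYNC = 0x10  # bit 16 enables the sync command (sysrq + s)


def sysrq_allows_sync(value: int) -> bool:
    """Return True if this kernel.sysrq value permits the sync command."""
    return value == SYSRQ_ENABLE_ALL or bool(value & SYSRQ_ENABLE_SYNC)


def read_sysrq(path: Path = SYSRQ_PATH) -> int:
    """Read the current kernel.sysrq value from procfs."""
    return int(path.read_text().strip())
```

Calling sysrq_allows_sync(read_sysrq()) on the affected machine should tell you whether pressing sysrq + s can flush anything to disk before the reboot.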
The lfs drive is an ntfs partition on an external hard drive (a remnant of me switching from windows) that has been broken since long before this started happening (I’ve been too lazy to deal with it).
Currently I am on an empty home directory (decided to wipe the computer redo the partitions just in case) though the problem still occurs.
The freeze is comprehensive: I am unable to switch to a tty. Even rebooting takes longer (it's not just a tap on the power button).
I have also found a way to more or less reproduce it. It is (most of the time) fine with my terminal* but does a coin flip between freezing and restarting after around 2 mins of playing a video on YouTube in Firefox (though it seemed to happen with Brave as well).
here is the new log you needed.
output of journalctl -b -1 -x after hitting sysrq + s logs/freezelog2.txt
I might try downgrading the kernel and see if the issue disappears. I’ve upgraded my system a couple of times in the last month or two and I’m wondering if this is being caused by a kernel update.
Wayland issues : Dec 22 12:19:38 nixos .gnome-shell-wr[1426]: Xwayland terminated, exiting since it was mandatory
Multimedia issues, bluetooth, and a lot related to wayland, keybindings and this one:
Dec 22 12:19:38 nixos .gnome-shell-wr[2129]: Unable to mount volume lfs: Gio.IOErrorEnum: Error mounting /dev/sda2 at /run/media/dragonblade316/lfs: wrong fs type, bad option, bad superblock on /dev/sda2, missing codepage or helper program, or other error
What fs is that? NTFS?
Dec 22 12:19:38 nixos .gsd-media-keys[2389]: Failed to grab accelerator for keybinding settings:hibernate
Dec 22 12:19:38 nixos .gsd-media-keys[2389]: Failed to grab accelerator for keybinding settings:playback-repeat
Problematic keybindings?
Dec 22 12:19:38 nixos pipewire[2633]: mod.jackdbus-detect: Failed to receive jackdbus reply: org.freedesktop.DBus.Error.ServiceUnknown: The name org.jackaudio.service was not provided by any .service files
Audio setup issues?
Dec 22 12:19:38 nixos gnome-shell[2661]: nvc0_screen_create:999 - Base screen init failed: -19
Dec 22 12:19:38 nixos gnome-shell[2661]: libEGL warning: egl: failed to create dri2 screen
Dec 22 12:19:38 nixos gnome-shell[2661]: nvc0_screen_create:999 - Base screen init failed: -19
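To get an overview of which processes are producing most of these messages, something like the following could help. It is only a rough sketch, assuming the log is in journalctl's default short format (timestamp, host, process name with optional PID, then the message), as in the lines quoted above; count_by_ident is a name I made up:

```python
import re
from collections import Counter

# Matches journalctl's default short output:
#   "Dec 22 12:19:38 nixos gnome-shell[2661]: message text"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"   # timestamp
    r"(?P<host>\S+)\s"                                 # hostname
    r"(?P<ident>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s"    # process name and optional PID
    r"(?P<msg>.*)$"                                    # rest of the line
)


def count_by_ident(lines):
    """Count log lines per process name, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            counts[match.group("ident")] += 1
    return counts
```

Feeding it the lines of the saved log (for example from open("freezelog2.txt")) and printing counts.most_common(10) would show the noisiest processes first, which may help separate the harmless warnings from whatever happens right before the freeze.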
Some of this might be useful if I take more time to troubleshoot. That Xwayland one especially.
However, I am now beginning to think this is a hardware issue. I attempted to install Windows on the laptop, and it is not even making it past the file preparation stage before freezing. It is not exhibiting the random reboots that Linux was, but it is still not looking good.
Eventually, after the system was broken for a while, installing Windows on it magically (I have no other way to describe this) worked and the problems disappeared. In its current life as a nix server, though, it appears to be exhibiting the behavior again, just less frequently. The ironic thing is that the Framework you are having issues with is the one I bought to replace the freezing laptop, so that is fun.
I am becoming worried it is an issue with mobile AMD cpus in some part of Linux. GPU and RAM were the prior culprits but neither makes sense at this point since running memory tests revealed no issues and while the driver is enabled the GPU is not being used by any process on my machine (and nvidia gpus are not an option with framework).
IDK, if it is an issue with the kernel (which is my current suspicion) I would not know where to begin with debugging other than trying to hook up some form of debugger. Might try other kernels and see if they fix the issue.
Welp, after a bit of time on Windows which somehow fixed the issue and after running this thing as a nix server for a while, the issue is back. Time to get back to the debugging cycle.