NixOS 21.05 crashed 3 times within 2 days

schlichtanders · August 31, 2021, 12:30pm

Hi all,

Maybe you can help me inspect my unstable system. Since yesterday my NixOS has crashed 3 times.

From my perspective the crashes happened at completely random times.
In addition, the crash is immediate: the system restarts right away, without any delay, and is back after about 30 seconds in total. Of course, all unsaved work is gone after such a restart.

Which logs can I inspect to understand better what is going on?
Any other hints and ideas?

Using NixOS 21.05.2693.d5aadbefd65 (Okapi)
with KDE

raboof · August 31, 2021, 12:32pm

Maybe try journalctl -b -1, which should give you the logs for the previous boot? (Use journalctl --list-boots to find older boots.)
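
Since the crash is abrupt, the last kernel messages before the reboot are probably the most interesting part of that log. A minimal sketch, assuming a POSIX shell and the default journal output format; the helper name last_kernel_lines is made up for illustration:

```shell
# Hypothetical helper: from journal text on stdin, keep only kernel
# messages and print the last N of them (default 20).
last_kernel_lines() {
    n="${1:-20}"
    grep ' kernel: ' | tail -n "$n"
}

# Typical use on a systemd machine (not run here):
#   journalctl --list-boots
#   journalctl -b -1 --no-pager | last_kernel_lines 30
```

journalctl can do similar filtering on its own, for example with -k for kernel messages or -p err for errors and worse.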

schlichtanders · August 31, 2021, 12:41pm

Thank you very much, I found the critical line:

Aug 31 14:20:54 gram17 kernel: thermal thermal_zone3: acpitz: critical temperature reached, shutting down

I didn’t hear any fan…

Does anyone know of any tools to inspect the precise temperatures and fan activity?

pvonmoradi · August 31, 2021, 3:50pm

https://wiki.archlinux.org/title/lm_sensors#Using_sensor_data

About fan control drivers: it depends on your machine. Usually you shouldn’t need to manually change when the fan starts or stops.
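
Without installing anything, the kernel's thermal zones can also be read directly from sysfs. A rough sketch, assuming the standard /sys/class/thermal layout where each zone has a type and a temp file in millidegrees Celsius; the function name show_thermal_zones is hypothetical:

```shell
# Print "<zone> <type> <temp in °C>" for every thermal zone under a base
# directory. The base defaults to the standard sysfs location; passing
# another directory is only meant for testing.
show_thermal_zones() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        zone_type=$(cat "$zone/type" 2>/dev/null || echo unknown)
        milli=$(cat "$zone/temp" 2>/dev/null) || continue
        [ -n "$milli" ] || continue
        # sysfs reports millidegrees Celsius; keep one decimal place.
        # Assumes non-negative readings.
        printf '%s %s %d.%d\n' "${zone##*/}" "$zone_type" \
            "$((milli / 1000))" "$(( (milli % 1000) / 100 ))"
    done
}
```

Calling it in a loop, e.g. while sleep 2; do show_thermal_zones; done, gives a crude live view. It does not show fan speeds; for those, the sensors command from lm_sensors linked above is the usual route.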

schlichtanders · September 15, 2021, 9:23am

I found another thread where others also report this crash on the LG Gram 17 laptop.

https://bbs.archlinux.org/viewtopic.php?id=268721

Unfortunately there is no solution yet, but it seems to be related to RAM and sleep/hibernation mode.

peterhoeg · September 15, 2021, 9:51am

I have a laptop on unstable that has also recently started failing to survive suspend about half the time. I changed the RAM as I thought that was the problem - no dice.

peterhoeg · September 23, 2021, 1:29am

So far I haven’t had any crashes after doing this:

boot.kernelParams = [
  # https://bbs.archlinux.org/viewtopic.php?pid=1902231#p1902231
  "i915.enable_psr=0"
];
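
After a rebuild and reboot, it may be worth confirming that the parameter actually reached the kernel. A small sketch, assuming the running kernel's command line is readable from /proc/cmdline; the helper name has_kernel_param is made up for illustration:

```shell
# Return success if the exact parameter appears as a whole word in a kernel
# command line string. The second argument defaults to the running kernel's
# command line from /proc/cmdline.
has_kernel_param() {
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use (not run here):
#   has_kernel_param i915.enable_psr=0 && echo "PSR disabled on the command line"
```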

bgibson · September 23, 2021, 3:43am

There are at least two monitoring utilities in nixpkgs that can display your CPU temps, bpytop and gotop.

There are several others that can show fan speed, but I haven’t used any.

Oh, and it might be worth checking whether you have powerManagement.cpuFreqGovernor set to performance, and changing it to ondemand or powersave instead, e.g. powerManagement.cpuFreqGovernor = lib.mkDefault "powersave"; in either configuration.nix or hardware-configuration.nix.
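
To see which governor is active right now, independent of what the configuration says, sysfs can be queried. A minimal sketch, assuming the usual cpufreq layout under /sys/devices/system/cpu; the function name current_governors is hypothetical:

```shell
# Print the distinct cpufreq governors currently in use across all CPUs.
# The base directory defaults to the standard sysfs location; passing
# another directory is only meant for testing.
current_governors() {
    base="${1:-/sys/devices/system/cpu}"
    cat "$base"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null | sort -u
}
```

If this prints performance, the setting above is a candidate to change.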

nrdxp · September 23, 2021, 5:36pm

I actually ran into this issue on my wife’s laptop the other day as well. The strange thing is, I was monitoring the temps with btm and they never actually crossed the threshold as far as I could tell. I was able to manually disable the trigger at runtime by modifying an option in /sys for the thermal device. I can’t remember the exact option name at the moment. I’ll see if I can update this post later, after I get a chance to review my shell history on her laptop.
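
For comparing observed temperatures against the thresholds the kernel acts on, the trip points of each thermal zone can be listed from sysfs. This sketch does not reproduce the option mentioned above, whose name is not given; it only reads the thresholds. It assumes the trip_point_N_type and trip_point_N_temp files are present for the zone, and the function name show_trip_points is made up:

```shell
# Print "<zone> <trip type> <threshold in °C>" for every trip point found.
# The base directory defaults to the standard sysfs location; passing
# another directory is only meant for testing.
show_trip_points() {
    base="${1:-/sys/class/thermal}"
    for type_file in "$base"/thermal_zone*/trip_point_*_type; do
        [ -r "$type_file" ] || continue
        temp_file="${type_file%_type}_temp"
        zone_dir="${type_file%/*}"
        milli=$(cat "$temp_file" 2>/dev/null) || continue
        [ -n "$milli" ] || continue
        # Thresholds are stored in millidegrees Celsius.
        printf '%s %s %d\n' "${zone_dir##*/}" "$(cat "$type_file")" "$((milli / 1000))"
    done
}
```

A zone whose critical trip point sits unexpectedly low would be consistent with shutdowns that happen while the displayed temperatures look harmless.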

schlichtanders · September 26, 2021, 4:25pm

Unfortunately this didn’t work for me. I am happy it worked for you.
I’ll keep looking.

schlichtanders · September 28, 2021, 11:53am

The Arch Linux thread reports that the issue is understood and has been reported, and that a patch is already in the works:

https://lore.kernel.org/linux-pm/202109 … nel.org/T/

Antoine Tenart wrote:

What happens is this driver uses a global variable to keep track of the tcc offset (tcc_offset_save) and uses it on resume. The issue is this variable is initialized to 0, but is only set in tcc_offset_degree_celsius_store, i.e. when the tcc offset is explicitly set by userspace. If that does not happen, the resume path will set the offset to 0 (in my case the h/w default being 3, the offset would become too low after a suspend/resume cycle).

However, there is no workaround yet.
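
The mechanism in the quoted explanation can be modeled in a few lines. This is a toy model only, not the driver's code: the variable tcc_offset_save and the values 0 and 3 come from the quote, while hw_tcc_offset, store_offset and resume are names invented for this sketch.

```shell
# Toy model of the described bug. hw_tcc_offset stands in for the value
# held by the hardware; tcc_offset_save is the driver's saved copy.
hw_tcc_offset=3      # hardware default from the quoted example
tcc_offset_save=0    # saved copy is initialized to 0

# Userspace explicitly sets the offset: hardware and saved copy both change.
store_offset() {
    hw_tcc_offset="$1"
    tcc_offset_save="$1"
}

# Resume path: restores the saved copy, whether or not it was ever set.
resume() {
    hw_tcc_offset="$tcc_offset_save"
}

# No explicit store happened, so resume overwrites the default 3 with 0.
resume
echo "offset after resume without an explicit store: $hw_tcc_offset"
```

In this model, calling store_offset before resume keeps the chosen value across the cycle, which matches the quote's point that the saved copy is only valid after userspace has set it.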

schlichtanders · September 28, 2021, 12:36pm

Actually, someone reported that switching to Linux kernel 4.19 solves the problem.

I tried setting

boot.kernelPackages = pkgs.linuxPackages_4_19;

However, with this kernel the system no longer starts my desktop (KDE Plasma) after a reboot and stays in a plain terminal instead.

schlichtanders · October 20, 2021, 11:55am

The problem is now fixed in the latest Linux kernel.

I am very happy that I can use sleep mode again.