System freezes (only during big updates)

Operating System: NixOS 20.09.4407.1c1f5649bb9
KDE Plasma Version: 5.18.5
KDE Frameworks Version: 5.73.0
Qt Version: 5.15.2
Kernel Version: 5.4.122
OS Type: 64-bit
Processors: 4 × Intel® Core™ i7-3520M CPU @ 2.90GHz
Memory: 11.5 GiB of RAM

i3 version 4.18.3 (2020-10-20)
TP T430
Motherboard was installed new (never previously used, still in its packaging)
BIOS is almost a couple of years old

Over the last few months:

  • I experienced a few (five or so) system freezes
  • always happening exclusively during a bigger update
    • not sure during which stage of the update
      • generally, after reboot and trying a new update, there are still packages to be installed
  • when in plasma, plasma+i3, i3
    • with or without open Firefox, with or without hardware acceleration, etc
  • memory and cpu usage are average (last freeze happened with half of memory in use and around 50% cpu usage)

An extract of journalctl is available as a gist.

So far I did:

  • a couple checks with memtest (results seem all fine)
  • some basic research
  • add nixos-hardware
    • after the last freeze
      • I previously had some of nixos-hardware relevant setup scattered in my configuration
  • found some throttling, but it does not seem to be related to any of the freezes
    • I would not swear about this

Probably I should point out that

  • so far I experienced this behavior only with NixOS (same hardware)
  • still on 20.09

Is there a way to know exactly at which stage of an update the problem arises?

Any suggestion will be very welcome.
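
One way to narrow down the stage, sketched here under assumptions: wrap the update command so that every output line is timestamped and forced to disk before the next one is read. After a hard reboot, the last line of the log is then roughly where things stopped. The function name `run_logged`, the log path, and the `nixos-rebuild switch --upgrade` invocation in the comment are illustrative choices, not something taken from this thread.

```python
import os
import subprocess
import sys
import time


def run_logged(cmd, log_path):
    """Run cmd, echoing its output and appending each line to log_path.

    Each line is flushed and fsync'ed, so the tail of the log has a good
    chance of surviving a hard reboot. Returns the command's exit code.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep stdout and stderr in order
            text=True,
            errors="replace",
        )
        for line in proc.stdout:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            sys.stdout.write(line)
            log.write(stamp + " " + line)
            log.flush()
            os.fsync(log.fileno())
        return proc.wait()


# Hypothetical usage (run as root; command and path are assumptions):
# run_logged(["nixos-rebuild", "switch", "--upgrade"], "/var/log/update-trace.log")
```

The fsync per line slows the log writing a little, but the point here is to keep the last lines, not speed.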

During such updates, can you use htop/top and watch cat /proc/cpuinfo (or whatever the correct path was) to monitor CPU usage and frequency?

Perhaps even lmsensors as well to monitor temperature.

Thank you for the suggestion. I added some usage data to the post. Initially I actually thought it could be about memory or CPU usage, a browser (Firefox, looking at you) or some other issue (including the touchpad, an option I have not totally discarded so far), but that seems not to be the case. I still have to find something concrete.

In fact, I am now used to stopping everything I am doing (especially YouTube videos, etc.) and staring at some monitor (going to add iotop) during updates. Basically, updating is not a pleasant experience anymore…

I am starting to think that it could be something related to KDE/Plasma, even if not the same case, as a few times it started to complain that it could not recognize (I do not remember exactly now) the display, and custom shortcuts, xclip and some other things stopped working all together. Generally a log out / log in solved the issue.

Also, in my case it seems to be related to 20.09, as I started with 20.03 and never had issues before changing release.

As you said it also happens in “pure” i3, I doubt plasma is to blame.

Though your CPU is a bit older than mine (i5-4210M CPU @ 2.60GHz), and I do recognize occasional slowdowns as well.

Usually those are accompanied by increased CPU usage and high core temperature. Those happened a lot less since I cleaned the cooling related parts (which I do about once a year), especially the radiator is always blocked by dust.

This is why I asked you to monitor actual clock rate and temperature. When overheating, my CPU clocks down to 800MHz and won't go over 1GHz before a certain threshold is reached again.

And of course when in this overheating mode, several freezes occur, in the worst case up to the point where the BIOS just kills everything as the heat spreads throughout the system.

I guess this is a long shot, but this happened to me yesterday and it turned out I was low on disk space.

Thank you for the hints.

To clarify, I said even if not the same case: I am pretty sure that I experienced one of the first freezes with i3 (as they happened months apart, I am not sure of the sequence). Not blaming, but pointing out that the xclip, shortcuts, etc. problem is exclusively a Plasma thing. Perhaps that has nothing to do with the issue.

That should lead to a shutdown, right? Or could it just freeze the system, requiring a hard reboot?

Thank you for the suggestion. I do not think that is my case, as the disk is reported as at most 40% full.

Did you solve your issue?

Edit: found as lm_sensors

I forgot to say that I cannot find it as a package… Should it be installed in some other way?

Yeah, that would lead to a shutdown, though I experienced minute long freezes, during which I wasn’t even able to connect via SSH.

Usually I do a hardware power down after 5 to 10 minutes. Though once, while updating remotely, I got dropped from SSH, and my son reported that my laptop was on full fans for an hour before it went out.

So, have you already monitored temperature and actual click rate?

Package is lm_sensors.

Once I left the laptop on as a test for several hours (meaning, around 12 hours) and it stayed completely frozen.

I will try to check whether I can connect with SSH if it freezes again.

Can you expand on it, please? I have no idea about it :slight_smile:

It did. Sorry that yours is harder!

❯ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +50.0°C  (high = +87.0°C, crit = +105.0°C)
Core 0:        +50.0°C  (high = +87.0°C, crit = +105.0°C)
Core 1:        +50.0°C  (high = +87.0°C, crit = +105.0°C)

BAT0-acpi-0
Adapter: ACPI interface
in0:          11.51 V  

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +48.0°C  (crit = +200.0°C)

thinkpad-isa-0000
Adapter: ISA adapter
fan1:        3768 RPM
temp1:        +48.0°C  
temp2:         +0.0°C  
temp3:         +0.0°C  
temp4:         +0.0°C  
temp5:         +0.0°C  
temp6:         +0.0°C  
temp7:         +0.0°C  
temp8:         +0.0°C  

is this right…? :thinking:

Glad to know that you solved it :smiley:

At idle, my system is at about the same temperature.

The interesting thing is how it goes under load.

Though when it’s time to clean up again then idle temperature is usually already at 60°C and more.
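
For watching this without the `sensors` binary, here is a minimal sketch that reads the kernel's thermal zones from sysfs, where `temp` is reported in millidegrees Celsius. The function names and the 60°C default (borrowed from the idle figure above) are my own choices, and which zones exist on a T430 is an assumption.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"thermal_zoneN (type)": degrees Celsius} for every readable zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone vanished or reported something unparsable
        readings[f"{zone.name} ({kind})"] = millideg / 1000.0
    return readings


def hot_zones(readings, threshold_c=60.0):
    """Return only the zones at or above threshold_c."""
    return {name: temp for name, temp in readings.items() if temp >= threshold_c}


# Hypothetical usage:
# for name, temp in read_thermal_zones().items():
#     print(f"{name}: {temp:.1f} C")
```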

Damn autocorrect… I meant clock rate of the CPU.

/proc/cpuinfo reflects it in one of the fields.

I have a tab running watch cat /proc/cpuinfo , but I do not see any clock rate/frequency field…

s-tui right now reports

[screenshot of s-tui]

cpupower frequency-info
current CPU frequency: 1.79 GHz (asserted by call to kernel)

cpu MHz in /proc/cpuinfo is the current clock rate. Though watching it during regular usage that does not cause a freeze is rather pointless; you have to monitor/watch it shortly before freezes occur.

Anyway, if the freeze doesn’t resolve even after hours, it is likely that the system is stuck in some other way.
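
Since the interesting values are the ones right before a freeze, a rough sketch of the idea is to append the `cpu MHz` fields to a file and sync after every sample, so the last readings can be inspected after a hard reboot. The function names, the log path, and the one-second interval are assumptions for illustration.

```python
import os
import time


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' values in /proc/cpuinfo-style text, one per logical CPU."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))
    return values


def log_clock_rates(log_path, samples, interval=1.0, source="/proc/cpuinfo"):
    """Append `samples` timestamped clock-rate readings to log_path."""
    with open(log_path, "a", encoding="utf-8") as log:
        for _ in range(samples):
            with open(source, encoding="utf-8") as f:
                mhz = parse_cpu_mhz(f.read())
            stamp = time.strftime("%H:%M:%S")
            rates = " ".join(f"{v:.0f}" for v in mhz)
            log.write(stamp + " " + rates + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the sample to disk before sleeping
            time.sleep(interval)


# Hypothetical usage, started just before an update:
# log_clock_rates("/tmp/clock-rates.log", samples=3600)
```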

I’m wondering why it tells you the frequency of 4 cores… According to the Intel datasheet you only have 2 cores with 4 threads.
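
A likely explanation is that the kernel lists one entry per logical CPU (hardware thread), not per physical core. As a small sketch, assuming the usual x86 `/proc/cpuinfo` layout with `physical id` and `core id` fields, the two counts can be separated like this. The sample text is made up to mimic a 2-core, 4-thread CPU, and `count_cpus` is a name I picked.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    # Each logical CPU is described by one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        # Threads of one core share the same (physical id, core id) pair.
        cores.add((fields.get("physical id"), fields.get("core id")))
    return logical, len(cores)


# Made-up sample resembling a dual-core CPU with hyper-threading.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 0

processor : 2
physical id : 0
core id : 1

processor : 3
physical id : 0
core id : 1
"""

logical, physical = count_cpus(SAMPLE)
print(f"{logical} logical CPUs on {physical} physical cores")
```

On a real machine, the same function can be fed the contents of `/proc/cpuinfo`.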

To be honest though, it is a CPU released 10 years ago and not produced anymore. If you are able to, upgrade… I am trying to replace my laptop as well.

After some sleep I have been able to find it, thank you :sweat_smile:

I am organizing a layout with some monitors, next update I will be ready!

It looks completely and permanently dead, stuck on a frozen screen displaying whatever was on screen at the moment of the freeze. When it happens, the only option is a hard reboot (that I know about so far).

Any preference about it?