Laptop shutting down due to overheating, only in Linux

The ASUS Linux folks (or person?) think this is a NixOS issue. I find that hard to believe, but I also don’t know enough about this stack to tell what the issue is.

My laptop can’t play even a single game of Halo without supposedly overheating. I get this in the logs before it shuts down:

Jan 16 21:24:28 zeph systemd-logind[3364]: System is powering down.
Jan 16 21:24:28 zeph systemd-logind[3364]: The system will power off now!
Jan 16 21:24:28 zeph kernel: amdgpu 0000:03:00.0: amdgpu: ERROR: System is going to shutdown due to GPU SW CTF!
Jan 16 21:24:28 zeph kernel: amdgpu 0000:03:00.0: amdgpu: ERROR: GPU over temperature range(SW CTF) detected!

Just now, when this happened again, I had something logging the output of sensors (from lm_sensors) to files.

This is the contents of the last one before shutdown:

{
   "asus-isa-0000":{
      "Adapter": "ISA adapter",
      "cpu_fan":{
         "fan1_input": 5100.000
      },
      "gpu_fan":{
         "fan2_input": 5700.000
      }
   },
   "BAT0-acpi-0":{
      "Adapter": "ACPI interface",
      "in0":{
         "in0_input": 15.933
      },
      "power1":{
         "power1_input": 0.000
      }
   },
   "amdgpu-pci-0700":{
      "Adapter": "PCI adapter",
      "vddgfx":{
         "in0_input": 0.914
      },
      "vddnb":{
         "in1_input": 1.040
      },
      "edge":{
         "temp1_input": 83.000
      },
      "PPT":{
         "power1_input": 2.104
      }
   },
   "nvme-pci-0600":{
      "Adapter": "PCI adapter",
      "Composite":{
         "temp1_input": 46.850,
         "temp1_max": 83.850,
         "temp1_min": -5.150,
         "temp1_crit": 87.850,
         "temp1_alarm": 0.000
      }
   },
   "k10temp-pci-00c3":{
      "Adapter": "PCI adapter",
      "Tctl":{
         "temp1_input": 96.250
      }
   },
   "acpitz-acpi-0":{
      "Adapter": "ACPI interface",
      "temp1":{
         "temp1_input": 96.000
      },
      "temp2":{
         "temp2_input": 20.000
      }
   },
   "iwlwifi_1-virtual-0":{
      "Adapter": "Virtual device",
      "temp1":{
         "temp1_input": 56.000
      }
   },
   "amdgpu-pci-0300":{
      "Adapter": "PCI adapter",
      "vddgfx":{
         "in0_input": 0.725
      },
      "fan1":{
         "fan1_input": 0.000,
         "fan1_min": 0.000,
         "fan1_max": 3300.000
      },
      "edge":{
         "temp1_input": 95.000,
         "temp1_crit": 100.000,
         "temp1_crit_hyst": -273.150,
         "temp1_emergency": 105.000
      },
      "junction":{
         "temp2_input": 97.000,
         "temp2_crit": 100.000,
         "temp2_crit_hyst": -273.150,
         "temp2_emergency": 105.000
      },
      "mem":{
         "temp3_input": 78.000,
         "temp3_crit": 100.000,
         "temp3_crit_hyst": -273.150,
         "temp3_emergency": 105.000
      },
      "PPT":{
         "power1_average": 35.000,
         "power1_cap": 80.000
      }
   }
}
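The JSON above appears to have the layout that `sensors -j` prints: chip, then feature, then subfeatures such as `temp2_input` and `temp2_crit`. Here is a minimal Python sketch, assuming exactly that layout, that lists every temperature which also reports a critical threshold, sorted by how much headroom was left. The `headroom` function and the `headroom.py` file name are just illustrations.

```python
import json
import sys


def headroom(snapshot: dict) -> list:
    """Return (chip, feature, reading, crit, margin) tuples for every temperature
    that has both a *_input and a *_crit value, smallest margin first."""
    rows = []
    for chip, features in snapshot.items():
        for feature, values in features.items():
            if not isinstance(values, dict):
                continue  # skip plain strings such as "Adapter"
            # Map e.g. "temp2" -> reading, taken from the "temp2_input" key.
            inputs = {k[: -len("_input")]: v for k, v in values.items() if k.endswith("_input")}
            for prefix, reading in inputs.items():
                crit = values.get(prefix + "_crit")
                if prefix.startswith("temp") and crit is not None:
                    rows.append((chip, feature, reading, crit, crit - reading))
    return sorted(rows, key=lambda row: row[4])


# Only runs when a snapshot file is passed, e.g.:
#   sensors -j > snapshot.json && python3 headroom.py snapshot.json
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        data = json.load(f)
    for chip, feature, reading, crit, margin in headroom(data):
        print(f"{chip:20} {feature:10} {reading:6.1f} °C  crit {crit:6.1f} °C  margin {margin:5.1f} °C")
```

On the snapshot above, the dGPU (`amdgpu-pci-0300`) junction sensor comes first with only 3 °C to spare, followed by its edge sensor at 5 °C. The iGPU, `k10temp`, and ACPI entries don’t report a `_crit` value, so they are skipped.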

Does anyone have any idea what’s going on? I swear this laptop can play Halo in Windows, get much hotter, and not shut down.

I’m at my wit’s end. I’m prepared to buy new hardware… but I gamed on this for over a year without a single issue. And I’m terrified that if I upgrade, I’m just going to hit the same underlying issue.

I’m curious:

  • any ideas?
  • how else can I track down what’s going on?

It doesn’t even seem like it actually hit the crit threshold? I can’t figure out what SW CTF means. I’m exasperated and desperate for any advice. I just want to play a game and unwind, and instead, well, it’s the opposite of relaxing.

Is this a PC? Have you considered cleaning it, replacing the thermal paste, or perhaps replacing whatever aging fan is in there? The same applies to laptops, to be fair.

I wouldn’t be too surprised if it shuts down after sustaining 97 °C for a while, even if that’s not marked as the critical temperature. You’re within 3 °C of the 100 °C crit value, so a short spike could easily push it over the edge and trigger the shutdown before lm_sensors reports it. Running that close to 100 °C long-term probably should trigger some protection mechanism anyway.
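On the point about short spikes: one way to check is to poll the kernel’s hwmon files directly, more often than a periodic `sensors` dump, and print every new maximum. Below is a rough Python sketch, assuming the standard hwmon sysfs layout (`/sys/class/hwmon/hwmonN/name`, `tempN_input` in millidegrees Celsius, optional `tempN_label`). The function names, the `peaks.py` file name, and the polling interval are arbitrary choices for illustration. Whether it catches anything `sensors` missed depends on how often the driver and firmware refresh these values, which I haven’t verified.

```python
import glob
import os
import sys
import time
from typing import Optional


def _read(path: str) -> Optional[str]:
    """Return the stripped contents of a sysfs file, or None if it can't be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None  # missing file, or a sensor that refuses reads (e.g. device suspended)


def read_temps(hwmon_root: str = "/sys/class/hwmon") -> dict:
    """Map 'hwmonN:chip/label' to the current temperature in °C."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(hwmon_root, "hwmon*", "temp*_input"))):
        hwmon_dir = os.path.dirname(path)
        sensor = os.path.basename(path)[: -len("_input")]  # e.g. "temp2"
        chip = _read(os.path.join(hwmon_dir, "name")) or "unknown"
        label = _read(os.path.join(hwmon_dir, sensor + "_label")) or sensor
        try:
            millideg = int(_read(path))
        except (TypeError, ValueError):
            continue  # unreadable or non-numeric value
        # The hwmon index keeps two chips with the same name (iGPU + dGPU) apart.
        temps[f"{os.path.basename(hwmon_dir)}:{chip}/{label}"] = millideg / 1000.0
    return temps


def track_peaks(interval: float, hwmon_root: str = "/sys/class/hwmon") -> None:
    """Poll forever, printing a timestamped line whenever a sensor sets a new maximum."""
    peaks = {}
    while True:
        for key, value in read_temps(hwmon_root).items():
            if value > peaks.get(key, float("-inf")):
                peaks[key] = value
                # flush=True hands each line to the OS immediately instead of buffering it
                print(f"{time.strftime('%H:%M:%S')} new peak {key}: {value:.1f} °C", flush=True)
        time.sleep(interval)


# Only runs when an interval is given, e.g.: python3 peaks.py 0.1 > peaks.log
if __name__ == "__main__" and len(sys.argv) > 1:
    track_peaks(float(sys.argv[1]))
```

Started before launching the game, the last lines in the log before the shutdown would show the highest value each sensor reached while it was being watched.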

Maybe Windows just works it less hard. If you want to satisfy their Linux person for a warranty claim, perhaps reproduce it on an Ubuntu system or similar.
