Resumes immediately after suspend: How to diagnose

mirrorwitch · October 24, 2023, 8:42am

Help, my NixOS has insomnia!

Here’s the problem I’m having:

System wakes up immediately after systemctl suspend.
System wakes up unpredictably after systemctl suspend -i. Sometimes it bounces right back; sometimes it will suspend for a few seconds or minutes, then come back when I’m not looking.
I have already disabled every single device on /proc/acpi/wakeup, it didn’t help.
I’ve upgraded to Linux 6.5.7, it didn’t help. I’m on the unstable channel, but I started on stable and had the same issue.
I have a Debian stable + Gnome on the same system and it seems to sleep without issues.
I’m running Wayland/sway if that matters.

My system is a Thinkpad T14s Gen 3 AMD. Here’s a dmesg for a systemctl suspend -i that resumed immediately:

[Di, 24. Okt 2023, 10:36:03] PM: suspend entry (s2idle)
[Di, 24. Okt 2023, 10:36:03] Filesystems sync: 0.001 seconds
[Di, 24. Okt 2023, 10:36:03] Freezing user space processes
[Di, 24. Okt 2023, 10:36:03] Freezing user space processes completed (elapsed 0.001 seconds)
[Di, 24. Okt 2023, 10:36:03] OOM killer disabled.
[Di, 24. Okt 2023, 10:36:03] Freezing remaining freezable tasks
[Di, 24. Okt 2023, 10:36:03] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[Di, 24. Okt 2023, 10:36:03] printk: Suspending console(s) (use no_console_suspend to debug)
[Di, 24. Okt 2023, 10:36:03] ACPI: EC: interrupt blocked
[Di, 24. Okt 2023, 10:36:04] ACPI: EC: interrupt unblocked
[Di, 24. Okt 2023, 10:36:04] [drm] PCIE GART of 1024M enabled (table at 0x000000F43FC00000).
[Di, 24. Okt 2023, 10:36:04] amdgpu 0000:33:00.0: amdgpu: SMU is resuming…
[Di, 24. Okt 2023, 10:36:04] amdgpu 0000:33:00.0: amdgpu: SMU is resumed successfully!
[Di, 24. Okt 2023, 10:36:05] nvme nvme0: 12/0/0 default/read/poll queues
[Di, 24. Okt 2023, 10:36:05] nvme nvme0: Ignoring bogus Namespace Identifiers
[Di, 24. Okt 2023, 10:36:05] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[Di, 24. Okt 2023, 10:36:05] [drm] JPEG decode initialized successfully.
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[Di, 24. Okt 2023, 10:36:05] amdgpu 0000:33:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[Di, 24. Okt 2023, 10:36:05] OOM killer enabled.
[Di, 24. Okt 2023, 10:36:05] Restarting tasks … done.
[Di, 24. Okt 2023, 10:36:05] random: crng reseeded on system resumption
[Di, 24. Okt 2023, 10:36:05] PM: suspend exit

–

the GPU is:

33:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] [1002:1681] (rev d2)

mirrorwitch · October 25, 2023, 8:45am

Some more information:

this machine does not support S3, only S2idle/S0. there’s no option in the BIOS, and the manufacturer says the CPU doesn’t even support S3. Linux seems to detect that correctly; /sys/power/mem_sleep lists [s2idle] and nothing else.
~~Sometimes suspend will not only resume the machine immediately, but also power off immediately after resuming~~ edit: this was due to me trying to use the “power” button to resume.
Moving the lid slightly or brushing the touchpad will trigger wakeup. But as mentioned before, even disabling the ACPI wakeups doesn’t prevent the system from immediately waking up.
I tried updating the firmware with a vendor update disk, and with fwupdmgr. It didn’t help.
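
The bracketed entry in /sys/power/mem_sleep is the mode the kernel will use for suspend. A minimal sketch for pulling it out of a line in that format; the function name active_sleep_mode is made up for illustration:

```shell
#!/bin/sh
# Print the currently selected suspend mode, i.e. the bracketed entry
# of a mem_sleep-style line such as "[s2idle]" or "s2idle [deep]".
active_sleep_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage on a live system (commented out so the sketch has no side effects):
# active_sleep_mode "$(cat /sys/power/mem_sleep)"
```

On the machine described here this should print s2idle, since that is the only entry listed.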

hibernation to swap partition works! However, upon resuming, the wifi is borked. The link goes down, and any attempted operation hangs for several seconds with errors like:

[ 1222.633241] ath11k_pci 0000:01:00.0: failed to enable PMF QOS: (-11
[ 1234.601245] ath11k_pci 0000:01:00.0: wmi command 16387 timeout
[ 1234.601255] ath11k_pci 0000:01:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 1234.601262] ath11k_pci 0000:01:00.0: failed to enable PMF QOS: (-11

rmmod’ing ath11k_pci then ath11k, then modprobing them back, fixes it. sadly, “rmmod ath11k_pci” is one of the operations that hangs for several seconds before succeeding, so if you add that to hibernate’s normal slowness, it’s quite a hassle compared to a suspend.

mirrorwitch · October 25, 2023, 9:28am

ok so ath11k_pci breaking on hibernate gave me a clue; it seems to be a hardware/firmware issue. other threads about it:

https://bugzilla.kernel.org/show_bug.cgi?id=217239

I have no idea how this was just working on Debian. maybe it wasn’t and I just didn’t test it enough. the culprit is probably the T14S Gen 3’s wifi card:

0000:01:00.0 Network controller [0280]: Qualcomm Technologies, Inc QCNFA765 Wireless Network Adapter [17cb:1103] (rev 01)

workarounds suggested in the threads didn’t work for me, but a combination of removing the module and disabling the ACPI wakeups did (if you don’t do the latter, it will resume on stuff like closing down the lid). here’s a beautiful (horrible) script that seems to work:

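#!/bin/sh
# Disable every ACPI wakeup source that is currently enabled.
# Writing a name to /proc/acpi/wakeup toggles it, so only enabled ones are written.
for dev in $(grep enabled /proc/acpi/wakeup | cut -f 1); do
    printf 'disabling acpi wakeup: '
    echo "$dev" | sudo tee /proc/acpi/wakeup
done

set -x
sudo systemctl stop NetworkManager
# Unload the wifi driver before sleeping; ath11k_pci depends on ath11k.
sudo rmmod ath11k_pci
sudo rmmod ath11k
# This write blocks until the machine resumes.
echo freeze | sudo tee /sys/power/state
"$HOME/.local/bin/lock" &  # a sway lock script
sleep 1  # avoid a modprobe fail (maybe)
sudo modprobe ath11k
sudo modprobe ath11k_pci
sudo systemctl restart NetworkManager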

You can’t use systemctl suspend up there because it’s asynchronous. Script works for hibernate too, just pass “disk” to /sys/power/state rather than “freeze” (=s2idle).

TLATER · October 25, 2023, 3:27pm

Since the title is rather generic, I imagine someone will eventually arrive here not with a specific firmware bug but trying to find out how to do everything that led up to identifying the firmware issue.

Firstly, check if nixos-hardware has a module for your laptop/motherboard. If it does, it likely already has a fix (it has a suggestion even for you @mirrorwitch, though your other comment suggests that that is a copy/paste mistake from whoever copied the t14 config over).

If there is no module, or it doesn’t fix it, you’ll have to dig a bit further. Usually this works:

So, to diagnose this issue in general, you’ll want to read /proc/acpi/wakeup. It’ll give output like this:

Device  S-state   Status   Sysfs node
GP12      S4    *enabled   pci:0000:00:07.1
GP13      S4    *enabled   pci:0000:00:08.1
XHC0      S4    *enabled   pci:0000:09:00.3
GP30      S4    *disabled
GP31      S4    *disabled
PS2K      S3    *disabled
GPP0      S4    *disabled  pci:0000:00:01.1
GPP8      S4    *disabled  pci:0000:00:03.1
PTXH      S4    *enabled   pci:0000:02:00.0
PT20      S4    *disabled
PT21      S4    *disabled
PT22      S4    *disabled
PT23      S4    *disabled
PT24      S4    *enabled   pci:0000:03:04.0
PT26      S4    *disabled
PT27      S4    *disabled
PT28      S4    *enabled   pci:0000:03:08.0
PT29      S4    *enabled   pci:0000:03:09.0

This specifies which devices may wake up your system. They can be disabled using (substituting <DEVICE> with the name of the device from the first column):

echo <DEVICE> | sudo tee /proc/acpi/wakeup
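
One caveat on the command above: writing a name to /proc/acpi/wakeup toggles that device’s state, so running it twice turns the wakeup back on. A small sketch that only writes when the device is currently enabled; the function names are made up for illustration, and the column layout is assumed to match the table shown above:

```shell
#!/bin/sh
# List the names of devices currently marked "*enabled" in a
# /proc/acpi/wakeup-style table read from standard input.
enabled_wakeup_devices() {
    awk '$3 == "*enabled" { print $1 }'
}

# Disable wakeup for one device, but only if it is currently enabled,
# because writing a name to /proc/acpi/wakeup toggles its state.
disable_wakeup() {
    if enabled_wakeup_devices < /proc/acpi/wakeup | grep -qxF "$1"; then
        echo "$1" | sudo tee /proc/acpi/wakeup > /dev/null
    fi
}

# Usage on a live system (commented out so the sketch has no side effects):
# disable_wakeup XHC0
```

The change does not survive a reboot, which makes it convenient for testing one device at a time.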

Try that in sequence until you hit the device that causes the system to immediately wake up, and then add a udev rule to disable wakeups from that device:

  # builtins.concatStringsSep is used so the snippet works without `lib` in scope
  services.udev.extraRules = builtins.concatStringsSep ", " [
    ''ACTION=="add"''

    # See below on how to get the correct values for these three
    ''SUBSYSTEM=="pci"''
    ''ATTR{vendor}=="0x1022"''
    ''ATTR{device}=="0x1483"''

    ''ATTR{power/wakeup}="disabled"''
  ];

udev identifies devices by attributes. To get device attributes for a device from /proc/acpi/wakeup, take note of the sysfs node. If it’s a pci device, you can pretty easily get a list of all its attributes by using:

udevadm info -a /sys/bus/pci/devices/<numbers+colons after "pci:">

That’ll look something like this:

  looking at device '/devices/pci0000:00/0000:00:01.1':
    KERNEL=="0000:00:01.1"
    SUBSYSTEM=="pci"
    DRIVER=="pcieport"
    ATTR{ari_enabled}=="0"
    ATTR{broken_parity_status}=="0"
    ATTR{class}=="0x060400"
    ATTR{consistent_dma_mask_bits}=="32"
    ATTR{device}=="0x1483"
    ATTR{vendor}=="0x1022"

Any of those attributes can in theory be used for identification (and they’re useful for writing more complex rules), but in this case device and vendor are probably what you want.
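
If you only need the vendor and device IDs, they can also be read straight from sysfs. A rough sketch, assuming the sysfs node has the pci:<address> form shown in /proc/acpi/wakeup; the function names are made up for illustration:

```shell
#!/bin/sh
# Turn a "Sysfs node" entry such as "pci:0000:00:01.1" into its sysfs path.
pci_sysfs_path() {
    printf '/sys/bus/pci/devices/%s\n' "${1#pci:}"
}

# Print the vendor and device IDs as udev match keys.
pci_ids() {
    path=$(pci_sysfs_path "$1")
    printf 'ATTR{vendor}=="%s", ATTR{device}=="%s"\n' \
        "$(cat "$path/vendor")" "$(cat "$path/device")"
}

# Usage on a live system (commented out so the sketch has no side effects):
# pci_ids pci:0000:00:01.1
```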

If that still doesn’t help, there’s no clear formula. It’ll take sleuthing like what @mirrorwitch did, and likely means some kind of firmware issue is causing the problem.

colemickens · October 26, 2023, 11:12pm

If a system exhibits this behavior, but seems to sleep fine when the lid is closed, what does this mean?

Does this imply that there is some sort of firmware/bios support that is triggered beneath Linux to cause sleep that systemd is not able to trigger?

After closing my laptop and not having it wake up in my bag, etc. (as happened with old laptops + Linux), I was surprised to see the behavior described by the OP happen with systemctl suspend. Sometimes it sleeps, sometimes it immediately wakes up, and sometimes it sleeps for 3 seconds or so and then wakes.

TLATER · October 27, 2023, 1:00am

Chances are the key press that runs the command instantly triggers the wakeup. Try sleep 1 && systemctl suspend.

peterhoeg · October 27, 2023, 8:22am

I know you’re using a laptop, but I have a similar reproducible problem on a desktop.

I basically have a cheap USB pcie adapter bought from aliexpress, which will wake up the machine at a random time if anything is plugged into it regardless of any “do not wake up on any activity on this device” settings I have applied.

I know it’s hard to rip bits out of a laptop, but maybe try disconnecting everything and see if you can reproduce it?

h-the · March 11, 2024, 10:16am

I have identified which device on my Thinkpad T15 prevents suspend. (XHC)

Device  S-state   Status   Sysfs node
GLAN      S4    *enabled   pci:0000:00:1f.6
XHC       S3    *enabled   pci:0000:00:14.0      <<<<<
XDCI      S4    *disabled
HDAS      S4    *disabled  pci:0000:00:1f.3
RP01      S4    *enabled   pci:0000:00:1c.0
PXSX      S4    *disabled  pci:0000:02:00.0
.
.
.

looking at device '/devices/pci0000:00/0000:00:14.0':

    ATTR{device}=="0x02ed"
    ATTR{vendor}=="0x8086"

Now, I have created a udev rule based on the template.

  services.udev.extraRules = '' concatStringsSep ", "
      ACTION=="add"
      SUBSYSTEM=="pci"
      ATTR{vendor}=="0x8086"
      ATTR{device}=="0x02ed"
      ATTR{power/wakeup}="disabled"
  '';

after reboot:

Device  S-state   Status   Sysfs node
GLAN      S4    *disabled  pci:0000:00:1f.6
XHC       S3    *disabled  pci:0000:00:14.0
XDCI      S4    *disabled
HDAS      S4    *disabled  pci:0000:00:1f.3
RP01      S4    *disabled  pci:0000:00:1c.0
PXSX      S4    *disabled  pci:0000:02:00.0
.
.
.

The suspend mode is working, but all devices are disabled from waking up from sleep, not just the XHC. For me, that’s okay because the notebook can be activated using the power button. However, I would still like to understand why this is the case.
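
One plausible explanation (not confirmed in this thread): udev treats every line of a rules file as a separate rule, and a rule with no match keys applies to every device. In the multi-line string above, ATTR{power/wakeup}="disabled" sits on a line of its own, so it would be applied to all devices rather than only the XHC. A quick sketch for spotting such lines; the function name is made up for illustration:

```shell
#!/bin/sh
# Print rule lines from standard input that assign a value (KEY="...")
# but contain no match key (== or !=). Comments and blank lines are skipped.
unconditional_rules() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' \
        | grep -v -e '==' -e '!=' \
        | grep '='
}

# Usage on a live system (commented out so the sketch has no side effects).
# On NixOS, services.udev.extraRules should end up in 99-local.rules:
# unconditional_rules < /etc/udev/rules.d/99-local.rules
```

If the assignment line shows up in the output, putting all keys on a single line separated by commas, as the list-based template does, should restrict the rule to the one device.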

mirrorwitch · May 19, 2024, 1:10pm

addendum for people who are looking for info on this specific ath11k device: I’ve since added this to the wakeup script:

sudo iw dev wlp1s0 set power_save off

to account for a different bug which makes the device always come back from suspend in power save mode, which also makes your download speed abysmally low (curiously, it doesn’t affect upload speed for me, so instead of my uplink’s 100/50 Mbps I’d get 13/50 Mbps).

you can check if this bug affects you with iw dev wlp1s0 get power_save (substitute the name of your interface if needed).
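
The check and the fix can be combined so the setting is only touched when needed. A minimal sketch, assuming iw reports the state as “Power save: on” or “Power save: off”; the function names are made up for illustration:

```shell
#!/bin/sh
# Succeed if iw-style output ("Power save: on") reports power save enabled.
power_save_is_on() {
    printf '%s\n' "$1" | grep -q '^Power save: on$'
}

# Turn power save off only when it is currently on.
# The interface name is passed as the first argument.
fix_power_save() {
    if power_save_is_on "$(iw dev "$1" get power_save)"; then
        sudo iw dev "$1" set power_save off
    fi
}

# Usage on a live system (commented out so the sketch has no side effects):
# fix_power_save wlp1s0
```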