Suspend and unsuspend issues

One of my NixOS systems started having frequent unsuspend issues a month or two ago, and then suspend issues a couple of weeks ago. Here's the information I've gathered so far that seems relevant.

System info

  • I have an old Nvidia card:
    01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 730] (rev a1)
  • I’m using the proprietary nvidia drivers

Config 1: suspend working, unsuspend sometimes freezes

nixpkgs https://github.com/NixOS/nixpkgs/commits/6143fc5eeb9c4f00163267708e26191d1e918932
nvidia-settings:  version 470.239.06
systemd 255 (255.4)
+PAM +AUDIT -SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK -XKBCOMMON +UTMP -SYSVINIT default-hierarchy=unified

Suspend always works as expected, i.e. it powers off the rest of the system
Unsuspend always powers the system back on
Unsuspend sometimes (but not always) leaves the system frozen (e.g. no display, keyboard not working)

Last few lines of journalctl -b -1

Aug 06 21:46:53 hawk systemd[1]: Reached target Sleep.
Aug 06 21:46:53 hawk systemd[1]: Starting System Suspend...
Aug 06 21:46:53 hawk systemd-sleep[6282]: Performing sleep operation 'suspend'...
Aug 06 21:46:53 hawk kernel: PM: suspend entry (deep)

Config 2: suspend always freezes

nixpkgs https://github.com/NixOS/nixpkgs/commits/9f918d616c5321ad374ae6cb5ea89c9e04bf3e58
nvidia-settings:  version 470.256.02
systemd 256 (256.2)
+PAM +AUDIT -SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBCRYPTSETUP_PLUGINS +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK -XKBCOMMON +UTMP -SYSVINIT +LIBARCHIVE

After suspending, the system stays fully powered on
The keyboard stays on
SysRq seems to be always available (caps lock flashing)
I've confirmed that all SysRq functions are enabled:

# echo 1 > /proc/sys/kernel/sysrq 
$ cat /proc/sys/kernel/sysrq 
1

but I didn't manage to invoke any of the SysRq functions.

Last few lines of journalctl -b -1

Aug 06 21:52:50 hawk systemd[1]: Reached target Sleep.
Aug 06 21:52:50 hawk systemd[1]: Starting System Suspend...
Aug 06 21:52:50 hawk systemd-sleep[5253]: Successfully froze unit 'user.slice'.
Aug 06 21:52:50 hawk systemd-sleep[5253]: Performing sleep operation 'suspend'...
Aug 06 21:52:50 hawk kernel: PM: suspend entry (deep)
Aug 06 21:52:50 hawk kernel: Filesystems sync: 0.025 seconds

Additional notes

Note that I tested this by running nixos-rebuild switch to config 1, suspending, (luckily) unsuspending successfully, then switching to config 2 and triggering the freeze on suspend. The running kernel stayed the same throughout, so it's likely not the culprit:

Linux hawk 6.6.28 #1-NixOS SMP PREEMPT_DYNAMIC Wed Apr 17 09:19:38 UTC 2024 x86_64 GNU/Linux

My default sleep mode is deep:

$ cat /sys/power/mem_sleep
s2idle [deep]

Changing that to s2idle

# echo s2idle > /sys/power/mem_sleep
$ cat /sys/power/mem_sleep
[s2idle] deep

did not seem to help

I feel this has something to do with it:

However, I don’t see those services

$ systemctl | grep -Ec 'systemd-(homed|suspend)'
0

Any ideas what I can do to troubleshoot further?

Unless you use systemd-homed (the fancy encrypted user home directories), that issue is probably not directly related. nvidia isn't a bad suspect, though: their driver is notorious for causing problems with suspend.

Given that you’re not getting any logs or any other useful output, I’d suggest two things:

  1. Confirm that this is actually caused by nvidia by testing with nouveau.
  2. Bisect to find the exact commit where this broke
    • Remember that bisecting is a binary search, so the number of commits you need to test is only about log₂(n). You can also filter for commits that changed the nvidia driver, kernel versions and such to bring that number down even further.
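A minimal sketch of step 2, assuming a channel-style (non-flake) NixOS configuration and a local clone of nixpkgs as the working directory. The two commits are the ones from the configs above; since the unsuspend freeze on config 1 is intermittent, this bisects the deterministic suspend freeze and treats config 1 as "good". The helper names (bisect_steps, start_bisect, test_round) are hypothetical, made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes a channel-style (non-flake) NixOS config and that the
# current directory is a nixpkgs git checkout. Helper names are hypothetical.

# nixpkgs commits taken from the two configs in the post.
BAD=9f918d616c5321ad374ae6cb5ea89c9e04bf3e58   # config 2: suspend always freezes
GOOD=6143fc5eeb9c4f00163267708e26191d1e918932  # config 1: suspend works (unsuspend is flaky)

# Rough number of test rounds a bisect over n commits needs: ceil(log2(n)).
bisect_steps() {
  local n=$1 steps=0
  while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))   # each round halves the remaining range, rounding up
    steps=$(( steps + 1 ))
  done
  echo "$steps"
}

# Start the bisect; any arguments are used as pathspecs to restrict which
# commits are considered (e.g. the nvidia driver and systemd directories).
start_bisect() {
  git bisect start "$BAD" "$GOOD" -- "$@"
  echo "About $(bisect_steps "$(git rev-list --count "$GOOD..$BAD" -- "$@")") rounds to go"
}

# One round: build and activate the checked-out nixpkgs, then suspend.
# Afterwards mark the commit by hand with `git bisect good` or `git bisect bad`.
test_round() {
  sudo nixos-rebuild test -I nixpkgs="$PWD" && systemctl suspend
}
```

For example, start_bisect pkgs/os-specific/linux/nvidia-x11 pkgs/os-specific/linux/systemd would only consider commits touching those directories (check the paths against your checkout), at the risk of skipping the culprit if it lives elsewhere. Using nixos-rebuild test rather than switch activates each build without making it the boot default, so a hard reset after a freeze boots back into your current generation.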