Suspend/Resume broken after 24.11 Update

Suspend/resume is (partially) broken on my desktop. After resume the screen stays black, but the network comes up, so I can SSH into the machine and get some diagnostics out. The machine has a GTX 1060, so I configured it to use the proprietary NVIDIA driver, as recommended. It also runs KDE Plasma 6 on the Xorg server, not Wayland. The config for this system is here:

The journalctl output seems to point to problems with the graphics card:

Dec 02 18:46:28 cube kernel: PM: suspend exit
Dec 02 18:46:28 cube kernel: NVRM: GPU at PCI:0000:07:00: GPU-b75ba18c-1365-8656-6e98-8e243012b937
Dec 02 18:46:28 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 18 Error
Dec 02 18:46:28 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405840=0x82040000
Dec 02 18:46:28 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405848=0x80000000
Dec 02 18:46:28 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ChID 0040, Class 0000c197, Offset 00001b0c, Data 1000f010

[...]

Dec 02 18:46:31 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 11 Error
Dec 02 18:46:31 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 18 Error
Dec 02 18:46:31 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405840=0xa2040800
Dec 02 18:46:31 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405848=0x80000000
Dec 02 18:46:31 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ChID 009c, Class 0000c197, Offset 00002390, Data 00000000

[...]

Dec 02 18:46:44 cube kwin_x11[6983]: kwin_scene_opengl: A graphics reset attributable to the current GL context occurred.
Dec 02 18:46:44 cube kernel: BUG: unable to handle page fault for address: ffffc90015eea800
Dec 02 18:46:44 cube kernel: #PF: supervisor read access in kernel mode
Dec 02 18:46:44 cube kernel: #PF: error_code(0x0000) - not-present page
Dec 02 18:46:44 cube kernel: PGD 100000067 P4D 100000067 PUD 10026e067 PMD 19b24d067 PTE 0
Dec 02 18:46:44 cube kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Dec 02 18:46:44 cube kernel: CPU: 3 PID: 8518 Comm: vsync event mon Tainted: P           O       6.6.63 #1-NixOS
Dec 02 18:46:44 cube kernel: Hardware name: Gigabyte Technology Co., Ltd. B450 AORUS M/B450 AORUS M, BIOS F32 05/06/2019 
Dec 02 18:46:44 cube kernel: RIP: 0010:_nv012663rm+0xf1/0x1f0 [nvidia]
Dec 02 18:46:44 cube kernel: Code: 00 48 8b b8 a0 1d 00 00 49 8b 84 24 20 09 00 00 48 c7 45 28 00 00 00 00 89 45 20 e8 f9 0b 66 00 4c 8b 45 18 8b 4d 20 49 8b 00 <8b> 10 48 89 48 20 48 8b 4d 28 48 89 48 28 0f ae f8 89 50 18 8b 45
Dec 02 18:46:44 cube kernel: RSP: 0018:ffffc90011b6bbd0 EFLAGS: 00010246
Dec 02 18:46:44 cube kernel: RAX: ffffc90015eea800 RBX: ffff888180d79408 RCX: 0000000000000be0
Dec 02 18:46:44 cube kernel: RDX: 00000000180d6d4b RSI: 0000000000009410 RDI: ffff888114918008
Dec 02 18:46:44 cube kernel: RBP: ffff888100cbdbb0 R08: ffff8881981458d8 R09: 0000000000000000 
Dec 02 18:46:44 cube kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff888127566008
Dec 02 18:46:44 cube kernel: R13: 000000000007f800 R14: 0000000000000003 R15: ffff888127566900
Dec 02 18:46:44 cube kernel: FS:  00007f127a7dc6c0(0000) GS:ffff8887fe380000(0000) knlGS:0000000000000000
Dec 02 18:46:44 cube kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 02 18:46:44 cube kernel: CR2: ffffc90015eea800 CR3: 0000000163ba4000 CR4: 00000000003506e0 
Dec 02 18:46:44 cube kernel: Call Trace:
Dec 02 18:46:44 cube kernel:  <TASK>
Dec 02 18:46:44 cube kernel:  ? __die+0x23/0x80
Dec 02 18:46:44 cube kernel:  ? page_fault_oops+0x171/0x500
Dec 02 18:46:44 cube kernel:  ? srso_return_thunk+0x5/0x5f
Dec 02 18:46:44 cube kernel:  ? srso_return_thunk+0x5/0x5f
Dec 02 18:46:44 cube kernel:  ? search_bpf_extables+0x5f/0x90
Dec 02 18:46:44 cube kernel:  ? exc_page_fault+0x158/0x160
Dec 02 18:46:44 cube kernel:  ? asm_exc_page_fault+0x26/0x30
Dec 02 18:46:44 cube kernel:  ? _nv012663rm+0xf1/0x1f0 [nvidia]
Dec 02 18:46:44 cube kernel:  _nv023448rm+0x97/0xa6 [nvidia]
Dec 02 18:46:44 cube kernel:  _nv047909rm+0x1a1/0x1b0 [nvidia]
Dec 02 18:46:44 cube kernel:  _nv022972rm+0xd9/0x160 [nvidia]
Dec 02 18:46:44 cube kernel:  _nv049933rm+0x3ff/0x500 [nvidia]
Dec 02 18:46:44 cube kernel:  _nv014741rm+0x3f1/0x690 [nvidia]
Dec 02 18:46:44 cube kernel:  _nv048059rm+0x69/0xd0 [nvidia]
Dec 02 18:46:44 cube kernel:  ? _nv000702kms+0x90/0x90 [nvidia_modeset]
Dec 02 18:46:44 cube kernel:  _nv013137rm+0x86/0xa0 [nvidia]
Dec 02 18:46:44 cube kernel:  _nv000598rm+0x5e/0x70 [nvidia]
Dec 02 18:46:44 cube kernel:  rm_kernel_rmapi_op+0x127/0x213 [nvidia]
Dec 02 18:46:44 cube kernel:  ? srso_return_thunk+0x5/0x5f
Dec 02 18:46:44 cube kernel:  nvkms_call_rm+0x4f/0x90 [nvidia_modeset]
Dec 02 18:46:44 cube kernel:  _nv002849kms+0x42/0x50 [nvidia_modeset]
Dec 02 18:46:44 cube kernel:  ? _nv002490kms+0x75/0xa0 [nvidia_modeset]
Dec 02 18:46:44 cube kernel:  ? _nv000119kms+0x67/0xa0 [nvidia_modeset]
Dec 02 18:46:44 cube kernel:  ? srso_return_thunk+0x5/0x5f
Dec 02 18:46:44 cube kernel:  ? _copy_from_user+0x2f/0x90
Dec 02 18:46:44 cube kernel:  ? srso_return_thunk+0x5/0x5f
Dec 02 18:46:44 cube kernel:  ? nvKmsIoctl+0xf9/0x270 [nvidia_modeset]
Dec 02 18:46:44 cube kernel:  ? nvkms_unlocked_ioctl+0x11a/0x190 [nvidia_modeset]
Dec 02 18:46:44 cube kernel:  ? __x64_sys_ioctl+0x9f/0xe0
Dec 02 18:46:44 cube kernel:  ? do_syscall_64+0x39/0x90
Dec 02 18:46:44 cube kernel:  ? entry_SYSCALL_64_after_hwframe+0x78/0xe2
Dec 02 18:46:44 cube kernel:  </TASK>

Dec 02 18:46:44 cube kernel: CR2: ffffc90015eea800
Dec 02 18:46:44 cube kernel: ---[ end trace 0000000000000000 ]---
Dec 02 18:46:44 cube kernel: RIP: 0010:_nv012663rm+0xf1/0x1f0 [nvidia]
Dec 02 18:46:44 cube kernel: Code: 00 48 8b b8 a0 1d 00 00 49 8b 84 24 20 09 00 00 48 c7 45 28 00 00 00 00 89 45 20 e8 f9 0b 66 00 4c 8b 45 18 8b 4d 20 49 8b 00 <8b> 10 48 89 48 20 48 8b 4d 28 48 89 48 28 0f ae f8 89 50 18 8b 45
Dec 02 18:46:44 cube kernel: RSP: 0018:ffffc90011b6bbd0 EFLAGS: 00010246
Dec 02 18:46:44 cube kernel: RAX: ffffc90015eea800 RBX: ffff888180d79408 RCX: 0000000000000be0
Dec 02 18:46:44 cube kernel: RDX: 00000000180d6d4b RSI: 0000000000009410 RDI: ffff888114918008
Dec 02 18:46:44 cube kernel: RBP: ffff888100cbdbb0 R08: ffff8881981458d8 R09: 0000000000000000
Dec 02 18:46:44 cube kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff888127566008
Dec 02 18:46:44 cube kernel: R13: 000000000007f800 R14: 0000000000000003 R15: ffff888127566900
Dec 02 18:46:44 cube kernel: FS:  00007f127a7dc6c0(0000) GS:ffff8887fe380000(0000) knlGS:0000000000000000
Dec 02 18:46:44 cube kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 02 18:46:44 cube kernel: CR2: ffffc90015eea800 CR3: 0000000163ba4000 CR4: 00000000003506e0
Dec 02 18:46:44 cube kernel: note: vsync event mon[8518] exited with irqs disabled
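
To summarize a long journal like the one above, a small script can count the Xid events per GPU and pull out where the kernel oops happened. This is a minimal Python sketch that only assumes the line format visible in this excerpt; the function names are made up for illustration and are not part of any NVIDIA tool.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# NVIDIA "Xid" lines, in the format seen in the journal excerpt above:
# "NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ..."
XID_RE = re.compile(r"NVRM: Xid \((?P<pci>[^)]+)\): (?P<xid>\d+), (?P<detail>.*)$")

# Faulting instruction pointer of a kernel oops, e.g.
# "RIP: 0010:_nv012663rm+0xf1/0x1f0 [nvidia]"
RIP_RE = re.compile(r"RIP: [0-9a-fA-F]{4}:(?P<symbol>\S+) \[(?P<module>[^\]]+)\]")


def count_xids(lines: Iterable[str]) -> Counter:
    """Count Xid events per (PCI address, Xid code)."""
    counts: Counter = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group("pci"), int(match.group("xid")))] += 1
    return counts


def faulting_rips(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (module, symbol) for every oops RIP line, in log order."""
    rips = []
    for line in lines:
        match = RIP_RE.search(line)
        if match:
            rips.append((match.group("module"), match.group("symbol")))
    return rips
```

For example, save the kernel messages of the current boot with journalctl -k -b > kernel.log and pass open("kernel.log").read().splitlines() to both functions. The "? _nv012663rm..." call-trace entries are deliberately not matched; only the RIP lines are.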

Searching for these error messages led me here:

https://wiki.archlinux.org/title/NVIDIA/Tips_and_tricks#Preserve_video_memory_after_suspend
https://download.nvidia.com/XFree86/Linux-x86_64/435.17/README/powermanagement.html

It seems that, for some reason, the NVIDIA driver's /proc interface has to be used to tell the driver that a suspend is imminent, and again after resume to tell it that the system has resumed.
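
As I read the NVIDIA README linked above, the protocol is: write "suspend" or "hibernate" to /proc/driver/nvidia/suspend right before the system sleeps, and "resume" right after it wakes. Below is a minimal Python sketch of just that write, assuming the path and keywords from the README. The function name is made up, and NVIDIA's own scripts do more around this step (for example switching virtual consoles), which this sketch leaves out.

```python
from pathlib import Path

# Interface described in NVIDIA's power-management README (assumed path).
NVIDIA_SUSPEND_PROC = Path("/proc/driver/nvidia/suspend")

# Keywords the README's scripts write into the interface.
VALID_PHASES = {"suspend", "hibernate", "resume"}


def notify_nvidia(phase: str, proc_path: Path = NVIDIA_SUSPEND_PROC) -> bool:
    """Write a suspend/hibernate/resume notification to the NVIDIA driver.

    Returns False (and does nothing) if the /proc file does not exist,
    e.g. because the driver is not loaded. Needs root on a real system.
    """
    if phase not in VALID_PHASES:
        raise ValueError(f"unknown phase {phase!r}, expected one of {sorted(VALID_PHASES)}")
    if not proc_path.exists():
        return False
    # Same bytes as `echo suspend > /proc/driver/nvidia/suspend` (echo adds a newline).
    with proc_path.open("w") as proc_file:
        proc_file.write(phase + "\n")
    return True
```

This would have to be called by whatever runs immediately before suspend and after resume (for instance a systemd unit ordered around sleep); wiring that up on NixOS is not shown here.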

NixOS does not currently seem to create nvidia-suspend.sh, nvidia-resume.sh, and nvidia-hibernate.sh. Just adding the kernel options from the first link disables suspend/resume completely, and the driver complains that it did not receive the suspend call on the /proc interface.
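
When experimenting with those kernel options, it may help to confirm that they actually reached the loaded module. A minimal Python sketch, assuming that /proc/driver/nvidia/params lists one "Name: value" pair per line with the NVreg_ prefix dropped (e.g. PreserveVideoMemoryAllocations: 1), which is how the Arch wiki page describes checking it. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict

# Assumed location of the loaded module's parameters.
NVIDIA_PARAMS_PROC = Path("/proc/driver/nvidia/params")


def parse_nvidia_params(text: str) -> Dict[str, str]:
    """Parse 'Name: value' lines into a dict; surrounding quotes are stripped."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' survive.
        name, sep, value = line.partition(":")
        if not sep:
            continue
        params[name.strip()] = value.strip().strip('"')
    return params


def preserves_video_memory(params: Dict[str, str]) -> bool:
    """True if NVreg_PreserveVideoMemoryAllocations=1 took effect."""
    return params.get("PreserveVideoMemoryAllocations") == "1"


def read_nvidia_params(path: Path = NVIDIA_PARAMS_PROC) -> Dict[str, str]:
    """Read and parse the params file of the currently loaded driver."""
    return parse_nvidia_params(path.read_text())
```

On a running system, preserves_video_memory(read_nvidia_params()) should be True once the option is active; TemporaryFilePath can be checked the same way.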

Is anyone else encountering this issue after upgrading to 24.11?

I have NVIDIA Optimus and have the NVIDIA drivers enabled.

My case is a bit different: when I run systemctl suspend, the OS freezes, and the only way out is to force the laptop to power off…

It worked just fine a few versions ago.

I found the option hardware.nvidia.powerManagement.enable on NixOS Search.

With it enabled, my screen now turns on after resume. However, it only shows console output; the desktop UI never comes back.

I'm still not sure what is going on, but it is definitely not just a driver issue: I switched the NVIDIA driver back to the version I used in 24.05 (550.78), and the issue persists. Something else must have broken in 24.11.
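
When pinning an older driver like this, it is worth confirming which version is actually loaded after a rebuild and reboot. A minimal Python sketch, assuming /proc/driver/nvidia/version starts with a line such as "NVRM version: NVIDIA UNIX x86_64 Kernel Module  550.78  ..." (the open kernel module variant reads "... Open Kernel Module for x86_64  565.77 ..."); that format is an assumption, and the function name is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed format of the first line of /proc/driver/nvidia/version.
VERSION_RE = re.compile(r"Kernel Module(?: for \S+)?\s+(\d+(?:\.\d+)+)")


def loaded_driver_version(text: str) -> Optional[str]:
    """Extract the driver version (e.g. '550.78') or return None if not found."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def current_driver_version(path: Path = Path("/proc/driver/nvidia/version")) -> Optional[str]:
    """Read the version of the currently loaded NVIDIA kernel module."""
    return loaded_driver_version(path.read_text())
```

If current_driver_version() does not report the version you pinned, the old module is probably still loaded and the test says nothing about that driver.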

I have the same problem, also with a GTX 1060. I'm currently running both linuxPackages_latest and the beta-branch NVIDIA drivers, and so far that combination seems to work, but I'll keep an eye on it.

Updating to the beta driver partially fixes the issue: my system eventually comes back to the graphical environment. However, in 24.05 the lock screen appeared after about 5 seconds and the desktop was fully responsive afterwards. Now it takes around 30 seconds for the lock screen to appear, and another 1-2 minutes until the desktop becomes responsive.

I'm still on linuxPackages.linux_6_6 because of ZFS; all newer kernels supported by stable ZFS are already EOL.

This seems related, but unfortunately the thread went stale without a solution: "Screen locker: Must switch to virtual console and back to get password dialog" (#12 by Lehas777, Help, KDE Discuss).

I ran into the same issues, but managed to get suspend mostly working for my use case. I use i3 as my window manager and picom as the compositor, with driver version 565.77. What works for me is enabling hardware.nvidia.powerManagement and automatically killing picom after waking from suspend. It is not perfect: the first time I open a new alacritty terminal, it hangs for a couple of seconds. The lock screen appears within a couple of seconds.
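
One way to automate "kill picom after waking" is a hook for systemd-sleep, which per systemd-sleep(8) runs the executables in its system-sleep directory with "pre" before sleeping and "post" after waking, plus the sleep action as a second argument. This is a hypothetical Python sketch of such a hook, not the setup described in the post; how it gets installed on NixOS is not shown, and NixOS's powerManagement.resumeCommands option may be a simpler place for the same pkill call.

```python
import subprocess
import sys
from typing import List, Optional


def picom_action(phase: str, action: str) -> Optional[List[str]]:
    """Return the command to run for a systemd-sleep invocation, or None.

    phase is "pre" (before sleeping) or "post" (after waking);
    action is e.g. "suspend" or "hibernate" and is ignored here.
    """
    if phase == "post":
        # -x: match the process name exactly, so only picom itself is killed.
        return ["pkill", "-x", "picom"]
    return None


def main(argv: List[str]) -> None:
    if len(argv) < 3:
        return  # not called by systemd-sleep; do nothing
    cmd = picom_action(argv[1], argv[2])
    if cmd:
        # pkill exits with 1 when no picom is running; that is not an error here.
        subprocess.run(cmd, check=False)


if __name__ == "__main__":
    main(sys.argv)
```

Whether picom is started again afterwards depends on your i3 configuration; this sketch only kills it.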

Some other variations I have tried:

  • Not enabling hardware.nvidia.powerManagement: I have to switch to another TTY and back to get a login screen (and my wallpaper disappears).
  • Not killing picom automatically after waking: the login screen never appears unless I switch to another TTY and kill picom from there.
  • Killing picom before suspend: this results in just a black screen (a video signal is still being sent, since my monitor stays awake), and switching TTYs doesn't work.

It doesn't seem to be limited to suspend/resume: I get the same issues when the screen simply turns off after a few minutes of idling. 24.11 appears to have some issues, whether they come from NixOS itself or from an unlucky combination of software versions. There seems to be plenty of breakage in things that used to work before.

I'm facing the same issue on the unstable channel, without NVIDIA drivers.

A few updates later, the desktop does not come back even after waiting a long time. I need to switch to a console and restart the display manager.

With the latest update, things started working for me again with the beta NVIDIA drivers. Strangely enough, I have to disable hardware.nvidia.powerManagement.enable again.