Suspend/resume is (partially) broken on my desktop. After resume the screen stays black, but the network comes up, so I can SSH into the machine and pull some diagnostics. The machine has a GTX 1060, so I configured it to use the proprietary NVIDIA driver, as recommended. It runs KDE Plasma 6 on an Xorg server, not Wayland. The config for this system is here:
The journalctl output seems to point to problems with the graphics card:
Dec 02 18:46:28 cube kernel: PM: suspend exit
Dec 02 18:46:28 cube kernel: NVRM: GPU at PCI:0000:07:00: GPU-b75ba18c-1365-8656-6e98-8e243012b937
Dec 02 18:46:28 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 18 Error
Dec 02 18:46:28 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405840=0x82040000
Dec 02 18:46:28 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405848=0x80000000
Dec 02 18:46:28 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ChID 0040, Class 0000c197, Offset 00001b0c, Data 1000f010
[...]
Dec 02 18:46:31 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 11 Error
Dec 02 18:46:31 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: Shader Program Header 18 Error
Dec 02 18:46:31 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405840=0xa2040800
Dec 02 18:46:31 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ESR 0x405848=0x80000000
Dec 02 18:46:31 cube kernel: NVRM: Xid (PCI:0000:07:00): 13, pid='<unknown>', name=<unknown>, Graphics Exception: ChID 009c, Class 0000c197, Offset 00002390, Data 00000000
[...]
Dec 02 18:46:44 cube kwin_x11[6983]: kwin_scene_opengl: A graphics reset attributable to the current GL context occurred.
Dec 02 18:46:44 cube kernel: BUG: unable to handle page fault for address: ffffc90015eea800
Dec 02 18:46:44 cube kernel: #PF: supervisor read access in kernel mode
Dec 02 18:46:44 cube kernel: #PF: error_code(0x0000) - not-present page
Dec 02 18:46:44 cube kernel: PGD 100000067 P4D 100000067 PUD 10026e067 PMD 19b24d067 PTE 0
Dec 02 18:46:44 cube kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Dec 02 18:46:44 cube kernel: CPU: 3 PID: 8518 Comm: vsync event mon Tainted: P O 6.6.63 #1-NixOS
Dec 02 18:46:44 cube kernel: Hardware name: Gigabyte Technology Co., Ltd. B450 AORUS M/B450 AORUS M, BIOS F32 05/06/2019
Dec 02 18:46:44 cube kernel: RIP: 0010:_nv012663rm+0xf1/0x1f0 [nvidia]
Dec 02 18:46:44 cube kernel: Code: 00 48 8b b8 a0 1d 00 00 49 8b 84 24 20 09 00 00 48 c7 45 28 00 00 00 00 89 45 20 e8 f9 0b 66 00 4c 8b 45 18 8b 4d 20 49 8b 00 <8b> 10 48 89 48 20 48 8b 4d 28 48 89 48 28 0f ae f8 89 50 18 8b 45
Dec 02 18:46:44 cube kernel: RSP: 0018:ffffc90011b6bbd0 EFLAGS: 00010246
Dec 02 18:46:44 cube kernel: RAX: ffffc90015eea800 RBX: ffff888180d79408 RCX: 0000000000000be0
Dec 02 18:46:44 cube kernel: RDX: 00000000180d6d4b RSI: 0000000000009410 RDI: ffff888114918008
Dec 02 18:46:44 cube kernel: RBP: ffff888100cbdbb0 R08: ffff8881981458d8 R09: 0000000000000000
Dec 02 18:46:44 cube kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff888127566008
Dec 02 18:46:44 cube kernel: R13: 000000000007f800 R14: 0000000000000003 R15: ffff888127566900
Dec 02 18:46:44 cube kernel: FS: 00007f127a7dc6c0(0000) GS:ffff8887fe380000(0000) knlGS:0000000000000000
Dec 02 18:46:44 cube kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 02 18:46:44 cube kernel: CR2: ffffc90015eea800 CR3: 0000000163ba4000 CR4: 00000000003506e0
Dec 02 18:46:44 cube kernel: Call Trace:
Dec 02 18:46:44 cube kernel: <TASK>
Dec 02 18:46:44 cube kernel: ? __die+0x23/0x80
Dec 02 18:46:44 cube kernel: ? page_fault_oops+0x171/0x500
Dec 02 18:46:44 cube kernel: ? srso_return_thunk+0x5/0x5f
Dec 02 18:46:44 cube kernel: ? srso_return_thunk+0x5/0x5f
Dec 02 18:46:44 cube kernel: ? search_bpf_extables+0x5f/0x90
Dec 02 18:46:44 cube kernel: ? exc_page_fault+0x158/0x160
Dec 02 18:46:44 cube kernel: ? asm_exc_page_fault+0x26/0x30
Dec 02 18:46:44 cube kernel: ? _nv012663rm+0xf1/0x1f0 [nvidia]
Dec 02 18:46:44 cube kernel: _nv023448rm+0x97/0xa6 [nvidia]
Dec 02 18:46:44 cube kernel: _nv047909rm+0x1a1/0x1b0 [nvidia]
Dec 02 18:46:44 cube kernel: _nv022972rm+0xd9/0x160 [nvidia]
Dec 02 18:46:44 cube kernel: _nv049933rm+0x3ff/0x500 [nvidia]
Dec 02 18:46:44 cube kernel: _nv014741rm+0x3f1/0x690 [nvidia]
Dec 02 18:46:44 cube kernel: _nv048059rm+0x69/0xd0 [nvidia]
Dec 02 18:46:44 cube kernel: ? _nv000702kms+0x90/0x90 [nvidia_modeset]
Dec 02 18:46:44 cube kernel: _nv013137rm+0x86/0xa0 [nvidia]
Dec 02 18:46:44 cube kernel: _nv000598rm+0x5e/0x70 [nvidia]
Dec 02 18:46:44 cube kernel: rm_kernel_rmapi_op+0x127/0x213 [nvidia]
Dec 02 18:46:44 cube kernel: ? srso_return_thunk+0x5/0x5f
Dec 02 18:46:44 cube kernel: nvkms_call_rm+0x4f/0x90 [nvidia_modeset]
Dec 02 18:46:44 cube kernel: _nv002849kms+0x42/0x50 [nvidia_modeset]
Dec 02 18:46:44 cube kernel: ? _nv002490kms+0x75/0xa0 [nvidia_modeset]
Dec 02 18:46:44 cube kernel: ? _nv000119kms+0x67/0xa0 [nvidia_modeset]
Dec 02 18:46:44 cube kernel: ? srso_return_thunk+0x5/0x5f
Dec 02 18:46:44 cube kernel: ? _copy_from_user+0x2f/0x90
Dec 02 18:46:44 cube kernel: ? srso_return_thunk+0x5/0x5f
Dec 02 18:46:44 cube kernel: ? nvKmsIoctl+0xf9/0x270 [nvidia_modeset]
Dec 02 18:46:44 cube kernel: ? nvkms_unlocked_ioctl+0x11a/0x190 [nvidia_modeset]
Dec 02 18:46:44 cube kernel: ? __x64_sys_ioctl+0x9f/0xe0
Dec 02 18:46:44 cube kernel: ? do_syscall_64+0x39/0x90
Dec 02 18:46:44 cube kernel: ? entry_SYSCALL_64_after_hwframe+0x78/0xe2
Dec 02 18:46:44 cube kernel: </TASK>
Dec 02 18:46:44 cube kernel: CR2: ffffc90015eea800
Dec 02 18:46:44 cube kernel: ---[ end trace 0000000000000000 ]---
Dec 02 18:46:44 cube kernel: RIP: 0010:_nv012663rm+0xf1/0x1f0 [nvidia]
Dec 02 18:46:44 cube kernel: Code: 00 48 8b b8 a0 1d 00 00 49 8b 84 24 20 09 00 00 48 c7 45 28 00 00 00 00 89 45 20 e8 f9 0b 66 00 4c 8b 45 18 8b 4d 20 49 8b 00 <8b> 10 48 89 48 20 48 8b 4d 28 48 89 48 28 0f ae f8 89 50 18 8b 45
Dec 02 18:46:44 cube kernel: RSP: 0018:ffffc90011b6bbd0 EFLAGS: 00010246
Dec 02 18:46:44 cube kernel: RAX: ffffc90015eea800 RBX: ffff888180d79408 RCX: 0000000000000be0
Dec 02 18:46:44 cube kernel: RDX: 00000000180d6d4b RSI: 0000000000009410 RDI: ffff888114918008
Dec 02 18:46:44 cube kernel: RBP: ffff888100cbdbb0 R08: ffff8881981458d8 R09: 0000000000000000
Dec 02 18:46:44 cube kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff888127566008
Dec 02 18:46:44 cube kernel: R13: 000000000007f800 R14: 0000000000000003 R15: ffff888127566900
Dec 02 18:46:44 cube kernel: FS: 00007f127a7dc6c0(0000) GS:ffff8887fe380000(0000) knlGS:0000000000000000
Dec 02 18:46:44 cube kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 02 18:46:44 cube kernel: CR2: ffffc90015eea800 CR3: 0000000163ba4000 CR4: 00000000003506e0
Dec 02 18:46:44 cube kernel: note: vsync event mon[8518] exited with irqs disabled
Some searching around the error messages brought me here:
https://wiki.archlinux.org/title/NVIDIA/Tips_and_tricks#Preserve_video_memory_after_suspend
https://download.nvidia.com/XFree86/Linux-x86_64/435.17/README/powermanagement.html
It seems that, for some reason, the NVIDIA driver's /proc interface has to be used to tell the driver that a suspend is imminent, and again after resume to tell it that the system has woken up.
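As far as I understand the NVIDIA README linked above, the notification is a plain write of the word suspend, hibernate, or resume to /proc/driver/nvidia/suspend. A minimal sketch of that handshake follows; notify_nvidia and the NVIDIA_SUSPEND_PATH override are hypothetical names of mine, and NVIDIA's own helper scripts may do additional work around this write:

```shell
#!/bin/sh
# Path taken from NVIDIA's power-management README. The override variable is
# an assumption of this sketch, so the function can be tried against a scratch
# file instead of the real driver.
NVIDIA_SUSPEND_PATH="${NVIDIA_SUSPEND_PATH:-/proc/driver/nvidia/suspend}"

# notify_nvidia (hypothetical helper): tell the driver which power transition
# is about to happen (suspend, hibernate) or has just happened (resume).
notify_nvidia() {
    case "$1" in
        suspend|hibernate|resume)
            printf '%s\n' "$1" > "$NVIDIA_SUSPEND_PATH"
            ;;
        *)
            echo "usage: notify_nvidia suspend|hibernate|resume" >&2
            return 1
            ;;
    esac
}
```

Before sleeping one would call `notify_nvidia suspend` as root, and `notify_nvidia resume` after waking up; normally a systemd unit is expected to do this rather than a person.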
NixOS doesn't currently seem to create nvidia-suspend.sh, nvidia-resume.sh, and nvidia-hibernate.sh, and just adding the kernel options from the first link disables suspend/resume completely: the driver then complains that it didn't receive the suspend call on the /proc interface.
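To check whether the kernel option from the first link is actually active, something like the following could be used. This is only a sketch: it assumes the driver lists its parameters as "Name: value" lines in /proc/driver/nvidia/params, and preserve_enabled, report_preserve, and NVIDIA_PARAMS_PATH are hypothetical names for illustration:

```shell
#!/bin/sh
# Assumption: the loaded driver exposes its module parameters here, one
# "Name: value" pair per line. The override variable exists only so the
# check can be tried against a sample file.
NVIDIA_PARAMS_PATH="${NVIDIA_PARAMS_PATH:-/proc/driver/nvidia/params}"

# preserve_enabled (hypothetical helper): succeeds only if the driver reports
# PreserveVideoMemoryAllocations as 1.
preserve_enabled() {
    grep -q '^PreserveVideoMemoryAllocations: 1$' "$NVIDIA_PARAMS_PATH" 2>/dev/null
}

# report_preserve (hypothetical helper): print a one-line summary.
report_preserve() {
    if preserve_enabled; then
        echo "PreserveVideoMemoryAllocations is enabled"
    else
        echo "PreserveVideoMemoryAllocations is not enabled (or the driver is not loaded)"
    fi
}
```

Running `report_preserve` and then `systemctl status nvidia-suspend.service nvidia-resume.service` should show whether the option is set and whether the units described in the NVIDIA README exist on the system.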
Does anyone else encounter this issue after upgrading to 24.11?