Black screen when resuming from suspend

On NixOS unstable, I can’t properly resume from suspend: the screen stays black. Looking at the kernel logs (below), I notice two things:

kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)

and

kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008

Maybe there is something else I’m not noticing.

Any idea how I could try to diagnose and fix this?

kernel: ACPI: Low-level resume complete
kernel: PM: Restoring platform NVS memory
kernel: LVT offset 0 assigned for vector 0x400
kernel: Enabling non-boot CPUs ...
kernel: x86: Booting SMP configuration:
kernel: smpboot: Booting Node 0 Processor 1 APIC 0x2
kernel: microcode: CPU1: patch_level=0x08701013
kernel: CPU1 is up
kernel: smpboot: Booting Node 0 Processor 2 APIC 0x4
kernel: microcode: CPU2: patch_level=0x08701013
kernel: CPU2 is up
kernel: smpboot: Booting Node 0 Processor 3 APIC 0x8
kernel: microcode: CPU3: patch_level=0x08701013
kernel: CPU3 is up
kernel: smpboot: Booting Node 0 Processor 4 APIC 0xa
kernel: microcode: CPU4: patch_level=0x08701013
kernel: CPU4 is up
kernel: smpboot: Booting Node 0 Processor 5 APIC 0xc
kernel: microcode: CPU5: patch_level=0x08701013
kernel: CPU5 is up
kernel: smpboot: Booting Node 0 Processor 6 APIC 0x10
kernel: microcode: CPU6: patch_level=0x08701013
kernel: CPU6 is up
kernel: smpboot: Booting Node 0 Processor 7 APIC 0x12
kernel: microcode: CPU7: patch_level=0x08701013
kernel: CPU7 is up
kernel: smpboot: Booting Node 0 Processor 8 APIC 0x14
kernel: microcode: CPU8: patch_level=0x08701013
kernel: CPU8 is up
kernel: smpboot: Booting Node 0 Processor 9 APIC 0x18
kernel: microcode: CPU9: patch_level=0x08701013
kernel: CPU9 is up
kernel: smpboot: Booting Node 0 Processor 10 APIC 0x1a
kernel: microcode: CPU10: patch_level=0x08701013
kernel: CPU10 is up
kernel: smpboot: Booting Node 0 Processor 11 APIC 0x1c
kernel: microcode: CPU11: patch_level=0x08701013
kernel: CPU11 is up
kernel: smpboot: Booting Node 0 Processor 12 APIC 0x1
kernel: microcode: CPU12: patch_level=0x08701013
kernel: CPU12 is up
kernel: smpboot: Booting Node 0 Processor 13 APIC 0x3
kernel: microcode: CPU13: patch_level=0x08701013
kernel: CPU13 is up
kernel: smpboot: Booting Node 0 Processor 14 APIC 0x5
kernel: microcode: CPU14: patch_level=0x08701013
kernel: CPU14 is up
kernel: smpboot: Booting Node 0 Processor 15 APIC 0x9
kernel: microcode: CPU15: patch_level=0x08701013
kernel: CPU15 is up
kernel: smpboot: Booting Node 0 Processor 16 APIC 0xb
kernel: microcode: CPU16: patch_level=0x08701013
kernel: CPU16 is up
kernel: smpboot: Booting Node 0 Processor 17 APIC 0xd
kernel: microcode: CPU17: patch_level=0x08701013
kernel: CPU17 is up
kernel: smpboot: Booting Node 0 Processor 18 APIC 0x11
kernel: microcode: CPU18: patch_level=0x08701013
kernel: CPU18 is up
kernel: smpboot: Booting Node 0 Processor 19 APIC 0x13
kernel: microcode: CPU19: patch_level=0x08701013
kernel: CPU19 is up
kernel: smpboot: Booting Node 0 Processor 20 APIC 0x15
kernel: microcode: CPU20: patch_level=0x08701013
kernel: CPU20 is up
kernel: smpboot: Booting Node 0 Processor 21 APIC 0x19
kernel: microcode: CPU21: patch_level=0x08701013
kernel: CPU21 is up
kernel: smpboot: Booting Node 0 Processor 22 APIC 0x1b
kernel: microcode: CPU22: patch_level=0x08701013
kernel: CPU22 is up
kernel: smpboot: Booting Node 0 Processor 23 APIC 0x1d
kernel: microcode: CPU23: patch_level=0x08701013
kernel: CPU23 is up
kernel: ACPI: Waking up from system sleep state S3
kernel: sd 3:0:0:0: [sda] Starting disk
kernel: nvme nvme0: Shutdown timeout set to 10 seconds
kernel: nvme nvme0: 8/0/0 default/read/poll queues
kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
kernel: usb 1-3.3: reset full-speed USB device number 6 using xhci_hcd
kernel: ata9: SATA link down (SStatus 0 SControl 300)
kernel: ata11: SATA link down (SStatus 0 SControl 300)
kernel: ata5: SATA link down (SStatus 0 SControl 300)
kernel: ata12: SATA link down (SStatus 0 SControl 300)
kernel: ata10: SATA link down (SStatus 0 SControl 300)
kernel: ata3: SATA link down (SStatus 0 SControl 300)
kernel: ata6: SATA link down (SStatus 0 SControl 300)
kernel: usb 1-3.4: reset full-speed USB device number 8 using xhci_hcd
kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -110
kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).
kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
kernel: PM: Device 0000:09:00.0 failed to resume async: error -110
kernel: ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
kernel: ata4.00: supports DRM functions and may not be fully accessible
kernel: ata4.00: supports DRM functions and may not be fully accessible
kernel: ata4.00: configured for UDMA/133
kernel: ata4.00: Enabling discard_zeroes_data
kernel: OOM killer enabled.
kernel: Restarting tasks ... done.
kernel: thermal thermal_zone0: failed to read out thermal zone (-61)
kernel: PM: suspend exit
kernel: iwlwifi 0000:05:00.0: Applying debug destination EXTERNAL_DRAM
kernel: iwlwifi 0000:05:00.0: FW already configured (0) - re-configuring
kernel: Move buffer fallback to memcpy unavailable
kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
kernel: 8021q: adding VLAN 0 to HW filter on device enp4s0
kernel: iwlwifi 0000:05:00.0: Applying debug destination EXTERNAL_DRAM
kernel: iwlwifi 0000:05:00.0: FW already configured (0) - re-configuring
kernel: wlan0: authenticate with e2:63:da:3e:63:2e
kernel: wlan0: send auth to e2:63:da:3e:63:2e (try 1/3)
kernel: wlan0: authenticated
kernel: wlan0: associate with e2:63:da:3e:63:2e (try 1/3)
kernel: wlan0: RX AssocResp from e2:63:da:3e:63:2e (capab=0x411 status=0 aid=2)
kernel: wlan0: associated
kernel: IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
kernel: amdgpu: [powerplay] Trying to freeze SCLK DPM when DPM is disabled
kernel: amdgpu: [powerplay]
kernel: amdgpu: [powerplay]
kernel: amdgpu: [powerplay] Trying to Unfreeze SCLK DPM when DPM is disabled
kernel: amdgpu: [powerplay]
kernel: amdgpu: [powerplay]
kernel: amdgpu: [powerplay]
kernel: amdgpu: [powerplay]
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 1f72f47067 P4D 1f72f47067 PUD 1f72a89067 PMD 0
kernel: Oops: 0000 [#1] SMP NOPTI
kernel: CPU: 2 PID: 2157 Comm: X Not tainted 5.4.80 #1-NixOS
kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F11 12/06/2019
kernel: RIP: 0010:build_audio_output.isra.0+0x97/0x110 [amdgpu]
kernel: Code: 64 89 43 24 8b 95 20 01 00 00 89 53 18 89 53 1c 48 8b 45 08 8b 80 14 03 00 00 83 f8 04 74 4f 83 e8 20 83 e0 df 75 13 48 8b 3f <48> 8b 47 08 48 8b 40 08 e8 7c 8e f2 f8 89 43 2c 8b 85 88 02 00 00
kernel: RSP: 0018:ffffa01541a87558 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: ffffa01541a875ac RCX: ffff934c6e99b000
kernel: RDX: 0000000000515e14 RSI: 0000000000879ec0 RDI: 0000000000000000
kernel: RBP: ffff934c3a6801b8 R08: 000000000003a980 R09: ffffa01541a87470
kernel: R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
kernel: R13: ffff934c6e99b000 R14: 0000000000000000 R15: ffff934d62c80000
kernel: FS:  00007f780f032980(0000) GS:ffff934d9e880000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000008 CR3: 0000001f7329c000 CR4: 0000000000340ee0
kernel: Call Trace:
kernel:  dce110_apply_ctx_to_hw+0x284/0x5a0 [amdgpu]
kernel:  ? pp_dpm_dispatch_tasks+0x45/0x60 [amdgpu]
kernel:  ? dm_pp_apply_display_requirements+0x1ab/0x1d0 [amdgpu]
kernel:  dc_commit_state+0x280/0x5e0 [amdgpu]
kernel:  amdgpu_dm_atomic_commit_tail+0xd3b/0x1d90 [amdgpu]
kernel:  ? bw_calcs+0xa26/0x39e0 [amdgpu]
kernel:  ? amdgpu_bo_pin_restricted+0x65/0x280 [amdgpu]
kernel:  ? dm_plane_helper_prepare_fb+0x221/0x280 [amdgpu]
kernel:  ? _cond_resched+0x15/0x30
kernel:  ? wait_for_completion_timeout+0x36/0x130
kernel:  ? _cond_resched+0x15/0x30
kernel:  ? wait_for_completion_interruptible+0x33/0x170
kernel:  ? commit_tail+0x94/0x110 [drm_kms_helper]
kernel:  commit_tail+0x94/0x110 [drm_kms_helper]
kernel:  drm_atomic_helper_commit+0x108/0x110 [drm_kms_helper]
kernel:  drm_client_modeset_commit_atomic+0x1d0/0x1f0 [drm]
kernel:  drm_client_modeset_commit_force+0x50/0x150 [drm]
kernel:  drm_fb_helper_restore_fbdev_mode_unlocked+0x49/0xa0 [drm_kms_helper]
kernel:  drm_fb_helper_set_par+0x2c/0x50 [drm_kms_helper]
kernel:  fb_set_var+0x183/0x350
kernel:  ? update_load_avg+0x78/0x660
kernel:  ? update_curr+0x69/0x1a0
kernel:  ? __update_load_avg_se+0x23b/0x320
kernel:  fbcon_blank+0x20d/0x270
kernel:  do_unblank_screen+0xaa/0x150
kernel:  complete_change_console+0x54/0xd0
kernel:  vt_ioctl+0x124e/0x1290
kernel:  ? drm_ioctl+0x1f4/0x370 [drm]
kernel:  ? drm_setmaster_ioctl+0xb0/0xb0 [drm]
kernel:  tty_ioctl+0x372/0x8c0
kernel:  ? ptep_set_access_flags+0x29/0x40
kernel:  ? do_wp_page+0x170/0x530
kernel:  ? selinux_file_ioctl+0x174/0x220
kernel:  do_vfs_ioctl+0x3fe/0x660
kernel:  ksys_ioctl+0x5e/0x90
kernel:  __x64_sys_ioctl+0x16/0x20
kernel:  do_syscall_64+0x4e/0x120
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: RIP: 0033:0x7f780f434cd7
kernel: Code: c0 75 b5 48 8d 3c 2b e8 17 ff ff ff 85 c0 78 b6 48 89 d8 5b 5d 41 5c c3 66 2e 0f 1f 84 00 00 00 00 00 90 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 c1 0c 00 f7 d8 64 89 01 48
kernel: RSP: 002b:00007ffd4ed3d1c8 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f780f434cd7
kernel: RDX: 0000000000000001 RSI: 0000000000005605 RDI: 000000000000000e
kernel: RBP: 00000000006402b0 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 000000000000003c R11: 0000000000003246 R12: 0000000000640370
kernel: R13: 000000000062f004 R14: 00000000006402ac R15: 00007ffd4ed3d254
kernel: Modules linked in: xt_mark ctr xt_MASQUERADE xt_comment af_packet ccm algif_aead des_generic libdes algif_skcipher cmac md4 algif_hash 8021q af_alg msr ip6table_nat iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 amdgpu nf_defrag_ipv4 ip6t_rpfilter ipt_rpfilter ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_common amd_iommu_v2 xt_LOG gpu_sched xt_tcpudp ttm snd_hda_codec_realtek ip6table_filter ip6_tables snd_hda_codec_generic wmi_bmof sch_fq_codel iptable_filter ledtrig_audio drm_kms_helper snd_hda_codec_hdmi iwlmvm snd_pcm_oss snd_hda_intel uvcvideo edac_mce_amd snd_mixer_oss snd_intel_nhlt edac_core videobuf2_vmalloc mac80211 drm snd_usb_audio snd_hda_codec atkbd videobuf2_memops libps2 videobuf2_v4l2 snd_usbmidi_lib snd_hda_core serio nls_iso8859_1 videobuf2_common deflate snd_rawmidi nls_cp437 agpgart efi_pstore snd_hwdep snd_seq_device vfat fb_sys_fops fat libarc4 snd_pcm videodev igb pstore syscopyarea snd_timer sp5100_tco sysfillrect snd
kernel:  crct10dif_pclmul joydev mousedev evdev watchdog sysimgblt crc32_pclmul iwlwifi ghash_clmulni_intel mc backlight soundcore k10temp mac_hid efivars i2c_piix4 ptp cfg80211 pps_core dca i2c_algo_bit rfkill i2c_core wmi thermal pinctrl_amd button loop acpi_cpufreq cpufreq_powersave tun tap macvlan bridge stp llc kvm irqbypass efivarfs ip_tables x_tables autofs4 dm_crypt input_leds led_class hid_generic usbhid hid sd_mod xhci_pci xhci_hcd ahci libahci libata usbcore aesni_intel scsi_mod crypto_simd nvme cryptd glue_helper nvme_core usb_common rtc_cmos dm_mod btrfs libcrc32c crc32c_generic crc32c_intel xor zstd_decompress zstd_compress raid6_pq
kernel: CR2: 0000000000000008
kernel: ---[ end trace 012e2ece76b8945a ]---
kernel: RIP: 0010:build_audio_output.isra.0+0x97/0x110 [amdgpu]
kernel: Code: 64 89 43 24 8b 95 20 01 00 00 89 53 18 89 53 1c 48 8b 45 08 8b 80 14 03 00 00 83 f8 04 74 4f 83 e8 20 83 e0 df 75 13 48 8b 3f <48> 8b 47 08 48 8b 40 08 e8 7c 8e f2 f8 89 43 2c 8b 85 88 02 00 00
kernel: RSP: 0018:ffffa01541a87558 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: ffffa01541a875ac RCX: ffff934c6e99b000
kernel: RDX: 0000000000515e14 RSI: 0000000000879ec0 RDI: 0000000000000000
kernel: RBP: ffff934c3a6801b8 R08: 000000000003a980 R09: ffffa01541a87470
kernel: R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
kernel: R13: ffff934c6e99b000 R14: 0000000000000000 R15: ffff934d62c80000
kernel: FS:  00007f780f032980(0000) GS:ffff934d9e880000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000008 CR3: 0000001f7329c000 CR4: 0000000000340ee0

I had the same symptom, but with a different error message, so the cause might be different.

https://github.com/NixOS/nixpkgs/issues/170429

I found out that when you report an issue to AMD's bug tracker from NixOS, they will ignore it. But when you can reproduce it on an officially supported OS such as RHEL, Ubuntu, or SLED/SLES and contact support, a developer provides a patch within 2 days!

See "[amdgpu] kernel crash when trying to resume from suspend under memory pressure" (issue #2223 in drm/amd on GitLab).

You could test the patch from there:

  # Add a separate boot entry whose kernel carries the proposed amdgpu patch.
  specialisation."amdgpu-patch-2223" = {
    inheritParentConfig = true;
    configuration = {
      # Label shown for this entry in the GRUB menu.
      boot.loader.grub.configurationName = "amdgpu-patch-2223";
      #boot.kernelPackages = pkgs.linuxPackages_6_0;
      boot.kernelPatches = [
        { name = "amdgpu-patch";
          # No hash is given, so this fetch only works with impure evaluation.
          patch = builtins.fetchurl
                  "https://gitlab.freedesktop.org/drm/amd/uploads/9ed46172039be5ec2579699937c55ada/fail_suspend.patch";
        }
      ];
    };
  };

So that would be the workflow to get AMD to fix issues in amdgpu.

Is this patch available in the latest kernel? I can’t check right now; GitLab returns a 504.

It has been in the kernel since v6.1-rc4.

https://github.com/torvalds/linux/commit/8d4de331f1b24a22d18e3c6116aa25228cf54854
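
If the fix has been in mainline since v6.1-rc4, a kernel of 6.1 or newer should no longer need the patch at all. A minimal sketch, assuming the `linuxPackages_latest` of your nixpkgs revision is at least 6.1 (that is an assumption, so check the running version after rebuilding, e.g. with `uname -r`):

```nix
{ pkgs, ... }:
{
  # Assumption: linuxPackages_latest in the nixpkgs revision in use is >= 6.1,
  # so the suspend fix is already included and no boot.kernelPatches entry is needed.
  boot.kernelPackages = pkgs.linuxPackages_latest;
}
```

This could also go inside a specialisation like the one above, which keeps the current kernel available as a fallback boot entry.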
