Lockups with kernel 6.14.7 and AMD GPUs

Hi all,

Kernel 6.14.7 seems to have introduced a bug with AMD GPUs that constantly locks up the whole system:

Mai 21 18:36:44 puffy kernel: amdgpu 0000:2b:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Mai 21 18:36:44 puffy kernel: amdgpu 0000:2b:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Mai 21 18:36:44 puffy kernel: amdgpu 0000:2b:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Mai 21 18:36:44 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Mai 21 18:36:49 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Mai 21 18:36:49 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: Failed to disable gfxoff!
Mai 21 18:36:54 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Mai 21 18:36:54 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Mai 21 18:36:59 puffy kernel: watchdog: CPU2: Watchdog detected hard LOCKUP on cpu 2
Mai 21 18:36:59 puffy kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Mai 21 18:36:59 puffy kernel: rcu:         2-...0: (0 ticks this GP) idle=f0f4/1/0x4000000000000002 softirq=29712/29712 fqs=3888
Mai 21 18:36:59 puffy kernel: rcu:         (detected by 15, t=21002 jiffies, g=57805, q=567 ncpus=16)
Mai 21 18:36:59 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Mai 21 18:37:04 puffy kernel: watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [.gnome-shell-wr:2539]
Mai 21 18:37:05 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Mai 21 18:37:05 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: Failed to disable gfxoff!
Mai 21 18:37:10 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Mai 21 18:37:15 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Mai 21 18:37:15 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Mai 21 18:37:20 puffy kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [IPC Launch:3353]
Mai 21 18:37:20 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Mai 21 18:37:20 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: Failed to disable gfxoff!
Mai 21 18:37:26 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Mai 21 18:37:26 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
Mai 21 18:37:31 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Mai 21 18:37:32 puffy kernel: watchdog: BUG: soft lockup - CPU#4 stuck for 48s! [.gnome-shell-wr:2539]
Mai 21 18:37:36 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Mai 21 18:37:36 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: Failed to disable gfxoff!
Mai 21 18:37:37 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=7713, emitted seq=7714
Mai 21 18:37:37 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: Process information: process .gnome-shell-wr pid 2539 thread gnome-shel:cs0 pid 2575
Mai 21 18:37:37 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: Starting gfx_0.1.0 ring reset
Mai 21 18:37:37 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: Ring gfx_0.1.0 reset failure
Mai 21 18:37:42 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Mai 21 18:37:47 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
Mai 21 18:37:47 puffy kernel: amdgpu 0000:2b:00.0: amdgpu: Failed to retrieve enabled ppfeatures!

See the issue "[drm] ERROR dc_dmub_srv_log_diagnostic_data: DMCUB error" on the freedesktop GitLab.

Reverting to 6.14.6 seems to mitigate the issue until 6.14.8 is available:

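  # Build the 6.14 kernel package set from the 6.14.6 sources instead of 6.14.7.
  boot.kernelPackages = pkgs.linuxPackagesFor (pkgs.linux_6_14.override {
    # `rec` lets `src.url` refer to `version` defined further down.
    argsOverride = rec {
      src = pkgs.fetchurl {
        url = "mirror://kernel/linux/kernel/v6.x/linux-${version}.tar.xz";
        sha256 = "sha256-IYF/GZjiIw+B9+T2Bfpv3LBA4U+ifZnCfdsWznSXl6k=";
      };
      version = "6.14.6";
      # Must match the version so modules are looked up in the right directory.
      modDirVersion = "6.14.6";
    };
  });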
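
Since the pin is only meant to last until a fixed kernel arrives, here is a rough sketch of a variant that switches itself off once the channel no longer ships 6.14.7. This is untested and makes a few assumptions: it lives in its own module file, nothing else in the configuration sets `boot.kernelPackages` (otherwise the two definitions conflict and one would need `lib.mkForce`), and `badVersion` / `pinnedVersion` are just names made up for this example.

```nix
{ lib, pkgs, ... }:
let
  # Hypothetical helper names, used only in this sketch.
  badVersion = "6.14.7";     # release that shows the lockups
  pinnedVersion = "6.14.6";  # last release that seems fine
in
{
  # Apply the pin only while nixpkgs still ships the bad release;
  # when the condition is false this definition is dropped entirely.
  boot.kernelPackages = lib.mkIf (pkgs.linux_6_14.version == badVersion) (
    pkgs.linuxPackagesFor (pkgs.linux_6_14.override {
      argsOverride = {
        src = pkgs.fetchurl {
          url = "mirror://kernel/linux/kernel/v6.x/linux-${pinnedVersion}.tar.xz";
          sha256 = "sha256-IYF/GZjiIw+B9+T2Bfpv3LBA4U+ifZnCfdsWznSXl6k=";
        };
        version = pinnedVersion;
        modDirVersion = pinnedVersion;
      };
    })
  );
}
```

The idea is that after a channel update to 6.14.8 the condition becomes false and the system goes back to whatever kernel it would use by default, so the workaround does not linger unnoticed.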

Anyone else with similar issues?

Please note that 6.14.7 was released to both NixOS 24.11 and 25.05.

I haven’t got anything useful to add but I can never resist a direct question, so, yes.