AMD GPU fails after OS update

Hi All,

After updating from NixOS 25.05.20250612.fd48718 (built 2025-06-12) to 25.05.20250613.5f4f306 (built 2025-06-15), the AMD GPU driver fails to initialise and both of my monitors stay powered off. I made no configuration changes between the two builds. Everything works fine if I boot the earlier generation, and every update since then shows the same broken behaviour.

Kernel/Hardware summary:

Kernel: Linux 6.12.33
Display (AUS28CA): 3840x2160 @ 60 Hz (as 2560x1440) in 28" [External]
Display (VZ27A): 2560x1440 @ 60 Hz in 27" [External]
CPU: AMD Ryzen 9 7950X (32) @ 5.88 GHz
GPU 1: AMD Raphael [Integrated]
GPU 2: NVIDIA GeForce RTX 4060 Ti [Discrete]

(I use the NVIDIA GPU for LLMs)

amdgpu-related entries from the system log:

Jun 17 19:41:12 akgd kernel: [drm] amdgpu kernel modesetting enabled.
Jun 17 19:41:12 akgd kernel: amdgpu: vga_switcheroo: detected switching method \_SB_.PCI0.GP17.VGA_.ATPX handle
Jun 17 19:41:12 akgd kernel: amdgpu: ATPX version 1, functions 0x00000000
Jun 17 19:41:12 akgd kernel: amdgpu: Virtual CRAT table created for CPU
Jun 17 19:41:12 akgd kernel: amdgpu: Topology: Add CPU node
Jun 17 19:41:12 akgd kernel: amdgpu 0000:72:00.0: enabling device (0006 -> 0007)
Jun 17 19:41:12 akgd kernel: amdgpu 0000:72:00.0: amdgpu: Fetched VBIOS from VFCT
Jun 17 19:41:12 akgd kernel: amdgpu: ATOM BIOS: 102-RAPHAEL-008
Jun 17 19:41:12 akgd kernel: amdgpu 0000:72:00.0: vgaarb: deactivate vga console
Jun 17 19:41:12 akgd kernel: amdgpu 0000:72:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
Jun 17 19:41:12 akgd kernel: amdgpu 0000:72:00.0: amdgpu: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
Jun 17 19:41:12 akgd kernel: amdgpu 0000:72:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
Jun 17 19:41:12 akgd kernel: [drm] amdgpu: 512M of VRAM memory ready
Jun 17 19:41:12 akgd kernel: [drm] amdgpu: 31717M of GTT memory ready.
Jun 17 19:41:12 akgd kernel: amdgpu 0000:72:00.0: amdgpu: reserve 0xa00000 from 0xf41e000000 for PSP TMR
Jun 17 19:41:12 akgd kernel: amdgpu 0000:72:00.0: amdgpu: failed to load ucode DMCUB(0x3D) 
Jun 17 19:41:12 akgd kernel: amdgpu 0000:72:00.0: amdgpu: psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0008)
Jun 17 19:41:12 akgd kernel: amdgpu 0000:72:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jun 17 19:41:12 akgd kernel: amdgpu 0000:72:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jun 17 19:41:12 akgd kernel: amdgpu 0000:72:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jun 17 19:41:12 akgd kernel: amdgpu 0000:72:00.0: amdgpu: SMU is initialized successfully!
Jun 17 19:41:13 akgd kernel: amdgpu 0000:72:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Jun 17 19:41:13 akgd kernel: amdgpu 0000:72:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
Jun 17 19:41:13 akgd kernel: amdgpu 0000:72:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data

Up to the last three lines, the log is the same as for a successful boot.

Some kernel parameters:

nix-repl> nixosConfigurations.akgd.config.boot.kernelParams
[
  "nohibernate"
  "loglevel=4"
  "lsm=landlock,yama,bpf"
  "nvidia-drm.modeset=1"
  "nvidia-drm.fbdev=1"
  "nvidia.NVreg_OpenRmEnableUnsupportedGpus=1"
]

nix-repl> nixosConfigurations.akgd.config.boot.kernelPatches
[ ]

nix-repl> nixosConfigurations.akgd.config.boot.kernelModules  
[
  "kvm-amd"
  "bridge"
  "macvlan"
  "tap"
  "tun"
  "zfs"
  "loop"
  "atkbd"
  "ctr"
  "nvidia_uvm"
  "nvidia"
  "nvidia_modeset"
  "nvidia_drm"
  "i2c-dev"
]

Any suggestions?
Thanks.

I’ve narrowed this down to a linux-firmware update:

commit 44a84770ea103cacd7ee8c3b468ae628f3ec63c7 (HEAD)
Author: Tom Vincent <github@tlvince.com>
Date:   Fri Jun 13 14:30:19 2025 +0000

    linux-firmware: 20250509 -> 20250613
    
    (cherry picked from commit c125d23d188b4fef5f1c2f59198ff61986a4e4f1)

diff --git a/pkgs/by-name/li/linux-firmware/package.nix b/pkgs/by-name/li/linux-firmware/package.nix
index e5f97ebebdb2..e1a6f4e9fade 100644
--- a/pkgs/by-name/li/linux-firmware/package.nix
+++ b/pkgs/by-name/li/linux-firmware/package.nix
@@ -22,11 +22,11 @@ let
 in
 stdenvNoCC.mkDerivation rec {
   pname = "linux-firmware";
-  version = "20250509";
+  version = "20250613";
 
   src = fetchzip {
-    url = "https://cdn.kernel.org/pub/linux/kernel/firmware/linux-firmware-${version}.tar.xz ";
-    hash = "sha256-0FrhgJQyCeRCa3s0vu8UOoN0ZgVCahTQsSH0o6G6hhY=";
+    url = "https://cdn.kernel.org/pub/linux/kernel/firmware/linux-firmware-${version}.tar.xz";
+    hash = "sha256-qygwQNl99oeHiCksaPqxxeH+H7hqRjbqN++Hf9X+gzs=";
   };
 
   postUnpack = ''

Were you able to fix this issue? I have exactly the same problem: everything works on kernel 6.12.31 but not on 6.12.33.

For anyone else having the same issue: my temporary fix was to pin the old linux-firmware version. It's not a great “solution”, but it fixes the problem for now. Add this code after your hardware-configuration import block:

  # Use system-wide overlays to override linux-firmware
  nixpkgs.overlays = [
    (final: prev: {
      linux-firmware = prev.linux-firmware.overrideAttrs (old: rec {
        version = "20250509";
        src = prev.fetchzip {
          url = "https://cdn.kernel.org/pub/linux/kernel/firmware/linux-firmware-${version}.tar.xz";
          hash = "sha256-0FrhgJQyCeRCa3s0vu8UOoN0ZgVCahTQsSH0o6G6hhY=";
        };
      });
    })
  ];
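
To check that the pin actually took effect after rebuilding, something like the following can be evaluated in nix-repl. This is only a sketch: it assumes a flake-based setup like the nix-repl queries earlier in the thread, and `myhost` is a placeholder host name that is not taken from this thread.

```nix
# Evaluate in nix-repl after loading your flake.
# "myhost" is a hypothetical name; use your own nixosConfigurations entry.
# This reads the version of linux-firmware from the package set the system
# was evaluated with, so overlays set via nixpkgs.overlays are included.
nixosConfigurations.myhost.pkgs.linux-firmware.version
```

If the overlay is being applied, this should report the pinned version rather than 20250613. If the system gets its package set from somewhere other than `nixpkgs.overlays` (for example a pre-built `pkgs` passed in), the overlay may be ignored and the result will differ.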

I haven’t seen a fix. I actually pinned nixpkgs, but your workaround is better. I’m planning to bisect linux-firmware to figure out which commit introduced the issue, and then report it on https://bugzilla.kernel.org. If you have a better suggestion, please let me know. :slight_smile:

It looks like it isn’t possible to download arbitrary commits of linux-firmware, so no bisect for now.

I’ve opened an issue on nixpkgs: “linux-firmware: AMD GPU fails after firmware update” (NixOS/nixpkgs issue #418212 on GitHub).

@lagerstrom it would be great if you could comment there that you are also experiencing this issue, ideally with a hardware summary.


Great idea. I’ve added my system information to your ticket. Thank you for taking the time :slight_smile:


Thanks! It looks like you don’t have an NVIDIA GPU, which I’m glad to see. I wasn’t looking forward to having to prove that the bug doesn’t depend on an NVIDIA GPU being present. :slight_smile:
