Graphics breaks after rebuild on nixos-25.11

Hi. I’m experiencing a reproducible graphics regression after rebuilding my system.

The issue appears both on nixos-25.11 and after switching back to nixos-25.05.

What works:

  • Booting the old system generation: graphics work perfectly

  • Same kernel / hardware / BIOS

  • No issues before the upgrade attempt

What breaks:

  • Any new nixos-rebuild switch after that point

  • Happens on both nixos-25.11 and after switching back to nixos-25.05

  • Result: broken graphics

Important detail

Switching nixpkgs channel back to 25.05 does not restore working graphics. Only booting the old generation does.

Version comparison (working → broken)

  • Mesa: 25.0.7 → 25.2.6 (tested rollback, Mesa alone is NOT the cause)

  • libdrm: 2.4.124 → 2.4.125

  • Linux firmware: 20251111 → 20251125

  • Kernel also changed during the upgrade attempt

Rolling back Mesa alone did not fix the issue
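
Since the Mesa rollback ruled out one item on this list, the same one-at-a-time approach could be applied to the firmware. A minimal sketch, assuming the system takes its firmware from pkgs.linux-firmware (for example through hardware.enableRedistributableFirmware); the commit and hash are placeholders to fill in, not values from this thread:

```nix
{ pkgs, ... }:
let
  # Placeholders (assumption): a nixpkgs commit from before the upgrade,
  # i.e. one that still shipped linux-firmware 20251111, and the hash
  # Nix reports for that tarball.
  oldNixpkgs = builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<commit>.tar.gz";
    sha256 = "<sha256>";
  };
  old = import oldNixpkgs {
    inherit (pkgs.stdenv.hostPlatform) system;
  };
in
{
  # Swap only the firmware package; kernel, Mesa and libdrm still come
  # from the current channel.
  nixpkgs.overlays = [
    (_final: _prev: { linux-firmware = old.linux-firmware; })
  ];
}
```

If graphics survive a rebuild with only this overlay applied, the firmware update is the likely trigger; if not, the kernel and libdrm remain as candidates.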

Hardware

My hardware: AMD Ryzen 9 8945HX - iGPU AMD Radeon 610M

Crash logs

дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: Dumping IP State
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: Dumping IP State Completed
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: [drm] Check your /sys/class/drm/card2/device/devcoredump/data
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=14894, emitted seq=14896
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu:  Process .kwin_wayland-w pid 2517 thread kwin_wayla:cs0 pid 2555
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: Starting gfx_0.1.0 ring reset
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: Ring gfx_0.1.0 reset failed
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: MODE2 reset
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume
дек 06 13:18:40 legion kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: PSP is resuming...
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: reserve 0xa00000 from 0xf41e000000 for PSP TMR
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not available
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not available
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: SMU is resuming...
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: SMU is resumed successfully!
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x05002C00
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8

дек 06 13:18:40 legion kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: Rendering a layer failed!
дек 06 13:18:40 legion kwin_wayland[2517]: Failed to find a working output layer configuration! Enabled layers:
дек 06 13:18:40 legion kwin_wayland[2517]: src QRectF(0,0 2560x1600) -> dst QRect(0,0 2560x1600)
дек 06 13:18:40 legion kwin_wayland[2517]: src QRectF(0,0 256x256) -> dst QRect(2552,-2 256x256)
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost

Share your config, particularly anything graphics- and kernel-related.

{
  hardware.graphics.enable = true;

  # services.xserver.videoDrivers = [ "amdgpu" "nvidia" ];

  # rtx 5060
  hardware.nvidia = {
    modesetting.enable = true;
    package = unstable.nvidiaPackages.beta;

    powerManagement.enable = true;
    powerManagement.finegrained = true;

    nvidiaSettings = true;

    prime = {
      offload.enable = true;
      amdgpuBusId = "PCI:6:0:0"; # AMD
      nvidiaBusId = "PCI:1:0:0"; # RTX 5060
    };
  };
}

Kernel:

boot.kernelPackages = pkgs.linuxPackages_latest;

boot.kernelParams = [
  "mem_sleep_default=deep"
  "amd_pmc.enable_deep_pwr=1"
  "amd_pmc.dyndbg=+p"
  "amd_pstate=active"
  "amdgpu.exp_hw_support=1"
  "amdgpu.sg_display=0"
];

To be sure these IDs are right, can you run this command?

nix --experimental-features "flakes nix-command" run github:eclairevoyant/pcids
PCI:1:0:0
	NVIDIA Corporation [10de]
	GB206M [GeForce RTX 5060 Max-Q / Mobile] [2d59]
PCI:6:0:0
	Advanced Micro Devices, Inc. [AMD/ATI] [1002]
	Raphael [164e]

Not 100% sure this is it, but mucking with kernel parameters is generally suspicious, and a kernel version change is likely to be the culprit here.

Those settings specifically are described by the kernel docs like this:

sg_display (int)

Disable S/G (scatter/gather) display (i.e., display from system memory). This option is only relevant on APUs. Set this option to 0 to disable S/G display if you experience flickering or other issues under memory pressure and report the issue.

exp_hw_support (int)

Enable experimental hw support (1 = enable). The default is 0 (disabled).

The former sounds like the kind of thing you enable for a week if you use a cutting edge kernel. The latter sounds like the kind of thing you enable for a week when you use cutting edge hardware on a cutting edge kernel.

Neither will be true during a NixOS upgrade, so I imagine neither should be set. In all likelihood, most of the kernel params you’re setting are similarly inadvisable to set permanently; try to remove them all and boot again.

If that solves it, it’s probably technically a kernel bug, but who knows whether upstream will bother to fix these kinds of edge cases for workarounds that have likely long since stopped being useful.

You should also consider dropping the _latest kernel. The nvidia driver doesn’t play nicely with anything but LTS (which is the default), so this config will eventually cause issues for you anyway.
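
For reference, a sketch of what going back to the defaults might look like in configuration.nix. pkgs.linuxPackages is the default value of boot.kernelPackages, so deleting both settings entirely has the same effect; the explicit form is shown only to make the change visible:

```nix
{ pkgs, ... }:
{
  # Default NixOS kernel package set (an LTS series) instead of
  # pkgs.linuxPackages_latest.
  boot.kernelPackages = pkgs.linuxPackages;

  # No extra kernel command-line parameters from this module.
  boot.kernelParams = [ ];
}
```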

Just to clarify: those kernel parameters were added later while trying to diagnose and work around this issue, not something I was intentionally carrying forward as a permanent configuration.

Unfortunately, removing them again and booting with all default kernel parameters did not resolve the problem: the graphics still break after a rebuild.

Fair. I’d still consider trying the default kernel for now; this makes it look like a kernel regression:

Just to add more data after further testing:

The currently working system generation uses kernel 6.17.8 - this is the only configuration where both graphics and Wi-Fi are functional.

Additional observations:

  • Kernel 6.18.0: Wi-Fi driver fails to load completely (device not detected), so that kernel is unusable for me.

  • Kernel 6.17.10: graphics issues persist, same behavior as with newer kernels.

  • Kernel 6.17.8-zen: same result, graphics still broken after rebuild.

So far, 6.17.8 (non-zen) is the only kernel version that works reliably on this system.

I’m currently keeping the system functional by sticking to that specific kernel, but I’m trying to understand what changed between these versions.

Kernel 6.17.8, which is the only version that works reliably for me, is no longer available in the NixOS repositories. This means I currently have no straightforward way to install that kernel on a clean or “normal” rebuild; only the old system generation still has it pinned.

Because of that, I’m effectively locked to the old generation for now and can’t reproduce the working setup using the current nixpkgs/kernel packages.
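
One possible way out of being locked to the old generation is to import the kernel from the nixpkgs revision that generation was built from. A sketch under assumptions: the commit and hash are placeholders, and linuxPackages_latest at that revision is assumed to still be 6.17.8, which matches the boot.kernelPackages setting posted earlier.

```nix
{ pkgs, ... }:
let
  # Placeholders (assumption): the nixpkgs commit the working generation
  # was built from, and the hash Nix reports for that tarball.
  pinnedNixpkgs = builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<commit>.tar.gz";
    sha256 = "<sha256>";
  };
  pinned = import pinnedNixpkgs {
    inherit (pkgs.stdenv.hostPlatform) system;
  };
in
{
  # Take only the kernel package set from the pinned revision;
  # the rest of the system still follows the current channel.
  boot.kernelPackages = pinned.linuxPackages_latest;
}
```

When booted into the working generation, nixos-version --revision should print the nixpkgs commit it was built from, provided the build recorded one. Anything that takes its modules from config.boot.kernelPackages is then built against the pinned kernel as well.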

I think this is a question for the upstream Linux kernel maintainers. Personally, I’d figure out what the regression in the wifi driver is about, rather than this bug in amdgpu that has already been fixed.

Does the LTS kernel not support your hardware at all? Tbh, in that case the device is recent enough that I’d send it back and exchange it for a slightly older one; you need kernel support for all of the hardware to be at least six months old or so if you want to use a device with an nvidia card on Linux reliably.

My setup relies solely on amdgpu, and the regression I’m seeing is entirely within the amdgpu / DRM path. The issue reproduces even when the NVIDIA driver is not loaded or involved in any way.

So while I understand the general advice about NVIDIA and LTS kernels, in this particular case the problem is specifically with amdgpu on newer kernels, not with NVIDIA compatibility.

Moreover, the Radeon 610M was released in October 2022, so it is not particularly new hardware and should already be well within the support window for LTS kernels.

Without asking on the kernel mailing list, the best way to get to the bottom of this is bisecting the kernel and getting the precise commit that introduced the regression. Once you have that you can do with the information what you will; reverting the commit and running a custom kernel altogether might be a solution if you’re not willing to risk running an unsupported kernel base (though it’s arguable whether patches like that are less risky than just running an EOL kernel from a support perspective, which you could also totally do if you wanted to).
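
If bisecting does identify a single offending commit, NixOS can apply the revert as a patch on top of a packaged kernel instead of maintaining a separate kernel tree. A sketch, assuming a hypothetical patch file sitting next to the configuration (for instance produced with git revert followed by git format-patch -1 in a kernel checkout):

```nix
{ ... }:
{
  boot.kernelPatches = [
    {
      # Both the name and the file are illustrative (assumption); the
      # patch must be the revert of the commit found by bisecting.
      name = "revert-amdgpu-regression";
      patch = ./revert-amdgpu-regression.patch;
    }
  ];
}
```

Adding a patch changes the kernel derivation, so the kernel is compiled locally rather than fetched from the binary cache.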

This definitely sounds like it’s above the NixOS discourse’s paygrade. In either case, you’re going to have more luck upstream, finding someone with specific experience with this kernel module and potential regressions for your hardware.

To be clear, does amdgpu work with the 6.18 kernel? If it does, I’d switch course to figuring out what’s wrong with the wifi driver instead, because then upstream has clearly already fixed the amdgpu issue and will probably either backport the fix soon or have decided not to for some reason, and you’ll probably find the answers you’re looking for on the kernel mailing list.

In that case, have you tried the LTS kernel? Do amdgpu and the wifi driver both work on it? Why use _latest at all? I’m not talking about your GPU specifically, but rather the sum of devices in there; it’s not uncommon for a wifi chip to lack support for many years.

To clarify: Wi-Fi was working fine on kernels prior to 6.18. The regression in the Wi-Fi driver only appears starting with 6.18.

As for amdgpu: it does work in the sense that the driver loads and graphics initialize correctly, but the GPU crashes roughly once per minute, producing amdgpu DRM coredumps. So it’s not completely broken, but clearly unstable on newer kernels.

I’ll also try the LTS kernel now and report back whether both amdgpu and Wi-Fi work reliably there.

I’ve tried switching to the LTS kernel as suggested, but the problem remains.

Before any upgrades, the system was working fine on nixos-25.05. After upgrading to 25.11, things started breaking. Rolling back to 25.05 afterwards does not fix it anymore - only booting the old system generation 94 still works reliably.

Related maybe?

That is weird - do you have git commits?

Yes, the solution helped me.
