Graphics breaks after rebuild on nixos-25.11

json1c · December 6, 2025, 2:09pm

Hi. I’m experiencing a reproducible graphics regression after rebuilding my system.

The issue appears both on nixos-25.11 and after switching back to nixos-25.05.

What works:

Booting old system generation - graphics works perfectly
Same kernel / hardware / BIOS
No issues before the upgrade attempt

What breaks:

Any new nixos-rebuild switch after that point
Happens on both nixos-25.11 and after switching back to nixos-25.05
Result: broken graphics

Important detail

Switching nixpkgs channel back to 25.05 does not restore working graphics. Only booting the old generation does.

Version comparison (working → broken)

Mesa: 25.0.7 → 25.2.6 (tested rollback, Mesa alone is NOT the cause)
libdrm: 2.4.124 → 2.4.125
Linux firmware: 20251111 → 20251125
Kernel also changed during the upgrade attempt

Rolling back Mesa alone did not fix the issue

Hardware

My hardware: AMD Ryzen 9 8945HX - iGPU AMD Radeon 610M

Crash logs

дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: Dumping IP State
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: Dumping IP State Completed
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: [drm] Check your /sys/class/drm/card2/device/devcoredump/data
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=14894, emitted seq=14896
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu:  Process .kwin_wayland-w pid 2517 thread kwin_wayla:cs0 pid 2555
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: Starting gfx_0.1.0 ring reset
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: Ring gfx_0.1.0 reset failed
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: MODE2 reset
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume
дек 06 13:18:40 legion kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: PSP is resuming...
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: reserve 0xa00000 from 0xf41e000000 for PSP TMR
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not available
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not available
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: SMU is resuming...
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: SMU is resumed successfully!
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x05002C00
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
дек 06 13:18:40 legion kernel: amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8

дек 06 13:18:40 legion kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost
дек 06 13:18:40 legion kwin_wayland[2517]: Rendering a layer failed!
дек 06 13:18:40 legion kwin_wayland[2517]: Failed to find a working output layer configuration! Enabled layers:
дек 06 13:18:40 legion kwin_wayland[2517]: src QRectF(0,0 2560x1600) -> dst QRect(0,0 2560x1600)
дек 06 13:18:40 legion kwin_wayland[2517]: src QRectF(0,0 256x256) -> dst QRect(2552,-2 256x256)
дек 06 13:18:40 legion kwin_wayland[2517]: 0x2: GL_CONTEXT_LOST in context lost

waffle8946 · December 6, 2025, 2:42pm

Share your config, particularly anything graphics- and kernel-related.

json1c · December 6, 2025, 3:59pm

{
  hardware.graphics.enable = true;

  # services.xserver.videoDrivers = [ "amdgpu" "nvidia" ];

  # rtx 5060
  hardware.nvidia = {
    modesetting.enable = true;
    package = unstable.nvidiaPackages.beta;

    powerManagement.enable = true;
    powerManagement.finegrained = true;

    nvidiaSettings = true;

    prime = {
      offload.enable = true;
      amdgpuBusId = "PCI:6:0:0"; # AMD
      nvidiaBusId = "PCI:1:0:0"; # RTX 5060
    };
  };
}

Kernel:

boot.kernelPackages = pkgs.linuxPackages_latest;

boot.kernelParams = [
  "mem_sleep_default=deep"
  "amd_pmc.enable_deep_pwr=1"
  "amd_pmc.dyndbg=+p"
  "amd_pstate=active"
  "amdgpu.exp_hw_support=1"
  "amdgpu.sg_display=0"
];

waffle8946 · December 6, 2025, 4:04pm

To be sure these IDs are right, can you run this command?

nix --experimental-features "flakes nix-command" run github:eclairevoyant/pcids

json1c · December 6, 2025, 4:20pm

PCI:1:0:0
	NVIDIA Corporation [10de]
	GB206M [GeForce RTX 5060 Max-Q / Mobile] [2d59]
PCI:6:0:0
	Advanced Micro Devices, Inc. [AMD/ATI] [1002]
	Raphael [164e]

TLATER · December 6, 2025, 4:31pm

Not 100% sure this is it, but mucking with kernel parameters is generally suspicious, and a kernel version change is likely to be the culprit here.

Those settings specifically are described by the kernel docs like this:

sg_display (int)

Disable S/G (scatter/gather) display (i.e., display from system memory). This option is only relevant on APUs. Set this option to 0 to disable S/G display if you experience flickering or other issues under memory pressure and report the issue.

exp_hw_support (int)

Enable experimental hw support (1 = enable). The default is 0 (disabled).

The former sounds like the kind of thing you enable for a week if you use a cutting-edge kernel. The latter sounds like the kind of thing you enable for a week when you use cutting-edge hardware on a cutting-edge kernel.

Neither will be true during a NixOS upgrade, so I imagine neither should be set. In all likelihood, most of the kernel params you’re setting are similarly inadvisable to set permanently; try to remove them all and boot again.

If that solves it, it’s probably technically a kernel bug, but who knows whether upstream will bother to fix these kinds of edge cases for workarounds that have likely long since stopped being useful.

You should also consider no longer using the _latest kernel. The nvidia driver doesn’t play nicely with anything but LTS (which is the default), so this config will eventually cause issues for you anyway.
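For reference, a minimal sketch of what moving off _latest could look like, assuming the kernel is selected through boot.kernelPackages as in the config posted above. Dropping the option entirely should have the same effect, since pkgs.linuxPackages is the NixOS default.

```nix
{ pkgs, ... }:
{
  # Use the default NixOS kernel package set instead of
  # pkgs.linuxPackages_latest. Removing this line altogether
  # falls back to the same default.
  boot.kernelPackages = pkgs.linuxPackages;
}
```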

json1c · December 6, 2025, 4:35pm

Just to clarify: those kernel parameters were added later while trying to diagnose and work around this issue, not something I was intentionally carrying forward as a permanent configuration.

Unfortunately, removing them again and booting with all default kernel parameters did not resolve the problem — the graphics still break after a rebuild.

TLATER · December 6, 2025, 4:37pm

Fair. I’d still consider trying the default kernel for now; this makes it look like a kernel regression:

json1c:

дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
дек 06 13:18:39 legion kernel: amdgpu 0000:06:00.0: amdgpu: [drm] Check your /sys/class/drm/card2/device/devcoredump/data

json1c · December 6, 2025, 4:38pm

Just to add more data after further testing:

The currently working system generation uses kernel 6.17.8 - this is the only configuration where both graphics and Wi-Fi are functional.

Additional observations:

Kernel 6.18.0: Wi-Fi driver fails to load completely (device not detected), so that kernel is unusable for me.
Kernel 6.17.10: graphics issues persist, same behavior as with newer kernels.
Kernel 6.17.8-zen: same result, graphics still broken after rebuild.

So far, 6.17.8 (non-zen) is the only kernel version that works reliably on this system.

I’m currently keeping the system functional by sticking to that specific kernel, but I’m trying to understand what changed between these versions.

Kernel 6.17.8, which is the only version that works reliably for me, is no longer available in the nixos repositories. This means I currently have no straightforward way to install that kernel on a clean or “normal” rebuild - only the old system generation still has it pinned.

Because of that, I’m effectively locked to the old generation for now and can’t reproduce the working setup using the current nixpkgs/kernel packages.
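One possible way out of that lock-in, sketched under assumptions: nixpkgs lets you override the source of an existing kernel package, so a specific point release can be built locally. The attribute pkgs.linux_6_17 is assumed to still exist in the nixpkgs revision in use, the hash is a deliberate placeholder, and any patches nixpkgs carries for a newer point release may not apply cleanly to 6.17.8.

```nix
{ pkgs, lib, ... }:
{
  # Build kernel 6.17.8 from the upstream tarball by overriding the
  # source of the 6.17 kernel package (assumed to be present).
  boot.kernelPackages = pkgs.linuxPackagesFor (pkgs.linux_6_17.override {
    argsOverride = rec {
      version = "6.17.8";
      modDirVersion = "6.17.8";
      src = pkgs.fetchurl {
        url = "mirror://kernel/linux/kernel/v6.x/linux-${version}.tar.xz";
        # Placeholder: the first build fails and prints the real hash,
        # which then replaces lib.fakeHash.
        hash = lib.fakeHash;
      };
    };
  });
}
```

This compiles the kernel locally, so the first rebuild will take a while, and an EOL kernel gets no further security fixes.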

TLATER · December 6, 2025, 4:45pm

I think this is a question for the upstream Linux kernel maintainers. Personally, I’d figure out what the regression in the wifi driver is about, rather than this bug in amdgpu that has already been fixed.

Does the LTS kernel not support your hardware at all? To be honest, in that case the device is recent enough that I’d send it back and exchange it for a slightly older one. You need kernel support that is at least roughly six months old for all of the hardware if you want to use a device with an nvidia card on Linux reliably.

json1c · December 6, 2025, 4:57pm

My setup relies solely on amdgpu, and the regression I’m seeing is entirely within the amdgpu / DRM path. The issue reproduces even when the NVIDIA driver is not loaded or involved in any way.

So while I understand the general advice about NVIDIA and LTS kernels, in this particular case the problem is specifically with amdgpu on newer kernels, not with NVIDIA compatibility.

json1c · December 6, 2025, 5:09pm

Moreover, the Radeon 610M was released in October 2022, so it is not particularly new hardware and should already be well within the support window for LTS kernels.

TLATER · December 6, 2025, 5:13pm

Short of asking on the kernel mailing list, the best way to get to the bottom of this is to bisect the kernel and find the precise commit that introduced the regression. Once you have that, you can do with the information what you will. Reverting the commit and running a custom kernel might be a solution if you’re not willing to risk running an unsupported kernel base (though it’s arguable whether patches like that are less risky than just running an EOL kernel from a support perspective, which you could also totally do if you wanted to).
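If a bisect does turn up a single offending commit, carrying the revert on NixOS could look roughly like the sketch below. The patch name and file are hypothetical; the file would be something you generate yourself from the kernel tree (for example with git revert followed by git format-patch).

```nix
{ ... }:
{
  # Apply a local revert on top of whichever kernel
  # boot.kernelPackages selects. Both values are hypothetical.
  boot.kernelPatches = [
    {
      name = "revert-amdgpu-regression";
      patch = ./revert-amdgpu-regression.patch;
    }
  ];
}
```

Any entry in boot.kernelPatches forces a local kernel build, so this trades binary cache hits for the fix.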

This definitely sounds like it’s above the NixOS Discourse’s pay grade. In either case, you’re going to have more luck finding someone with specific experience with this kernel module and potential regressions for your hardware upstream.

To be clear, does amdgpu work with the 6.18 driver? If it does, I’d switch course to figuring out what’s wrong with the wifi driver instead, because then upstream has clearly already fixed the amdgpu issue and will probably either backport the fix soon or have decided not to for some reason, and you’ll probably find the answers you’re looking for on the kernel mailing list.

In that case, have you tried the LTS kernel? Do amdgpu and the wifi driver both work on it? Why use _latest at all? I’m not talking about your GPU specifically, but rather the sum of devices in there; it’s not uncommon for a wifi chip to lack support for many years.

json1c · December 6, 2025, 5:18pm

To clarify: Wi-Fi was working fine on kernels prior to 6.18. The regression in the Wi-Fi driver only appears starting with 6.18.

As for amdgpu: it does work in the sense that the driver loads and graphics initialize correctly, but the GPU crashes roughly once per minute, producing amdgpu DRM coredumps. So it’s not completely broken, but clearly unstable on newer kernels.

I’ll also try the LTS kernel now and report back whether both amdgpu and Wi-Fi work reliably there.

json1c · December 6, 2025, 6:10pm

I’ve tried switching to the LTS kernel as suggested, but the problem remains.

Before any upgrades, the system was working fine on nixos-25.05. After upgrading to 25.11, things started breaking. Rolling back to 25.05 afterwards does not fix it anymore - only booting the old system generation 94 still works reliably.

waffle8946 · December 6, 2025, 9:24pm

Related maybe?

github.com/NixOS/nixpkgs

linux-firmware: amdgpu regression causing freezes, fix available upstream

opened 09:00PM - 01 Dec 25 UTC

aviallon

0.kind: bug

### Nixpkgs version

- Stable (25.11)

### Describe the bug

Exactly what is described here: <https://gitlab.freedesktop.org/drm/amd/-/issues/4737>

There is a regression in amdgpu's firmware that causes SMU crashes, which manifests as bad stuttering on the screen.

### Steps to reproduce

- Use a relatively recent AMD GPU (RX 7900 XT for instance)
- Use linux-firmware 20251125
- Stress the CPU (using stress-ng for instance)

(more details here: https://gitlab.freedesktop.org/drm/amd/-/issues/4737#note_3215978)

### Expected behaviour

My screen does not completely freeze under load.

### Screenshots

_No response_

### Relevant log output

```console
déc. 01 21:50:19 luke-skywalker-nixos kernel: [drm] PCIE GART of 512M enabled (table at 0x00000084FEB00000).
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: PSP is resuming...
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: reserve 0x1300000 from 0x84fc000000 for PSP TMR
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: RAP: optional rap ta ucode is not available
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: SMU is resuming...
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x004e8200 (78.130.0)
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: SMU driver if version not matched
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: SMU is resumed successfully!
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x07002F00
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: [drm] Cannot find any crtc or sizes
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring jpeg_dec uses VM inv eng 4 on hub 8
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
déc. 01 21:50:19 luke-skywalker-nixos kernel: amdgpu 0000:10:00.0: [drm] Cannot find any crtc or sizes
```

### Additional context

_No response_

### System metadata

- system: `"x86_64-linux"`
- host os: `Linux 6.17.9-cachyos, NixOS, 25.11 (Xantusia), 25.11pre-git`
- multi-user?: `yes`
- sandbox: `yes`
- version: `nix-env (Nix) 2.32.4`
- nixpkgs: `/nix/store/ycr2g3rk022wld5cnpxjxjlkqfpnpa6l-source`

### Notify maintainers

@fpletz @K900
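The linked issue names linux-firmware 20251125, which matches the firmware bump listed in the version comparison at the top of this thread (20251111 → 20251125). A rough sketch of one way to test that theory is to pin the firmware back with an overlay. Everything here is an assumption rather than the fix from the issue: the tarball URL pattern, the placeholder hash, and the expectation that the current linux-firmware build steps still work against the older source.

```nix
{ ... }:
{
  nixpkgs.overlays = [
    (final: prev: {
      # Pin linux-firmware to the last version reported as working
      # in this thread.
      linux-firmware = prev.linux-firmware.overrideAttrs (old: rec {
        version = "20251111";
        src = prev.fetchurl {
          # Assumed upstream release tarball location.
          url = "https://cdn.kernel.org/pub/linux/kernel/firmware/linux-firmware-${version}.tar.xz";
          # Placeholder: the first build fails and prints the real hash.
          hash = prev.lib.fakeHash;
        };
      });
    })
  ];
}
```

After a rebuild and reboot, the firmware version in use can be compared against the working generation to confirm the pin took effect.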

TLATER · December 7, 2025, 2:34am

That is weird - do you have git commits?

json1c · December 7, 2025, 4:41pm

Yes, the solution helped me.