Amdgpu failing during initialization

The amdgpu module fails during initialization, after which there is no video signal. The system is still responsive: I can switch to another console with Ctrl+Alt+F2 and shut down or reboot.

The problem occurred after installing a new GPU, and I have no other hardware to test it with at the moment. Booting with the "nomodeset" kernel option works and drops me into a TTY.
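For reference, a minimal sketch of making that workaround persistent in configuration.nix instead of editing the boot entry each time. This is only meant as a temporary debugging aid, since "nomodeset" keeps amdgpu from driving the display at all.

```nix
{ ... }:
{
  # Temporary workaround: pass "nomodeset" on the kernel command line so the
  # system boots to a text console without kernel mode setting.
  boot.kernelParams = [ "nomodeset" ];
}
```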

journalctl:

Dec 27 14:31:56 desktop kernel: perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
Dec 27 14:31:56 desktop kernel: amd_pstate: driver load is disabled, boot with specific mode to enable this
Dec 27 14:31:56 desktop kernel: kvm_amd: SVM disabled (by BIOS) in MSR_VM_CR on CPU 3
Dec 27 14:31:56 desktop systemd-modules-load[650]: Failed to insert module 'kvm_amd': Operation not supported
Dec 27 14:31:56 desktop kernel: kvm_amd: SVM disabled (by BIOS) in MSR_VM_CR on CPU 0
Dec 27 14:31:58 desktop kernel: [drm] amdgpu kernel modesetting enabled.
Dec 27 14:31:58 desktop kernel: amdgpu: Virtual CRAT table created for CPU
Dec 27 14:31:58 desktop kernel: amdgpu: Topology: Add CPU node
Dec 27 14:31:58 desktop kernel: amdgpu 0000:28:00.0: enabling device (0006 -> 0007)
Dec 27 14:31:58 desktop kernel: amdgpu 0000:28:00.0: amdgpu: Fetched VBIOS from VFCT
Dec 27 14:31:58 desktop kernel: amdgpu: ATOM BIOS: 113-V502MECH-2OC
Dec 27 14:31:58 desktop kernel: amdgpu 0000:28:00.0: vgaarb: deactivate vga console
Dec 27 14:31:58 desktop kernel: amdgpu 0000:28:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
Dec 27 14:31:58 desktop kernel: amdgpu 0000:28:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
Dec 27 14:31:58 desktop kernel: amdgpu 0000:28:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
Dec 27 14:31:58 desktop kernel: amdgpu 0000:28:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
Dec 27 14:31:58 desktop kernel: [drm] amdgpu: 8176M of VRAM memory ready
Dec 27 14:31:58 desktop kernel: [drm] amdgpu: 16013M of GTT memory ready.
Dec 27 14:31:59 desktop syncthing[1295]: [start] 2023/12/27 14:31:59 INFO: syncthing v1.26.1 "Gold Grasshopper" (go1.21.5 linux-amd64) nix@nix 1980-01-01 00:00:00 UTC [noupgrade, stnoupgrade]
Dec 27 14:32:00 desktop kernel: amdgpu 0000:28:00.0: amdgpu: STB initialized to 2048 entries
Dec 27 14:32:00 desktop kernel: amdgpu 0000:28:00.0: amdgpu: Will use PSP to load VCN firmware
Dec 27 14:32:00 desktop kernel: amdgpu 0000:28:00.0: amdgpu: RAS: optional ras ta ucode is not available
Dec 27 14:32:00 desktop kernel: amdgpu 0000:28:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Dec 27 14:32:00 desktop kernel: amdgpu 0000:28:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b2f00 (59.47.0)
Dec 27 14:32:00 desktop kernel: amdgpu 0000:28:00.0: amdgpu: SMU driver if version not matched
Dec 27 14:32:00 desktop kernel: amdgpu 0000:28:00.0: amdgpu: use vbios provided pptable
Dec 27 14:32:00 desktop kernel: amdgpu 0000:28:00.0: amdgpu: SMU is initialized successfully!
Dec 27 14:32:00 desktop kernel: snd_hda_intel 0000:28:00.1: bound 0000:28:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Dec 27 14:32:01 desktop kernel: amdgpu 0000:28:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring sdma0 test failed (-110)
Dec 27 14:32:01 desktop kernel: [drm:amdgpu_device_init [amdgpu]] ERROR hw_init of IP block <sdma_v5_2> failed -110
Dec 27 14:32:01 desktop kernel: amdgpu 0000:28:00.0: amdgpu: amdgpu_device_ip_init failed
Dec 27 14:32:01 desktop kernel: amdgpu 0000:28:00.0: amdgpu: Fatal error during GPU init
Dec 27 14:32:01 desktop kernel: amdgpu 0000:28:00.0: amdgpu: amdgpu: finishing device.

System:
Ryzen 5 3600
MSI B450 Gaming Plus
AMD RX 6600 XT

I am on the latest kernel (6.6.8), and fwupdmgr lists no available updates.

hardware.enableRedistributableFirmware = true;
hardware.cpu.amd.updateMicrocode = true;
hardware.opengl.enable = true;
hardware.opengl.driSupport = true;
hardware.opengl.driSupport32Bit = true;
services.xserver.videoDrivers = ["amdgpu"];

My previous GPU was a GTX 970, which ran without problems.
I tried switching to the two previous LTS kernels, without any effect.
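For anyone repeating this step, a minimal sketch of pinning an older kernel on NixOS. The post does not say which LTS kernels were tried, so `linuxPackages_6_1` is an assumption used for illustration, and the attribute has to exist in the nixpkgs revision in use.

```nix
{ pkgs, ... }:
{
  # Assumed example: select the 6.1 LTS kernel instead of the default.
  # Swap in another attribute (e.g. pkgs.linuxPackages_5_15) to test a
  # different series.
  boot.kernelPackages = pkgs.linuxPackages_6_1;
}
```

After a rebuild and reboot, `uname -r` shows which kernel is actually running.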

I guess the best thing to do right now would be to test the card in a different machine, maybe also under Windows, just to make sure the card itself is not at fault. I am working on getting different hardware to do this. Is there anything else I could try in the meantime?

I switched to the 5.10 kernel and enabled all firmware, and now the module loads. However, I can't get X to work.
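A sketch of what that combination might look like in configuration.nix, assuming "enabled all firmware" refers to `hardware.enableAllFirmware` (the post does not name the option). That option includes unfree firmware, so `allowUnfree` is set as well; `linuxPackages_5_10` is assumed to be available in the nixpkgs revision in use.

```nix
{ pkgs, ... }:
{
  # Assumption: "all firmware" means this option, which also pulls in
  # firmware with non-redistributable licenses.
  hardware.enableAllFirmware = true;
  nixpkgs.config.allowUnfree = true;

  # The 5.10 LTS kernel mentioned above.
  boot.kernelPackages = pkgs.linuxPackages_5_10;
}
```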

Will continue and try to produce the minimal config for this to work.

EDIT:
It seems that the driver for this card is simply missing from this version of the module, and therefore it does not fail. The situation is similar to the Arch Linux Forums thread "[SOLVED] Sway - Failed to open any DRM device" (Applications & Desktop Environments), which was solved by upgrading the kernel.

Other threads also give a kernel update as the solution:

https://forums.linuxmint.com/viewtopic.php?t=403647

I returned the card and bought another one (same model, same manufacturer), which worked out of the box.