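I’ve been getting amdgpu driver errors since switching to new hardware: a burst of GPU page faults in wezterm-gui, followed by a ring timeout and a GPU reset. I’ve never had to debug anything like this before, so please ask if you need more details. Here’s what I have so far: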
$ uname -a
Linux big-system 6.6.44 #1-NixOS SMP PREEMPT_DYNAMIC Sat Aug 3 06:54:42 UTC 2024 x86_64 GNU/Linux
$ dmesg | rg amdgpu
[ 0.000000] Command line: initrd=\EFI\nixos\i6l1b3gwjhmgqfha8wirqnwwi2d7z5lw-initrd-linux-6.6.44-initrd.efi init=/nix/store/abdplibma8crxqczj3n3nisq8qzkb8zs-nixos-system-big-system-24.05.20240810.a781ff3/init amdgpu.runpm=0 nohibernate loglevel=4
[ 0.044886] Kernel command line: initrd=\EFI\nixos\i6l1b3gwjhmgqfha8wirqnwwi2d7z5lw-initrd-linux-6.6.44-initrd.efi init=/nix/store/abdplibma8crxqczj3n3nisq8qzkb8zs-nixos-system-big-system-24.05.20240810.a781ff3/init amdgpu.runpm=0 nohibernate loglevel=4
[ 0.529091] stage-1-init: [Mon Aug 12 15:52:45 UTC 2024] loading module amdgpu...
[ 2.956418] [drm] amdgpu kernel modesetting enabled.
[ 2.956542] amdgpu: Virtual CRAT table created for CPU
[ 2.956560] amdgpu: Topology: Add CPU node
[ 2.960616] amdgpu 0000:2b:00.0: No more image in the PCI ROM
[ 2.960634] amdgpu 0000:2b:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 2.960639] amdgpu: ATOM BIOS: 115-D632BP2-100
[ 2.986743] amdgpu 0000:2b:00.0: vgaarb: deactivate vga console
[ 2.986746] amdgpu 0000:2b:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 2.986816] amdgpu 0000:2b:00.0: amdgpu: VRAM: 4080M 0x0000008000000000 - 0x00000080FEFFFFFF (4080M used)
[ 2.986819] amdgpu 0000:2b:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 2.986821] amdgpu 0000:2b:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[ 2.986943] [drm] amdgpu: 4080M of VRAM memory ready
[ 2.986945] [drm] amdgpu: 7970M of GTT memory ready.
[ 4.884787] amdgpu 0000:2b:00.0: amdgpu: STB initialized to 2048 entries
[ 4.885580] amdgpu 0000:2b:00.0: amdgpu: Will use PSP to load VCN firmware
[ 5.053960] amdgpu 0000:2b:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 5.069648] amdgpu 0000:2b:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 5.069672] amdgpu 0000:2b:00.0: amdgpu: smu driver if version = 0x0000000d, smu fw if version = 0x00000010, smu fw program = 0, version = 0x00492400 (73.36.0)
[ 5.069675] amdgpu 0000:2b:00.0: amdgpu: SMU driver if version not matched
[ 5.069708] amdgpu 0000:2b:00.0: amdgpu: use vbios provided pptable
[ 5.112009] amdgpu 0000:2b:00.0: amdgpu: SMU is initialized successfully!
[ 5.166577] amdgpu: HMM registered 4080MB device memory
[ 5.167775] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 5.167797] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[ 5.167992] amdgpu: Virtual CRAT table created for GPU
[ 5.168158] amdgpu: Topology: Add dGPU node [0x743f:0x1002]
[ 5.168160] kfd kfd: amdgpu: added device 1002:743f
[ 5.168180] amdgpu 0000:2b:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 8, active_cu_number 12
[ 5.169022] amdgpu 0000:2b:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 5.169025] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 5.169026] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 5.169028] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 5.169030] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 5.169031] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 5.169033] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 5.169034] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 5.169036] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 5.169037] amdgpu 0000:2b:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[ 5.169039] amdgpu 0000:2b:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 5.169041] amdgpu 0000:2b:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[ 5.170539] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:2b:00.0 on minor 1
[ 5.176756] fbcon: amdgpudrmfb (fb0) is primary device
[ 5.268724] amdgpu 0000:2b:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[ 10.829284] snd_hda_intel 0000:2b:00.1: bound 0000:2b:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 273.406616] amdgpu 0000:2b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32771, for process wezterm-gui pid 5044 thread wezterm-gu:cs0 pid 5072)
[ 273.406641] amdgpu 0000:2b:00.0: amdgpu: in page starting at address 0x000080019560e000 from client 0x1b (UTCL2)
[ 273.406645] amdgpu 0000:2b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00701031
[ 273.406649] amdgpu 0000:2b:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[ 273.406652] amdgpu 0000:2b:00.0: amdgpu: MORE_FAULTS: 0x1
[ 273.406656] amdgpu 0000:2b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 273.406658] amdgpu 0000:2b:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 273.406661] amdgpu 0000:2b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 273.406664] amdgpu 0000:2b:00.0: amdgpu: RW: 0x0
[ 273.406675] amdgpu 0000:2b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32771, for process wezterm-gui pid 5044 thread wezterm-gu:cs0 pid 5072)
[ 273.406680] amdgpu 0000:2b:00.0: amdgpu: in page starting at address 0x0000800195612000 from client 0x1b (UTCL2)
[ 273.406684] amdgpu 0000:2b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 273.406686] amdgpu 0000:2b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 273.406689] amdgpu 0000:2b:00.0: amdgpu: MORE_FAULTS: 0x0
[ 273.406692] amdgpu 0000:2b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 273.406694] amdgpu 0000:2b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 273.406696] amdgpu 0000:2b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 273.406700] amdgpu 0000:2b:00.0: amdgpu: RW: 0x0
[ 273.406706] amdgpu 0000:2b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32771, for process wezterm-gui pid 5044 thread wezterm-gu:cs0 pid 5072)
[ 273.406710] amdgpu 0000:2b:00.0: amdgpu: in page starting at address 0x000080050560a000 from client 0x1b (UTCL2)
[ 273.406714] amdgpu 0000:2b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 273.406717] amdgpu 0000:2b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 273.406720] amdgpu 0000:2b:00.0: amdgpu: MORE_FAULTS: 0x0
[ 273.406722] amdgpu 0000:2b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 273.406725] amdgpu 0000:2b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 273.406728] amdgpu 0000:2b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 273.406730] amdgpu 0000:2b:00.0: amdgpu: RW: 0x0
[ 273.406737] amdgpu 0000:2b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32771, for process wezterm-gui pid 5044 thread wezterm-gu:cs0 pid 5072)
[ 273.406741] amdgpu 0000:2b:00.0: amdgpu: in page starting at address 0x0000800505606000 from client 0x1b (UTCL2)
[ 273.406743] amdgpu 0000:2b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 273.406746] amdgpu 0000:2b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 273.406749] amdgpu 0000:2b:00.0: amdgpu: MORE_FAULTS: 0x0
[ 273.406752] amdgpu 0000:2b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 273.406755] amdgpu 0000:2b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 273.406757] amdgpu 0000:2b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 273.406760] amdgpu 0000:2b:00.0: amdgpu: RW: 0x0
[ 283.724101] amdgpu 0000:2b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32771, for process wezterm-gui pid 5044 thread wezterm-gu:cs0 pid 5072)
[ 283.724123] amdgpu 0000:2b:00.0: amdgpu: in page starting at address 0x000080050560a000 from client 0x1b (UTCL2)
[ 283.724128] amdgpu 0000:2b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00701031
[ 283.724131] amdgpu 0000:2b:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[ 283.724134] amdgpu 0000:2b:00.0: amdgpu: MORE_FAULTS: 0x1
[ 283.724137] amdgpu 0000:2b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 283.724139] amdgpu 0000:2b:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 283.724141] amdgpu 0000:2b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 283.724143] amdgpu 0000:2b:00.0: amdgpu: RW: 0x0
[ 283.724153] amdgpu 0000:2b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32771, for process wezterm-gui pid 5044 thread wezterm-gu:cs0 pid 5072)
[ 283.724157] amdgpu 0000:2b:00.0: amdgpu: in page starting at address 0x0000800505606000 from client 0x1b (UTCL2)
[ 283.724161] amdgpu 0000:2b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 283.724163] amdgpu 0000:2b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 283.724166] amdgpu 0000:2b:00.0: amdgpu: MORE_FAULTS: 0x0
[ 283.724168] amdgpu 0000:2b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 283.724170] amdgpu 0000:2b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 283.724172] amdgpu 0000:2b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 283.724175] amdgpu 0000:2b:00.0: amdgpu: RW: 0x0
[ 283.724181] amdgpu 0000:2b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32771, for process wezterm-gui pid 5044 thread wezterm-gu:cs0 pid 5072)
[ 283.724185] amdgpu 0000:2b:00.0: amdgpu: in page starting at address 0x000080019560e000 from client 0x1b (UTCL2)
[ 283.724188] amdgpu 0000:2b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 283.724191] amdgpu 0000:2b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 283.724194] amdgpu 0000:2b:00.0: amdgpu: MORE_FAULTS: 0x0
[ 283.724196] amdgpu 0000:2b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 283.724199] amdgpu 0000:2b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 283.724202] amdgpu 0000:2b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 283.724204] amdgpu 0000:2b:00.0: amdgpu: RW: 0x0
[ 283.724211] amdgpu 0000:2b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32771, for process wezterm-gui pid 5044 thread wezterm-gu:cs0 pid 5072)
[ 283.724216] amdgpu 0000:2b:00.0: amdgpu: in page starting at address 0x0000800195612000 from client 0x1b (UTCL2)
[ 283.724219] amdgpu 0000:2b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 283.724221] amdgpu 0000:2b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 283.724224] amdgpu 0000:2b:00.0: amdgpu: MORE_FAULTS: 0x0
[ 283.724226] amdgpu 0000:2b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 283.724228] amdgpu 0000:2b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 283.724230] amdgpu 0000:2b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 283.724232] amdgpu 0000:2b:00.0: amdgpu: RW: 0x0
[ 283.724239] amdgpu 0000:2b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32771, for process wezterm-gui pid 5044 thread wezterm-gu:cs0 pid 5072)
[ 283.724242] amdgpu 0000:2b:00.0: amdgpu: in page starting at address 0x0000800195612000 from client 0x1b (UTCL2)
[ 283.724244] amdgpu 0000:2b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 283.724246] amdgpu 0000:2b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 283.724248] amdgpu 0000:2b:00.0: amdgpu: MORE_FAULTS: 0x0
[ 283.724250] amdgpu 0000:2b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 283.724252] amdgpu 0000:2b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 283.724254] amdgpu 0000:2b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 283.724255] amdgpu 0000:2b:00.0: amdgpu: RW: 0x0
[ 283.724262] amdgpu 0000:2b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32771, for process wezterm-gui pid 5044 thread wezterm-gu:cs0 pid 5072)
[ 283.724265] amdgpu 0000:2b:00.0: amdgpu: in page starting at address 0x0000800195612000 from client 0x1b (UTCL2)
[ 283.724267] amdgpu 0000:2b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 283.724269] amdgpu 0000:2b:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 283.724271] amdgpu 0000:2b:00.0: amdgpu: MORE_FAULTS: 0x0
[ 283.724272] amdgpu 0000:2b:00.0: amdgpu: WALKER_ERROR: 0x0
[ 283.724274] amdgpu 0000:2b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 283.724276] amdgpu 0000:2b:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 283.724278] amdgpu 0000:2b:00.0: amdgpu: RW: 0x0
[ 283.733953] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=49294, emitted seq=49296
[ 283.734711] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process wezterm-gui pid 5044 thread wezterm-gu:cs0 pid 5072
[ 283.735063] amdgpu 0000:2b:00.0: amdgpu: GPU reset begin!
[ 283.916028] amdgpu 0000:2b:00.0: amdgpu: MODE1 reset
[ 283.916038] amdgpu 0000:2b:00.0: amdgpu: GPU mode1 reset
[ 283.916121] amdgpu 0000:2b:00.0: amdgpu: GPU smu mode1 reset
[ 284.420071] amdgpu 0000:2b:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 284.600682] amdgpu 0000:2b:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 284.616890] amdgpu 0000:2b:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 284.616933] amdgpu 0000:2b:00.0: amdgpu: SMU is resuming...
[ 284.616940] amdgpu 0000:2b:00.0: amdgpu: smu driver if version = 0x0000000d, smu fw if version = 0x00000010, smu fw program = 0, version = 0x00492400 (73.36.0)
[ 284.616945] amdgpu 0000:2b:00.0: amdgpu: SMU driver if version not matched
[ 284.616980] amdgpu 0000:2b:00.0: amdgpu: use vbios provided pptable
[ 284.661035] amdgpu 0000:2b:00.0: amdgpu: SMU is resumed successfully!
[ 284.744332] amdgpu 0000:2b:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 284.744336] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 284.744339] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 284.744342] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 284.744344] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 284.744347] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 284.744349] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 284.744352] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 284.744354] amdgpu 0000:2b:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 284.744356] amdgpu 0000:2b:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[ 284.744359] amdgpu 0000:2b:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 284.744361] amdgpu 0000:2b:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[ 284.747468] amdgpu 0000:2b:00.0: amdgpu: recover vram bo from shadow start
[ 284.752213] amdgpu 0000:2b:00.0: amdgpu: recover vram bo from shadow done
[ 284.752276] amdgpu 0000:2b:00.0: amdgpu: GPU reset(2) succeeded!
[ 284.779374] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
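In case it helps whoever reads this: the fields printed under each GCVM_L2_PROTECTION_FAULT_STATUS line look like bit slices of that single register value. The sketch below decodes the value. It assumes the gfx10.3 bit layout as I understand it from the kernel's register headers. `decode_fault_status` is a hypothetical helper I wrote, not an amdgpu tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of amdgpu or any distro tooling.
# ASSUMED bit layout of GCVM_L2_PROTECTION_FAULT_STATUS (gfx10.3 headers):
#   bit 0 MORE_FAULTS, bits 1-3 WALKER_ERROR, bits 4-7 PERMISSION_FAULTS,
#   bit 8 MAPPING_ERROR, bits 9-17 CID (UTCL2 client), bit 18 RW, bits 20-23 VMID
decode_fault_status() {
  local s=$(( $1 ))   # shell arithmetic accepts hex input such as 0x00701031
  printf 'MORE_FAULTS: 0x%x\n'       $((  s        & 0x1   ))
  printf 'WALKER_ERROR: 0x%x\n'      $(( (s >> 1)  & 0x7   ))
  printf 'PERMISSION_FAULTS: 0x%x\n' $(( (s >> 4)  & 0xf   ))
  printf 'MAPPING_ERROR: 0x%x\n'     $(( (s >> 8)  & 0x1   ))
  printf 'CID: 0x%x\n'               $(( (s >> 9)  & 0x1ff ))
  printf 'RW: 0x%x\n'                $(( (s >> 18) & 0x1   ))
  printf 'VMID: %d\n'                $(( (s >> 20) & 0xf   ))
}

# Status value taken from the first fault in the log above
decode_fault_status 0x00701031
```

Running it on 0x00701031 gives MORE_FAULTS 0x1, PERMISSION_FAULTS 0x3, CID 0x8 and VMID 7. These match what the driver printed for that fault: client TCP (0x8) and vmid:7. That match is the only check I have that the assumed layout is right.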