Greetings,
I am trying to troubleshoot an issue with my AMD card:
[ 4.418857] amdgpu 0000:03:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:158 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 4.418861] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000006003000 from client 0x12 (VMC)
[ 4.418864] amdgpu 0000:03:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x0000073C
[ 4.418866] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: DCEDMC (0x3)
[ 4.418867] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
[ 4.418868] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x6
[ 4.418869] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 4.418870] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1
[ 4.418871] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
But that is somewhat beside the point. Specifically, when my VMs are working hard I seem to be getting crashes. I think it's related to my iGPU and how I'm trying to set up GPU passthrough, but I can't get good logs of the crashes.
I understand it's heretical, but going by some GPT suggestions, it seems a tool such as kdump may be useful. I'm not sure how to set it up, though.
The Red Hat docs mention:
> 13.1. Estimating the kdump size
> When planning and building your kdump environment, it is important to know how much space the crash dump file requires. The makedumpfile --mem-usage command estimates how much space the crash dump file requires. It generates a memory usage report. The report helps you determine the dump level and which pages are safe to be excluded.
However, I don't see a makedumpfile command, nor does anything come up when I search the nixpkgs site.
Does anyone know a good way to set up kdump, or another way to do what I'm attempting? Or perhaps know how to fix this issue with my AMD card?
Ryzen 9 7950X (Raphael iGPU)
RX 6800 XT
NVIDIA GTX 750 Ti (passed through)
Perhaps relevant lines from config.nix:
boot.initrd.kernelModules = [ "amdgpu" ];
# Bootloader.
boot.loader.systemd-boot.enable = true;
boot.loader.efi.canTouchEfiVariables = true;
# VFIO
boot.kernelParams = [ "amd_iommu=on" "iommu=pt" "amdgpu.ppfeaturemask=0xfffd3fff" ];
boot.blacklistedKernelModules = [ "nvidia" "nouveau" ];
boot.kernelModules = [ "kvm_amd" "vfio_virqfd" "vfio_pci" "vfio_iommu_type1" "vfio" ];
boot.extraModprobeConfig = "options vfio-pci ids=10de:1380,10de:0fbc";
# Virt-manager
virtualisation.libvirtd.enable = true;
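From what I can tell, NixOS has a boot.crashDump module that may be the closest equivalent to a kdump setup. Below is a minimal sketch of what I assume enabling it would look like. It is untested on this machine, and the 256M reservation is a guess rather than a measured value:

```nix
# Hypothetical addition to the configuration above; untested on this machine.
# boot.crashDump sets up a crash kernel that boots when the running kernel
# panics, so the dump of the crashed kernel can be saved from /proc/vmcore.
boot.crashDump.enable = true;
# Memory reserved for the crash kernel. 256M is an assumption, not a measured
# requirement; too high a value can make the reservation fail at boot.
boot.crashDump.reservedMemory = "256M";
```

If the reservation fails, dmesg should say so, which would at least be a starting point for adjusting the value.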
Best,