Instability with pcie_aspm=off

I've been having a lot of trouble getting an external GPU and a Dell dock (each connected to a Thunderbolt port of the laptop) working with the pcie_aspm=off kernel parameter. I need this parameter to make hibernation work. The strange thing is that if I don't include it, the eGPU functions normally but the Dell dock sometimes does not wake up from suspend, and I need to manually open my laptop (which runs in clamshell mode) and press a key to wake it up. This happens even if only the Dell dock is attached, without the eGPU. Any idea what the correlation between these events is? From what I understand, pcie_aspm=off just disables the PCIe power-saving features.

On the other hand, if I include “pcie_aspm=off”, the eGPU does not work reliably: I need to remove it from the PCIe bus and do a PCIe rescan for it to be detected and usable, but then putting the system into suspend or hibernate crashes it.
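
For readers following along, the remove-and-rescan step described above normally goes through the kernel's sysfs PCI interface. This is a minimal sketch, assuming a root shell. The function name, the `SYSFS_ROOT` override (only there so the function can be tried against a scratch directory) and the example address are hypothetical, not taken from the poster's setup:

```shell
# Remove one PCI device and ask the kernel to re-enumerate the bus.
# $1: PCI address such as 0000:0a:00.0 (hypothetical; look yours up with lspci -D)
# SYSFS_ROOT: assumption for dry runs, defaults to the real /sys
pci_remove_and_rescan() {
    local addr="$1"
    local sysfs="${SYSFS_ROOT:-/sys}"
    local dev="$sysfs/bus/pci/devices/$addr"

    # Only detach the device if it is currently present.
    if [ -e "$dev/remove" ]; then
        echo 1 > "$dev/remove" || return 1
    fi

    # Writing 1 to the bus-level rescan file triggers a fresh scan.
    echo 1 > "$sysfs/bus/pci/rescan"
}
```

It would be called as `pci_remove_and_rescan 0000:0a:00.0`. If the device is already gone, only the rescan is performed.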

I'll be happy to provide any logs if necessary.

Which external GPU are you using?
Also, have you tried it without the Dell dock?
Which kernel, NixOS and driver versions are you using?
Are you on X or Wayland?

  1. The external GPU is an AMD 5700 XT
  2. It does not happen without the dock or eGPU; the laptop is able to suspend/hibernate normally
  3. Kernel: 6.12.9 / NixOS version 24.11; the same happened with 24.05
  4. Wayland

Then just plug the eGPU into the laptop directly.
This is most likely not a software issue; the dock is the problem.
It is nevertheless recommended to connect the eGPU over a link of at least 20 Gbps.
If you then also run a monitor and other peripherals over the docking station, that is not optimal.

But the eGPU also behaves unreliably: the first time I plug it in it works fine with the monitor, but when I unplug it and plug it back in, it hangs the system. Also, why is it that I don't face any issues with the dock with pcie_aspm=off?

So ASPM stands for Active State Power Management, which lets a PCIe link drop into a low-power state while the device is idle.
Booting with pcie_aspm=off turns off the kernel's ASPM handling.
So now, just to clarify:
When you connect your eGPU over the dock, you face the issue that you can't hibernate or suspend.
When you connect your eGPU directly to your laptop, you face the issue that you cannot hotplug your GPU (disconnect it and reconnect it), but your hibernate and suspend issue is solved.
Is that right, or did I misunderstand something?
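
To see which ASPM setting is actually in effect on a given boot, the kernel exposes the active policy and the boot command line as plain files. A small sketch; the function names are made up for illustration, and the optional file arguments exist only so the functions can be pointed at test files:

```shell
# Print the ASPM policy the kernel marks as active (the entry in brackets).
# $1: policy file, defaults to the pcie_aspm module parameter in sysfs
aspm_active_policy() {
    local file="${1:-/sys/module/pcie_aspm/parameters/policy}"
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
}

# Succeed if pcie_aspm=off is on the kernel command line.
# $1: command-line file, defaults to /proc/cmdline
booted_with_aspm_off() {
    local file="${1:-/proc/cmdline}"
    grep -qw 'pcie_aspm=off' "$file"
}
```

For a per-device view, `sudo lspci -vv` reports the ASPM state of each link in its `LnkCtl:` line.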

Also, could you post the output of the following commands?

lspci -d ::03xx

dmesg | grep PCIe

lspci -k -d ::03xx

Also, which desktop environment / window manager are you using? Some have explicit GPU hotplug support:

https://wiki.archlinux.org/title/External_GPU

Could you also post the output of dmesg and journalctl after you unplug your GPU and plug it in again?
Post them on pastebin.com and share the link here.
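
If it helps, the requested output can be gathered into a single file before pasting. A rough sketch: the function name and default file name are hypothetical, `journalctl -k -b` (kernel messages from the current boot) is one assumed reading of "journalctl" above, and each command is allowed to fail so the rest is still collected:

```shell
# Collect the requested diagnostics into one file for pasting.
# $1: output file (hypothetical default name)
collect_egpu_logs() {
    local out="${1:-egpu-logs.txt}"
    {
        echo "== lspci -k -d ::03xx =="
        lspci -k -d ::03xx 2>&1 || true
        echo "== dmesg | grep PCIe =="
        dmesg 2>&1 | grep PCIe || true
        echo "== journalctl -k -b --no-pager =="
        journalctl -k -b --no-pager 2>&1 || true
    } > "$out"
}
```

Run it as root right after unplugging and replugging the eGPU (`collect_egpu_logs /tmp/egpu-logs.txt`), so the hotplug messages are near the end of the file.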

Yes, let me collate all these details and I will send them ASAP.

@Postboote here are the logs that you requested: ~ sudo lspci -d ::03xx - Pastebin.com

Here's exactly what I'm experiencing:

When the eGPU is connected to the first Thunderbolt port of the laptop, the Dell dock is connected to the second Thunderbolt port, the laptop is in clamshell mode, and the display is provided by a monitor connected to the Dell dock:

  • the eGPU functions normally and I can hotplug it without any issues (I just need to rescan PCIe every time I plug the eGPU back in)
  • suspend works reliably: if I put the laptop to suspend, it wakes up normally

Now, if I remove both the eGPU and the Dell dock from the laptop's Thunderbolt ports and open the laptop from clamshell mode:

  1. If, before unplugging the eGPU and Dell dock, I first open the laptop and let the display appear on the laptop screen (which it does), I can then remove the eGPU and Dell dock and the display functions normally. BUT if I now put the laptop into hibernate, it hibernates successfully, resumes from hibernate, and the Dell dock is able to drive the monitor again, BUT the eGPU is not detected when plugged into the laptop. When I rescan the PCIe lanes it just gets stuck scanning; I'm usually still able to use the laptop, but sometimes the scan freezes it.
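
Whether the eGPU has been detected after a replug or a rescan can be checked without lspci by looking for display-class devices in sysfs. A minimal sketch; the function name and the optional sysfs-root argument are assumptions for illustration:

```shell
# List PCI display-class devices (class code 0x03xxxx) known to the kernel.
# $1: sysfs root, defaults to /sys (override is an assumption for dry runs)
list_display_devices() {
    local sysfs="${1:-/sys}"
    local dev class
    for dev in "$sysfs"/bus/pci/devices/*; do
        [ -r "$dev/class" ] || continue
        class=$(cat "$dev/class")
        case "$class" in
            0x03*) echo "${dev##*/} $class" ;;
        esac
    done
}
```

If the eGPU's address is missing from the output after plugging it in, the kernel has not enumerated it, and that points at the Thunderbolt/PCIe hotplug path and not at the graphics driver.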

@Postboote Maybe these might help as well:

So I think this is not a software problem. The problem is with the bandwidth of your PCIe lanes and your CPU.
So restarting is most likely the only solution.
But could you try plugging your monitor into your eGPU instead of your dock?
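
One way to put numbers on the bandwidth question is to read the link attributes the kernel exposes for the eGPU's PCI device. A sketch only: the function name and example address are hypothetical, and the values reported for Thunderbolt-tunnelled devices may not reflect the real tunnel bandwidth:

```shell
# Print negotiated vs. maximum PCIe link parameters for one device.
# $1: PCI address (hypothetical example: 0000:0a:00.0)
# $2: sysfs root, defaults to /sys
pci_link_status() {
    local dev="${2:-/sys}/bus/pci/devices/$1"
    local attr
    for attr in current_link_speed current_link_width max_link_speed max_link_width; do
        if [ -r "$dev/$attr" ]; then
            echo "$attr: $(cat "$dev/$attr")"
        fi
    done
}
```

Comparing the output with the eGPU alone and with the dock also attached would show whether the negotiated link changes between the two setups.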