How to use Nvidia PRIME offload to run the X server on the integrated board

Well, that’s disappointing. Here are the last two ideas I have for gaining a better understanding of what’s going on.

  1. Disable PRIME offload, run the X server on the NVIDIA GPU, then run SMI and compare that report against the enabled-PRIME-offload SMI reports we’ve seen.

    If the reports match, then that would suggest X does run on the NVIDIA GPU and not the integrated GPU.

    If the disabled-PRIME-offload SMI report uses more memory for X, that would suggest X may be running on the integrated GPU, but also managing the NVIDIA GPU in some fashion.

  2. Stop the X server, log in via the Linux console, and see whether cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status actually reports suspended. If it still doesn’t, then I wonder if the NVIDIA drivers on Linux even bother with power management.
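For idea 2, here is a small helper sketch (my naming; the bus ID is the one from this thread, so substitute your own). The classify function only interprets the status string, so it can be tried without the hardware:

```shell
# Sketch: interpret the dGPU's runtime PM status string.
classify() {
  case "$1" in
    suspended)           echo "dGPU is powered down" ;;
    active)              echo "dGPU is still on" ;;
    suspending|resuming) echo "dGPU is transitioning" ;;
    *)                   echo "unexpected status: $1" ;;
  esac
}

# Real usage from a console, with X stopped:
#   classify "$(cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status)"
classify active
```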

Did you try glxinfo?

$ glxinfo | grep 'OpenGL render'
OpenGL renderer string: Mesa DRI Intel(R) Haswell Mobile 
$ nvidia-offload glxinfo | grep 'OpenGL render'
OpenGL renderer string: GeForce GT 740M/PCIe/SSE2

This shows whether, in principle, both cards can be used, depending on the environment (nvidia-offload does nothing but set environment variables).
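For illustration, such a wrapper can be sketched as a shell function (a sketch only; these are the standard PRIME render-offload variables, but the exact set your wrapper exports may differ):

```shell
# Sketch of a PRIME render-offload wrapper: it only sets the offload
# environment variables for the wrapped command, nothing more.
nvidia_offload() {
  __NV_PRIME_RENDER_OFFLOAD=1 \
  __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0 \
  __GLX_VENDOR_LIBRARY_NAME=nvidia \
  __VK_LAYER_NV_optimus=NVIDIA_only \
  "$@"
}

# e.g.: nvidia_offload glxinfo | grep 'OpenGL render'
nvidia_offload sh -c 'echo "$__GLX_VENDOR_LIBRARY_NAME"'  # prints: nvidia
```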

For ref, I’ve got this:

$ glxinfo | grep 'OpenGL render'
OpenGL renderer string: Mesa Intel(R) UHD Graphics (CML GT2)
$ nvidia-offload glxinfo | grep 'OpenGL render'
OpenGL renderer string: Quadro T1000/PCIe/SSE2

So, offload seems to be working properly here.

[…] then I wonder if the NVIDIA drivers on Linux even bother with power management.

https://download.nvidia.com/XFree86/Linux-x86_64/435.21/README/dynamicpowermanagement.html

I’m sorry for leaving this thread without further reporting, I was busy with family matters the past few weeks.

So, cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status reports “active” even before I have used any offloading.

Using “modesetting” in the driver list completely breaks nvidia-offload.

Finally, glxinfo does report correctly that I’m using Intel when not offloading, and Nvidia when offloading. But, again, the nvidia board seems to always be active.

Did you activate power management as in the nvidia link above? I cannot test it, because it doesn’t work on Kepler cards. Btw, I can tell from the power consumption whether the nvidia card is on.

The brute-force hack is the one from Arch’s nvidia-xrun:

Use an X session with only the integrated card, unload all nvidia kernel modules and

$ echo 1 > /sys/bus/pci/devices/0000\:08\:00.0/remove

(with your bus id).

When an application needs to use the nvidia card, run

$ echo 1 > /sys/bus/pci/rescan

then load the drivers and open a second X session using the nvidia card. Don’t try to remove the device as long as the nvidia drivers are loaded, and don’t try to unload the nvidia drivers as long as the card is used!
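The cycle above can be sketched as a pair of shell functions (a sketch only, with the example bus ID from this post; the real commands need root, so a DRY=1 mode just prints what would run):

```shell
# Sketch of the remove/rescan cycle. Adjust GPU to your own bus ID.
GPU=/sys/bus/pci/devices/0000:08:00.0

run() {
  if [ "${DRY:-0}" = 1 ]; then echo "would: $*"; else eval "$*"; fi
}

gpu_off() {
  # unload the drivers first; never remove the device while they are loaded
  run "modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia"
  run "echo 1 > $GPU/remove"
}

gpu_on() {
  run "echo 1 > /sys/bus/pci/rescan"  # rediscover the device
  run "modprobe nvidia"               # then reload the driver
}

DRY=1 gpu_off  # dry run: print the power-off steps
```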

I’ve put this into a script: a server running as root that performs these steps and communicates via a FIFO. The only missing bit is flushing the FIFO without blocking. If anyone knows how to do this cleanly with FIFOs or sockets in bash (e.g. using socat), please tell me.

This link is helpful. After reading it, I saw that among the NixOS options there is one to enable power management, hardware.nvidia.powerManagement.enable. I’ll test it and see whether this resolves my doubt.

If it doesn’t work, how would I set those options in my configuration.nix?

Ok, I’ve added:

  boot.extraModprobeConfig = "options nvidia \"NVreg_DynamicPowerManagement=0x02\"\n";
  services.udev.extraRules = ''
  # Remove NVIDIA USB xHCI Host Controller devices, if present
  ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c0330", ATTR{remove}="1"
  
  # Remove NVIDIA USB Type-C UCSI devices, if present
  ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c8000", ATTR{remove}="1"
  
  # Remove NVIDIA Audio devices, if present
  ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x040300", ATTR{remove}="1"
  
  # Enable runtime PM for NVIDIA VGA/3D controller devices on driver bind
  ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
  ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"
  
  # Disable runtime PM for NVIDIA VGA/3D controller devices on driver unbind
  ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="on"
  ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="on"
  '';

to my configuration.nix

That has changed /sys/bus/pci/devices/0000:01:00.0/power/control to auto, as expected. But when I cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status, I still get active.

I’m not sure whether that solution actually turns off the GPU, or if it just keeps it in a low-powered state. I’ll give it a test in the next few days.

What does cat /proc/driver/nvidia/gpus/*/power say?

The GPU cannot be powered off; I believe this is due to a limitation of Xorg and not a limitation of the hardware support on Turing. Turing allowed for a better low-power mode using RTD3.

Runtime D3 status: Not supported
Video Memory: Active

GPU Hardware Support:
Video Memory Self Refresh: Not Supported
Video Memory Off: Not Supported

@eadwu According to the page I cited above, the GPU can be turned off in the supported configurations:

However, the lowest power states for NVIDIA GPUs require turning power off to the entire chip, often through ACPI calls.
[…]

Supported Configurations

[…]

  • […] The necessary hardware and ACPI support was first added in Intel Coffeelake chipset series. Hence, this feature is supported from Intel Coffeelake chipset series.
  • This feature requires a Turing or newer GPU.
  • This feature is supported with Linux kernel versions 4.18 and newer. […]

[…]

@Denommus However, your card (GeForce GTX 960M) is of the Maxwell generation, which is two generations older than Turing. Therefore, “Runtime D3 [management is] Not supported”. You’re in the same situation as I am. So you can

  1. either live without turning off the nvidia card,
  2. or use nouveau which has worse performance than my intel card on my notebook,
  3. or switch on a per-X-session basis (which can also be embedded into a running X session, e.g. with Bumblebee). However, Bumblebee has had no recent release, so still using it might be a bit experimental. Also, bbswitch does not work anymore. The Arch wiki contains information about the different methods for turning off nvidia GPUs.

My post above covers case 3., but I will only configure the two different X sessions in NixOS when release 20.09 is out, because there the mechanism for cloning configurations will change (look for nesting.clone/specialization).

I had been wondering for months why my nvidia-offload works but the graphics card doesn’t completely switch off. So, thank you for all the investigation that happened within this thread!

This is especially annoying in my case, since the fans of my notebook (XPS 9560) become really noisy when the graphics card is not switched off. For me, it is kind of hard to believe that Intel and Nvidia, after so many years of producing laptops with dual graphics, only managed to support a complete shutdown of the graphics card from Turing/Coffee Lake onwards.

I added a reference and explanation to your (@jorsn) last post to the Nvidia nixos wiki page.

I can confirm as well that the udev rules disable my Nvidia GPU when it is not being used :slight_smile:

Hey all, having discovered this thread I now know that I’m out of luck. My GPU seems to be too old (it’s a ThinkPad P43s; lspci reports 3D controller: NVIDIA Corporation GP108GLM [Quadro P520] (rev a1)) to support runtime power management properly.

$ cat /proc/driver/nvidia/gpus/*/power
Runtime D3 status:          Not supported
Video Memory:               Active

GPU Hardware Support:
 Video Memory Self Refresh: Not Supported
 Video Memory Off:          Not Supported

While it seems the nVidia card does go into a low-power state, my battery life is still terrible (I have configured TLP to try to reduce the power consumption of everything else).

I hope this isn’t too out of scope for this thread, but I’d love to be able to hard-disable the nVidia card: never turn it on in the first place, offloading be damned. Is this possible? Unfortunately, I’m not sure where or how to experiment with this. I have tried blacklisting the nvidia driver, but that doesn’t seem to turn off the PCI device. Since I just use terminals and the like, I’d love to be able to force Intel-only video.

Thank you to all those already providing assistance in this thread! :raised_hands:

EDIT: this https://major.io/p/disable-nvidia-gpu-thinkpad-t490/ actually looks quite promising, but I wouldn’t know how to go about doing it in NixOS!
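EDIT 2: if I read the article right, I guess its trick could be expressed in NixOS with a udev rule in the same style as the ones posted earlier in this thread. This is an untested sketch (0x10de is NVIDIA’s PCI vendor ID; 0x030200 is the 3D-controller class that my lspci reports):

```nix
  # Untested sketch: drop the NVIDIA 3D controller from the PCI bus as soon
  # as udev sees it, before any driver can bind (adapted from the article).
  services.udev.extraRules = ''
    ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", ATTR{remove}="1"
  '';
```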

Hey @paul,

you may want to check out the NixOS/nixos-hardware repository (a collection of NixOS modules covering hardware quirks). For my notebook, there is an intel configuration that switches off the nvidia card entirely:

https://github.com/NixOS/nixos-hardware/blob/40ade7c0349d31e9f9722c7331de3c473f65dce0/dell/xps/15-9560/intel/default.nix (also check out the imports from this file).

Either you can take the commands from that configuration, or maybe there is already something ready to use for your P43s…

But your GPU/Intel architecture is Turing/Coffee Lake or newer, right?

Hi! Thank you for your reply. I didn’t know about that nixos-hardware repo! Very useful-looking.

The only thing that looked different from everything I’d tried (e.g. manually using bbswitch) was hardware.nvidiaOptimus.disable = lib.mkDefault true;, but unfortunately that also didn’t work. FWIW, lspci still shows (rev a1) by the GPU; I believe I’d expect (rev ff) if the disable were working.

But your GPU/Intel architecure is Turing/Coffee Lake or newer, right?

I think that’s the problem. I have an i7-8565U CPU, and lspci says 3c:00.0 3D controller: NVIDIA Corporation GP108GLM [Quadro P520] (rev a1). I haven’t found clear info, but I suspect the graphics chip is “too old”.

Surely there must be a way to hard-disable it; at this point I’d physically remove it if I could.

I wonder if something like https://major.io/p/disable-nvidia-gpu-thinkpad-t490/ might work?

For reference, I see on https://www.notebookcheck.net/Lenovo-ThinkPad-P43s-laptop-review-The-mobile-workstation-s-display-and-performance-disappoint.439972.0.html (under the heading “Energy consumption”) that the system idles at around 5 W in Windows, while for me it usually hovers at 7.5-8 W when I’m doing nothing at all. This tells me that in Windows the driver is probably able to shut off the GPU completely, because anecdotally I’ve seen folks claim a power-usage decrease of about 2.5 W when a similar nVidia GPU is powered down versus sitting in its lowest P-state. Surely it must be possible, then?

OK, so I take that back (apologies for the back-and-forth). Even though

$ cat /proc/acpi/bbswitch
0000:3c:00.0 ON

and lspci still says [Quadro P520] (rev a1), it does seem as if the power usage drops lower than it has in the past. If I leave it to settle down, it gets as low as about 5.5-6.5 W, so that’s already a big win.