How to use Nvidia PRIME offload to run the X server on the integrated board

My video-related configuration is as follows:

  hardware.opengl.driSupport32Bit = true;
  hardware.opengl.enable = true;
  hardware.opengl.extraPackages = with pkgs; [
    intel-media-driver
    vaapiIntel
    vaapiVdpau
    libvdpau-va-gl
  ];
  hardware.opengl.extraPackages32 = with pkgs.pkgsi686Linux; [
    libva
    vaapiIntel
  ];
  hardware.nvidia.prime = {
    offload.enable = true;
    nvidiaBusId = "PCI:1:0:0";
    intelBusId = "PCI:0:2:0";
  };
  hardware.nvidia.modesetting.enable = true;
  #hardware.nvidia.nvidiaPersistenced = true;

  services.xserver.videoDrivers = [ "nvidia" ];
  services.xserver.dpi = 96;

Everything seems to be running well, and I can use the nvidia-offload script to run programs on my NVIDIA card. But when I check nvidia-smi, I notice that my X server is running on my NVIDIA card, when I expected it to run on my Intel card, since I’m not offloading it.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   35C    P8    N/A /  N/A |      4MiB /  4046MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     13441      G   ...-xorg-server-1.20.8/bin/X        3MiB |
+-----------------------------------------------------------------------------+

What am I doing wrong?

I’ve no familiarity with how prime or nvidia-smi works. But I’ve got a couple of semi-intelligent/dumb questions that might lead you to a solution.

Perhaps services.xserver.videoDrivers = [ "nvidia" ]; tells the X server to use the “nvidia” driver? Can you select “intel” instead, and does that give you the expected result?

Also, I’m not sure what resolution you’re working with, but 3MiB of GPU memory usage seems a bit light if, for example, you’re using 24-bit color depth at 1920x1080, which would need ~6MiB (3 * 1920 * 1080 / 1024 / 1024) for the frame buffer (corrected from my original ~47 MiB figure; see below).

Maybe “GPU Memory Usage” doesn’t report frame buffer memory and xorg is running on the nvidia gpu?

Or maybe xorg is running primarily on the intel gpu, but still needs to interface with the nvidia gpu to query or set gpu state, hence the 3MiB. I’d honestly expect prime to synchronize the two GPUs transparently behind the scenes, but since I don’t know, it’s a possibility to consider.

I’ve recently set up prime the way OP described, but I first (mistakenly) set videoDrivers = [ "modesetting" "nvidia" ];. This led to offloading not working, and nvidia-smi did not report anything using the card. I then removed modesetting and restarted X. After this, I see the same as OP: nvidia-smi reports X using ~4MB of memory, and GPU offloading works correctly via the nvidia-offload script.
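In config terms, the difference was just:

  services.xserver.videoDrivers = [ "nvidia" ];                  # offloading works
  # services.xserver.videoDrivers = [ "modesetting" "nvidia" ];  # offloading broken, card unused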

I’m assuming that @boxofrox’s hypothesis is correct: “xorg is running primarily on the intel gpu, but still needs to interface with the nvidia gpu to query or set gpu state, hence the 3MiB.”

Actually, I fubar’d that calculation by using 24 bits instead of 3 bytes: 3 * 1920 * 1080 / 1024 / 1024 ≈ 6MiB. So still twice as much as reported, but not as drastic a difference.

My laptop actually drives two displays, one at 1920x1080 and an external ultra-wide, probably at 2560x1080. Using the formula you mentioned earlier, this should put me at ~14MiB. This may reinforce the idea that the memory size reported by nvidia-smi is not directly related to the size of the bitmap to render.
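Spelling that out at 3 bytes per pixel (one frame buffer per display):

  3 * 1920 * 1080 / 1024 / 1024 ≈ 5.9 MiB
  3 * 2560 * 1080 / 1024 / 1024 ≈ 7.9 MiB
  total                         ≈ 13.8 MiB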

Actually, this thread suggests that the resource usage you see listed by nvidia-smi may be caused by… nvidia-smi itself, even if the process is called X:

Yeah, you’re correct that nvidia-smi turns the GPU on, so it’s not useful to check for runtime power management. Instead, you can check the kernel’s power status file /sys/bus/pci/devices/<busid>/power/runtime_status . If it says suspended then the kernel put the GPU to sleep.

It’s not available quite yet, but in a future release there will also be a /proc/driver/nvidia/gpus/<busid>/power file you can read for additional information (i.e. to tell whether the VRAM is powered off or in self-refresh mode).

@Denommus you’re probably doing nothing wrong :slight_smile:

@Samae & @Denommus with nothing running that offloads to the GPU, does cat /sys/bus/pci/devices/<busid>/power/runtime_status report suspended?

If so, then I think that confirms @Samae is correct and SMI is causing the resource usage seen.
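(The <busid> is the sysfs form of the PCI address, so the PCI:1:0:0 from the Nix config corresponds to 0000:01:00.0. If in doubt, it can be looked up with something like

$ lspci -d 10de:

where 10de is NVIDIA’s PCI vendor id.)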

Hmm, so I’m not getting the results I expected:

$ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
active
$ cat /proc/driver/nvidia/gpus/0000:01:00.0/power
Runtime D3 status:          Disabled
Video Memory:               Active

GPU Hardware Support:
 Video Memory Self Refresh: Supported
 Video Memory Off:          Supported

Here’s what nvidia-smi says:

$ nvidia-smi
Mon Sep 28 16:04:28 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro T1000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8     2W /  N/A |      5MiB /  3914MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1552      G   ...-xorg-server-1.20.8/bin/X        4MiB |
+-----------------------------------------------------------------------------+

NB: contrary to @Denommus, I have not enabled modesetting with nvidia:
hardware.nvidia.modesetting.enable = false

Well, that’s disappointing. Here are the last two ideas I have for gaining a better understanding of what’s going on.

  1. Disable PRIME offload, run the X server on the NVIDIA GPU, then run SMI and compare that report against the enabled-PRIME-offload SMI reports we’ve seen.

    If the reports match, then that would suggest X does run on the NVIDIA GPU and not the integrated GPU.

    If the disabled-PRIME-offload SMI report uses more memory for X, that would suggest X may be running on the integrated GPU, but also managing the NVIDIA GPU in some fashion.

  2. Stop the X server, log in via a Linux console, and see if cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status actually reports suspended (see the sketch below). If it still doesn’t, then I wonder if the NVIDIA drivers on Linux even bother with power management.
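For the console check, assuming your graphical session is started by display-manager.service (the NixOS default), that would roughly be:

$ sudo systemctl stop display-manager.service
$ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status

(Run this from a virtual console, since stopping the display manager ends the graphical session.)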

Did you try glxinfo?

$ glxinfo | grep 'OpenGL render'
OpenGL renderer string: Mesa DRI Intel(R) Haswell Mobile 
$ nvidia-offload glxinfo | grep 'OpenGL render'
OpenGL renderer string: GeForce GT 740M/PCIe/SSE2

This shows whether, in principle, both cards can be used, depending on the environment (nvidia-offload does nothing other than set environment variables).
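For reference, the commonly used nvidia-offload wrapper (e.g. the one on the NixOS wiki) is just something along these lines:

  environment.systemPackages = [
    (pkgs.writeShellScriptBin "nvidia-offload" ''
      export __NV_PRIME_RENDER_OFFLOAD=1
      export __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0
      export __GLX_VENDOR_LIBRARY_NAME=nvidia
      export __VK_LAYER_NV_optimus=NVIDIA_only
      exec "$@"
    '')
  ];

so you can get the same effect by setting those variables by hand in front of any command.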

For ref, I’ve got this:

$ glxinfo | grep 'OpenGL render'
OpenGL renderer string: Mesa Intel(R) UHD Graphics (CML GT2)
$ nvidia-offload glxinfo | grep 'OpenGL render'
OpenGL renderer string: Quadro T1000/PCIe/SSE2

So, offload seems to be working properly here.

[…] then I wonder if the NVIDIA drivers on Linux even bother with power management.

https://download.nvidia.com/XFree86/Linux-x86_64/435.21/README/dynamicpowermanagement.html

I’m sorry for leaving this thread without further reporting; I was busy with family matters for the past few weeks.

So, cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status reports “active” even before I have used any offloading.

Using “modesetting” in the driver list completely breaks nvidia-offload.

Finally, glxinfo correctly reports that I’m using Intel when not offloading and NVIDIA when offloading. But, again, the NVIDIA card always seems to be active.

Did you activate power management as described in the nvidia link above? I cannot test it because it doesn’t work on Kepler cards. By the way, I can tell whether the nvidia card is on from the power consumption.

The brute-force hack is what Arch’s nvidia-xrun does:

Use an X session with only the integrated card, unload all nvidia kernel modules and

$ echo 1 > /sys/bus/pci/devices/0000\:08\:00.0/remove

(with your bus id).

When an application needs to use the nvidia card,

$ echo 1 > /sys/bus/pci/rescan

then load the drivers and open a second X session using the nvidia card. Don’t try to remove the device as long as the nvidia drivers are loaded, and don’t try to unload the nvidia drivers as long as the card is in use!
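Put together, the two halves look roughly like this (module names as shipped by the proprietary driver, bus id as above; run as root):

$ modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia
$ echo 1 > /sys/bus/pci/devices/0000\:08\:00.0/remove

and, before starting the nvidia X session:

$ echo 1 > /sys/bus/pci/rescan
$ modprobe nvidia_drm nvidia_uvm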

I’ve put this into a script: a server running as root that does this part and communicates via a FIFO. The only missing bit so far is flushing the FIFO without blocking. If anyone knows how to do this cleanly with FIFOs or sockets in bash (e.g. using socat), please tell me.

This link is helpful. After reading it, I found that NixOS has an option to enable power management, hardware.nvidia.powerManagement.enable. I’ll test it and see if this resolves my doubt.
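In configuration.nix that should presumably be just:

  hardware.nvidia.powerManagement.enable = true;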

If it doesn’t work, how would I set those options in my configuration.nix?

Ok, I’ve added:

  # NVreg_DynamicPowerManagement=0x02 selects fine-grained dynamic power management
  boot.extraModprobeConfig = "options nvidia \"NVreg_DynamicPowerManagement=0x02\"\n";
  services.udev.extraRules = ''
  # Remove NVIDIA USB xHCI Host Controller devices, if present
  ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c0330", ATTR{remove}="1"
  
  # Remove NVIDIA USB Type-C UCSI devices, if present
  ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c8000", ATTR{remove}="1"
  
  # Remove NVIDIA Audio devices, if present
  ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x040300", ATTR{remove}="1"
  
  # Enable runtime PM for NVIDIA VGA/3D controller devices on driver bind
  ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
  ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"
  
  # Disable runtime PM for NVIDIA VGA/3D controller devices on driver unbind
  ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="on"
  ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="on"
  '';

to my configuration.nix

That has changed /sys/bus/pci/devices/0000:01:00.0/power/control to auto, as expected. But when I cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status, I still get active.

I’m not sure whether that solution actually turns off the GPU, or if it just keeps it in a low-powered state. I’ll give it a test in the next few days.

What does cat /proc/driver/nvidia/gpus/*/power say?

The GPU cannot be powered off; I believe this is due to a limitation of Xorg and not a limitation of the hardware support for Turing. Turing introduced a better low-power mode using RTD3.

Runtime D3 status: Not supported
Video Memory: Active

GPU Hardware Support:
Video Memory Self Refresh: Not Supported
Video Memory Off: Not Supported

@eadwu According to the page I cited above, the GPU can be turned off in the supported configurations:

However, the lowest power states for NVIDIA GPUs require turning power off to the entire chip, often through ACPI calls.
[…]

Supported Configurations

[…]

  • […] The necessary hardware and ACPI support was first added in Intel Coffeelake chipset series. Hence, this feature is supported from Intel Coffeelake chipset series.
  • This feature requires a Turing or newer GPU.
  • This feature is supported with Linux kernel versions 4.18 and newer. […]

[…]

@Denommus However, your card (GeForce GTX 960M) is of the Maxwell generation, which is 2 generations older than Turing. Therefore, “Runtime D3 [management is] Not supported”. You’re in the same situation as I am. So you can

  1. either live without turning off the nvidia card,
  2. or use nouveau which has worse performance than my intel card on my notebook,
  3. or switch on a per-X-session basis (which can also be embedded into a running X session, e.g. with Bumblebee; however, Bumblebee has had no recent release, so still using it might be a bit experimental, and bbswitch does not work anymore). The Arch wiki contains information about the different methods for turning off nvidia GPUs.

My post above covers case 3, but I will only configure the two different X sessions in NixOS once release 20.09 is out, because the mechanism for cloning configurations changes there (look for nesting.clone / specialisation).
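To give an idea of the shape (names and contents purely illustrative, using the new specialisation mechanism from 20.09):

  specialisation.nvidia-xorg.configuration = {
    # only the bits that differ for the X session on the nvidia card go here, e.g.:
    services.xserver.videoDrivers = [ "nvidia" ];
  };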

For months I was wondering why my nvidia-offload works but the graphics card doesn’t completely switch off. So, thank you for all the investigation that happened in this thread!

This is especially annoying in my case since the fans of my notebook (XPS 9560) become really noisy when the graphics card is not switched off. I find it kind of hard to believe that Intel+Nvidia, after so many years of producing laptops with dual graphics, only managed to support a complete shutdown of the graphics card as of Turing/Coffee Lake.

I added a reference to and an explanation of your (@jorsn) last post to the Nvidia NixOS wiki page.
