NixOS 20.09: Nvidia offloading shows same performance as non-offloading and is slower than Nvidia only

Hi all,

Until recently, I ran my Lenovo T460p with nvidia.prime.sync.enable = true, because offload mode was only announced for the NixOS 20.09 release.

About a week ago, I switched to nvidia.prime.offload.enable = true and removed the sync setting.
I run NixOS 20.09 stable, with Gnome and a couple of graphics programs from the unstable channel.
I currently run sddm, since gdm only shows a black screen after boot. This sddm setup already worked with Nvidia sync mode before the switch.

As expected, graphics performance dropped when running on the Intel GPU, but it does not improve when running with the nvidia-offload script.

I would be very grateful if anyone has experience with such an issue, knows how to debug it further, can point me to forum threads I have not found yet, or even knows how to make it work.

NixOS discourse links that may be related

What follows are the relevant parts of the configuration files and debug output from programs that I gathered from several sources.

Relevant parts from configuration.nix:

{ config, pkgs, ... }:
let
  nvidia-offload = pkgs.writeShellScriptBin "nvidia-offload" ''
    # https://download.nvidia.com/XFree86/Linux-x86_64/440.64/README/primerenderoffload.html
    export __NV_PRIME_RENDER_OFFLOAD=1
    export __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0
    export __GLX_VENDOR_LIBRARY_NAME=nvidia
    export __VK_LAYER_NV_optimus=NVIDIA_only
    export VK_ICD_FILENAMES=${pkgs.linuxPackages.nvidia_x11}/share/vulkan/icd.d/nvidia.json
    exec -a "$0" "$@"
  '';
in
{
  nixpkgs.config = {
    # required for nvidia driver
    allowUnfree = true;
  };
  hardware = {
    # Nvidia PRIME
    # https://nixos.wiki/wiki/Nvidia
    nvidia.prime = {
      #modesetting.enable = true; # prevent tearing
      # 3D card always enabled
      #sync.enable = true;
      # offload 3D to graphics with nvidia-offload script
      offload.enable=true;
      intelBusId = "PCI:0:2:0";
      nvidiaBusId = "PCI:2:0:0";
    };

    opengl = {
      # required for running Steam
      driSupport32Bit = true;
    };
  };

  environment.systemPackages = with pkgs; [
    # nvidia offload script
    nvidia-offload
    ...
  ];

  services.xserver = {
    enable = true;
    videoDrivers = [ "nvidia"  ];
    displayManager.sddm.enable = true;
    desktopManager.gnome3.enable = true;
  };

  system.stateVersion = "20.09";
  system.autoUpgrade.enable = true;
  system.autoUpgrade.channel = "https://nixos.org/channels/nixos-20.09";
}

glxinfo -B

name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Intel (0x8086)
    Device: Mesa Intel(R) HD Graphics 530 (SKL GT2) (0x191b)
    Version: 20.1.7
    Accelerated: yes
    Video memory: 3072MB
    Unified memory: yes
    Preferred profile: core (0x1)
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
OpenGL vendor string: Intel
OpenGL renderer string: Mesa Intel(R) HD Graphics 530 (SKL GT2)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 20.1.7
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6 (Compatibility Profile) Mesa 20.1.7
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 20.1.7
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

nvidia-offload glxinfo -B

name of display: :0
display: :0  screen: 0
direct rendering: Yes
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 2048 MB
    Total available memory: 2048 MB
    Currently available dedicated video memory: 1998 MB
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce 940MX/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 455.38
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6.0 NVIDIA 455.38
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)

OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 455.38
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

xrandr --listproviders

Providers: number : 2
Provider 0: id: 0x49 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 7 associated providers: 0 name:modesetting
Provider 1: id: 0x24d cap: 0x0 crtcs: 0 outputs: 0 associated providers: 0 name:NVIDIA-G0

glxgears

Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
380 frames in 5.0 seconds = 75.963 FPS
300 frames in 5.0 seconds = 59.999 FPS
300 frames in 5.0 seconds = 59.996 FPS

nvidia-offload glxgears

Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.788 FPS
300 frames in 5.0 seconds = 59.998 FPS
301 frames in 5.0 seconds = 59.998 FPS

I know that glxgears is not the best measurement tool, but with sync mode turned on, I got around 6000 FPS with glxgears (I don’t have the exact numbers at hand right now).

Maybe it’s an issue with the measurement. glxgears says that "The framerate should be approximately the same as the monitor refresh rate." To me, it looks like offload made the frame rate of glxgears dependent on the screen refresh rate (and that should be better). Maybe try an application that is GPU-bound and can’t run at full speed on your computer. (Otherwise, the config looks good.)
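
One way to take vsync out of the measurement would be to disable it for the test run. The following is only a rough sketch: it assumes that `vblank_mode=0` (Mesa) and `__GL_SYNC_TO_VBLANK=0` (NVIDIA driver) are honoured by the installed driver versions, and that `timeout` and `awk` are available. The helpers `mean_fps` and `bench` are made up for illustration; they only average the FPS lines that glxgears prints.

```shell
#!/usr/bin/env bash

# mean_fps (hypothetical helper): read glxgears output on stdin and print
# the mean of all "... = N FPS" values, rounded to one decimal place.
# Prints nothing if no FPS line was seen.
mean_fps() {
  LC_ALL=C awk '/ FPS[[:space:]]*$/ { sum += $(NF-1); n++ }
                END { if (n) printf "%.1f\n", sum / n }'
}

# bench (hypothetical helper): run glxgears for about 16 seconds behind the
# given command prefix and print the mean FPS it reported.
bench() {
  timeout 16 "$@" glxgears 2>/dev/null | mean_fps
}

# Usage, with vsync disabled on both paths:
#   bench env vblank_mode=0                          # Intel GPU via Mesa
#   bench env __GL_SYNC_TO_VBLANK=0 nvidia-offload   # NVIDIA GPU via offload
```

If the two numbers differ clearly once vsync is off, the 60 FPS readings were just the refresh-rate cap and say nothing about offload performance. glxgears is still a weak benchmark, so a GPU-bound application remains the better test.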

@marius851000, thanks for your response.

I did try to run e.g. Steam prefixed with the nvidia-offload script, and the video acceleration was very poor.

Today, I also got the chance to switch back to prime.sync.enable = true and turn off offload mode.

The output of glxgears then looks like this:

Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
35297 frames in 5.0 seconds = 7058.395 FPS
35980 frames in 5.0 seconds = 7195.868 FPS
36243 frames in 5.0 seconds = 7248.592 FPS
36429 frames in 5.0 seconds = 7284.450 FPS

xrandr --listproviders

Providers: number : 2
Provider 0: id: 0x1b8 cap: 0x1, Source Output crtcs: 0 outputs: 0 associated providers: 1 name:NVIDIA-0
Provider 1: id: 0x1e2 cap: 0x2, Sink Output crtcs: 3 outputs: 7 associated providers: 1 name:modesetting

To me, this looks as if offload mode is either not configured correctly or does not work correctly (maybe my graphics card is too old?).