NVIDIA GPU not used by Steam

Hi, I’ve been banging my head against a wall for a while now, and I don’t know who to contact, so I’m trying these forums since I’m on NixOS.

I have a hybrid GPU laptop (a System76 Serval WS) with an i9 14900 HX and an RTX 4070. Steam games are not using my NVIDIA card (which I’ll call the dGPU). My internet connection is incredibly poor and I can’t download games reliably, so I’m testing with Animal Well and Quake II RTX, which are both small-ish and GPU-heavy.

  • Both vkcube and (using nvidia-offload) glxgears run successfully using the dGPU.
  • Running Animal Well outside of Steam via nvidia-offload wine64 Animal\ Well.exe works, and it runs on the dGPU.
  • It’s not a hardware problem. I installed Windows and tried the same games with the latest drivers and things worked without a hitch.
  • I’ve had this problem on other distros too. Pop! didn’t work from a fresh install either.
  • Every Steam game I tested crashes with a Vulkan or DX12 initialization error.
  • Adding nvidia-offload %command% to the launch options changes nothing.
  • Proton Hotfix, Experimental, and 9 all don’t work, and neither does the Steam Linux Runtime (for Quake II RTX).

My current working theory is that there is either a bug in the drivers or a bug in Steam. I know that because of that, this forum may not be the best place to put my issue, but I figure that I’m running NixOS so it’s a starting point.

I can post my entire configuration.nix if requested, but I would imagine these are the important parts:

# [...]
  programs.steam = {
    enable = true;
    remotePlay.openFirewall = false;
    dedicatedServer.openFirewall = false;
    localNetworkGameTransfers.openFirewall = true;
  };
# [...]
  hardware.graphics = {
    enable = true;
    enable32Bit = true;
  };

  services.xserver.videoDrivers = [ "nvidia" ];

  hardware.nvidia = {
    modesetting.enable = true;
    powerManagement.enable = true;
    powerManagement.finegrained = true;
    open = true;
    nvidiaSettings = true;
    package = config.boot.kernelPackages.nvidiaPackages.stable;
    prime = {
      offload = {
        enable = true;
        enableOffloadCmd = true;
      };
      intelBusId = "PCI:0:2:0";
      nvidiaBusId = "PCI:1:0:0";
    };
  };
# [...]

Looking for help.

I don’t think it’s a bug. All the nvidia-offload script does is set some environment variables. Using it inside the Steam command override probably doesn’t work because Steam is in a special sandbox; it likely just fails to run the script correctly.
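
For reference, here is a rough sketch of what that wrapper amounts to. It assumes the script generated by the NixOS module exports these four variables and then runs its arguments; the function name nvidia_offload_sketch is made up for illustration, and the generated script itself may differ between nixpkgs versions.

```shell
# Hypothetical stand-in for the nvidia-offload wrapper (the name is illustrative).
# Assumption: the NixOS-generated script exports these four variables and
# then executes whatever command it was given.
nvidia_offload_sketch() {
  env \
    __NV_PRIME_RENDER_OFFLOAD=1 \
    __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0 \
    __GLX_VENDOR_LIBRARY_NAME=nvidia \
    __VK_LAYER_NV_optimus=NVIDIA_only \
    "$@"
}

# Example: show one of the variables as seen by a child process.
nvidia_offload_sketch sh -c 'echo "$__GLX_VENDOR_LIBRARY_NAME"'
```

The point is only that nothing GPU-specific happens in the wrapper beyond the environment, so anything that drops or resets the environment along the way would undo it.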

Have you tried starting steam itself with nvidia-offload? That should propagate the variables correctly for steam games to then pick them up.

If you can confirm that works, we can either make it permanent by editing the Steam desktop file, or set the environment variables directly in all your Steam command overrides if you really want to run Steam itself on the iGPU (a very reasonable desire if you’d like to use big picture mode).
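
If it comes to the per-game variant, the launch options for a game would look roughly like the line below. The variable list is my assumption of what nvidia-offload exports on NixOS, so check it against the script on your system; %command% is Steam’s placeholder for the game’s own command line.

```text
__NV_PRIME_RENDER_OFFLOAD=1 __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0 __GLX_VENDOR_LIBRARY_NAME=nvidia __VK_LAYER_NV_optimus=NVIDIA_only %command%
```

This avoids calling the wrapper script from inside the sandbox at all, since the variables are set by Steam before the game command runs.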

Also, props, I think you’re the first person I’ve seen with a reasonable nvidia config. Note that finegrained probably doesn’t actually work at the moment though; see the thread “Nvidia never suspends” (post #2 by TLATER).

I’ve tried that before, and it doesn’t work. I just tried it again to be sure (nvidia-offload steam -console) to no avail.

(also, about the NVIDIA config, I’ve only been using Nix for a few days at most, so it’s mostly just ripped from the wiki :P)

Hm, interesting! I assume you’re using nvidia-smi to check which applications end up on the GPU?

Is there anything interesting in the steam logs, dmesg or journalctl --user?

I hadn’t tried nvidia-smi until now, but it shows nothing new:

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2074      G   ...fs-gnome-shell-47.2/bin/gnome-shell          2MiB |
|    0   N/A  N/A      3791    C+G   ....local/share/Steam/logs/cef_log.txt          6MiB |
|    0   N/A  N/A      5031    C+G   Animal Well.exe                                78MiB |
+-----------------------------------------------------------------------------------------+

(this is with Animal Well open via nixpkgs wine64, not through steam)

A few things of note from journalctl:

ERROR: ld.so: object '/home/addiment/.local/share/Steam/ubuntu12_64/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.

and

pressure-vessel-wrap[5679]: W: Found more than one possible libdrm data directory from provider

Perhaps something to do with nix-ld?

The LD_PRELOAD error is normal: Steam tries to load the game overlay .so for a bunch of architectures in sequence until it hits the correct one for your system.

I haven’t seen the pressure-vessel warning before, but it would be game-specific and has nothing to do with where Steam renders.

I’m intrigued, but can’t provide much further support. Personally I’d double-check with nvidia-smi whether anything ends up on the NVIDIA GPU when Steam is started with the env variables set.

My best guess is that this is a regression in some of the launch scripts; maybe one of them overrides the env vars. We just have to figure out a way to hook into all of them and see their environments.
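
One low-tech way to see a process’s environment on Linux is to read /proc/<pid>/environ, which holds NUL-separated KEY=value entries. A minimal sketch, assuming the four variables below are the ones that matter; the function name offload_vars_in is made up for illustration:

```shell
# Print the PRIME offload variables found in a NUL-separated environment
# dump such as /proc/<pid>/environ.
# The function name and the variable list are illustrative assumptions.
# grep exits non-zero when none of the variables are present.
offload_vars_in() {
  tr '\0' '\n' < "$1" |
    grep -E '^(__NV_PRIME_RENDER_OFFLOAD|__NV_PRIME_RENDER_OFFLOAD_PROVIDER|__GLX_VENDOR_LIBRARY_NAME|__VK_LAYER_NV_optimus)='
}
```

With a game running, you would call it as offload_vars_in /proc/<pid>/environ, taking the PID from nvidia-smi or pgrep, and repeat that for the parent processes to see where the variables disappear. Keep in mind that this file reflects the environment the process was started with, so it may not show changes a program makes to its own environment afterwards.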

Thanks for the help. I still think it’s an upstream bug, most likely with steam. Running Quake II RTX gives me a specific point-of-failure. Running VK_LOADER_DEBUG=all nvidia-offload steam-run ./q2rtx (using the native version, not the Proton version) gives the following error in the console:

Picked physical device 0: NVIDIA GeForce RTX 4070 Laptop GPU
Using VK_KHR_ray_query
FP16 support: yes
]INFO | LAYER:      Failed to find vkGetDeviceProcAddr in layer "/nix/store/l0y1galra3xwk7dwkxjdipmzg4zga75k-mesa-24.2.8-drivers/lib/libVkLayer_MESA_device_select.so"
INFO | LAYER:      Inserted device layer "VK_LAYER_NV_optimus" (/nix/store/kw83icx0xvxypsaps3xy0xbvci9a23xz-nvidia-x11-565.77-6.6.82/lib/libGLX_nvidia.so.0)
DRIVER | LAYER:    vkCreateDevice layer callstack setup to:
DRIVER | LAYER:       <Application>
DRIVER | LAYER:         ||
DRIVER | LAYER:       <Loader>
DRIVER | LAYER:         ||
LAYER:                VK_LAYER_NV_optimus
LAYER:                        Type: Implicit
LAYER:                            Disable Env Var:  DISABLE_LAYER_NV_OPTIMUS_1
LAYER:                        Manifest: /run/opengl-driver/share/vulkan/implicit_layer.d/nvidia_layers.json
LAYER:                        Library:  /nix/store/kw83icx0xvxypsaps3xy0xbvci9a23xz-nvidia-x11-565.77-6.6.82/lib/libGLX_nvidia.so.0
LAYER:                  ||
DRIVER | LAYER:       <Device>
DRIVER | LAYER:           Using "NVIDIA GeForce RTX 4070 Laptop GPU" with driver: "/nix/store/kw83icx0xvxypsaps3xy0xbvci9a23xz-nvidia-x11-565.77-6.6.82/lib/libGLX_nvidia.so.0"
validation layer 1 4096: terminator_CreateDevice: Failed in ICD /nix/store/kw83icx0xvxypsaps3xy0xbvci9a23xz-nvidia-x11-565.77-6.6.82/lib/libGLX_nvidia.so.0 vkCreateDevice call
]Vulkan error: terminator_CreateDevice: Failed in ICD /nix/store/kw83icx0xvxypsaps3xy0xbvci9a23xz-nvidia-x11-565.77-6.6.82/lib/libGLX_nvidia.so.0 vkCreateDevice call
--- (null) 1

]ERROR | DRIVER:    terminator_CreateDevice: Failed in ICD /nix/store/kw83icx0xvxypsaps3xy0xbvci9a23xz-nvidia-x11-565.77-6.6.82/lib/libGLX_nvidia.so.0 vkCreateDevice call
validation layer 1 4096: vkCreateDevice:  Failed to create device chain.
]Vulkan error: vkCreateDevice:  Failed to create device chain.
--- (null) 1

]ERROR:             vkCreateDevice:  Failed to create device chain.
Closing console log.
********************
FATAL: Failed to create a Vulkan device.
Error code: VK_ERROR_INITIALIZATION_FAILED
********************

From the generic error messages, I’m almost certain something is wrong upstream.

Right, yep, I skimmed past the bit where you said the applications flat out crash. I’m not sure I would point at an upstream bug here, though; Steam games work for most people, AIUI.

The particular error message you get seems to turn up in a lot of driver-related issues in the proton repo, usually resolved by messing with installed packages on other distros. Not likely to affect you, of course, but it makes me wonder if the wrong set of shared libraries is being picked up for the GPU being rendered to. This would be a NixOS bug.

I spun up a fresh Pop!_OS install and confirmed that the issue is reproducible. I believe this is a bug with Steam, so I will contact Steam support or make an issue in Proton.