Tracking down Nvidia GPU Utilization Issue

Good morning/evening/{time-of-day}!

I recently started switching over to flakes, and in the process noticed that my system had been switched over to unstable. Most of the system is actually working somewhat better (a consequence of more up-to-date software, I’d assume), but I’ve run into a problem with games and my Nvidia GPU utilization. Since nothing changed about my packages aside from the channel they’re being pulled from (and the Hyprland flake I’m using seems to be the same as unstable), I’m fairly sure there’s a bug somewhere, but I’m not sure how to track it down so I can report it to the right people.

To that end, I’m looking for advice and/or directions on how I can track down (or learn on my own to track down) the issue so I can report it to either the NixOS bug tracker or another relevant tracker.

Some basic information to start:

lspci | grep VGA >>

0000:00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
0000:01:00.0 VGA compatible controller: NVIDIA Corporation GA104M [Geforce RTX 3070 Ti Laptop GPU] (rev a1)

The problem: since moving to unstable as part of the move to flakes, my performance in games and video-accelerated applications has taken a serious nosedive. I normally get ~70-80 FPS on Ultra settings with raytracing in games (Control, Valheim, Satisfactory, etc.). After updating to unstable I average 15-20 FPS at the same settings. At the same time, both nvidia-smi and nvidia-settings report GPU utilization as 100% at ALL times. As I said, no changes were made to my configuration other than setting up flakes with unstable. The entirety of my current /etc/nixos/flake.nix:

{
  description = "Zeta's base NixOS Flake";

  inputs = {
    # Official NixOS package source, using nixos-unstable branch here
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    #nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.05";

    # home-manager, used for managing user configuration
    #home-manager = {
      #url = "github:nix-community/home-manager/release-23.05";
      # The `follows` keyword in inputs is used for inheritance.
      # Here, `inputs.nixpkgs` of home-manager is kept consistent with
      # the `inputs.nixpkgs` of the current flake,
      # to avoid problems caused by different versions of nixpkgs.
      #inputs.nixpkgs.follows = "nixpkgs";
    #};

    hyprland.url = "github:hyprwm/Hyprland";
    xdg-desktop-portal-hyprland.url = "github:hyprwm/xdg-desktop-portal-hyprland";
  };

  outputs = { self, nixpkgs, hyprland, ... }@inputs: {
    nixosConfigurations = {
      "nixos" = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";

        specialArgs = inputs;

        modules = [
          # Import the configuration.nix here, so that the
          # old configuration file can still take effect.
          # Note: configuration.nix itself is also a Nix Module,
          ./configuration.nix
          hyprland.nixosModules.default
          { programs.hyprland.enable = true; }
        ];
      };
    };
  };
}

I don’t even have home-manager yet, since I’ve been trying to go slowly learning NixOS. All other settings have been kept the same.

From /etc/nixos/configuration.nix:

OpenGL settings

hardware.opengl = {
  enable = true;
  driSupport = true;
  driSupport32Bit = true;
  extraPackages = with pkgs; [
    vaapiIntel
    nvidia-vaapi-driver
    vaapiVdpau
    libvdpau-va-gl
    intel-media-driver
  ];
};

Nvidia settings

services.xserver.videoDrivers = [ "nvidia" ];
hardware.nvidia = {
  modesetting.enable = true;
  open = true;
  nvidiaSettings = true;
  package = config.boot.kernelPackages.nvidiaPackages.vulkan_beta;
};

boot.kernelParams = [ "module_blacklist=i915" ]; # blacklist integrated GPU

boot.blacklistedKernelModules = [ "nouveau" ];
boot.extraModprobeConfig = "options nvidia NVreg_RegistryDwords=\"PowerMizerEnable=0x1; PerfLevelSrc=0x2222; PowerMizerLevel=0x3; PowerMizerDefault=0x3; PowerMizerDefaultAC=0x3;\"\n";

I can confirm that changing most of these settings does not solve the problem. Open vs Non-Open drivers, package set to stable vs vulkan_beta, modesetting, etc, have all been tested.

The extraModprobeConfig options are used for making sure Hyprland doesn’t have ugly black artifacting; removing those options doesn’t appear to make a difference. Additionally I tried a workaround for passing Vulkan variables into Steam that I found on the Discourse from a while back but that also doesn’t seem to have made a difference.

Having taken a look at journalctl --boot, the only entries I can find that seem to be related to an Nvidia error are:

Aug 28 18:06:57 nixos (udev-worker)[764]: nvidia: Process '/nix/store/r4vxljid3iq94jp7qvd639sps0fscwy3-bash-5.2-p15/bin/bash -c 'mknod -m 666 /dev/nvidiactl c $(grep nvidia-frontend /proc/devices | cut -d \ -f 1) 255'' failed with exit code 1.

Aug 28 18:06:58 nixos (udev-worker)[764]: nvidia: Process '/nix/store/r4vxljid3iq94jp7qvd639sps0fscwy3-bash-5.2-p15/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \ -f 4); do mknod -m 666 /dev/nvidia${i} c $(grep nvidia-frontend /proc/devices | cut -d \ -f 1) ${i}; done'' failed with exit code 1.

Finally I did notice that nvidia-settings, in the new version, does give me more information than it used to. Specifically it lists in “Graphics Information” the subheading “EGL”, with information that looks like it’s related to Mesa. (I find this odd given that Mesa is Intel? Stable channel 23.05 nvidia-settings lists nothing in this tab at all, though I suppose that’s not unexpected given it’s Wayland.)

I’m not really looking for anybody to do the work for me, but I’d appreciate some pointers for figuring out why it’s happening. At the very least I’d like to be able to report the issue before I have to figure out how to move my system back to a mix of stable and unstable, even if I can’t fix it directly.

Please and thank you in advance? :slight_smile:

Just updated my system on stable, I’m seeing ~40-50 fps when I was at 165 stable just yesterday, also with apparently 100% GPU utilization.

I’m not 100% sure if this is a regression or proton still working on shader compilation, but usually that’s finished much quicker. The only notable change in today’s update was to linux 6.11. I suppose I’ll wait a bit longer before chasing down this rabbit hole.

Ignore that, seems I just forgot to un-stash a little hack to make my monitor actually run at 165 fps, need to find a nice way to do that and actually commit it so this doesn’t happen :wink:

The nvidia-settings info was apparently fixed in NixOS by a patch (I was not aware of this either). Your move to flakes probably coincided with your first update in two weeks: linuxPackages.nvidia_x11.settings: fix wayland support · NixOS/nixpkgs@9b1154c · GitHub

It lists EGL extensions for me, which are quite relevant on wayland. What is “mesa related” in your eyes?

And like that, my hopes are dashed! Jokes aside, thank you for responding.

To be honest, Mesa doesn’t mean much to me at all. Much as I’d love to be, I’m not well educated on that field. I’m mostly just going off of people on the internet telling me it’s Intel related whenever I’ve been told to install additional mesa packages in other distros. I had something very similar happen once on EndeavourOS which was solved by installing a bunch of missing libraries and drivers from Mesa, but that doesn’t seem to be the same situation here.

That commit two weeks ago also seems to be when the Nvidia driver version was updated. I figured I’d try rolling back the driver to see whether the update is what’s causing this, so I’ve been searching for a way to selectively roll back a package. Pinning the Nvidia driver seems to be harder than people anticipated, though: most of the topics I found were solved by other fixes before anyone figured out the pinning.

The closest I’ve found to an answer seems to be this from StackOverflow. I attempted a modified version of what’s suggested here with:


 let
  nixos-stable-2305 = import (builtins.fetchTarball {
    url = https://github.com/NixOS/nixpkgs/archive/refs/tags/23.05.tar.gz;
    sha256 = "10wn0l08j9lgqcw8177nh2ljrnxdrpri7bp0g7nvrsn9rkawvlbf";
  }) { };

  # We'll use this twice
  pinnedKernelPackages = nixos-stable-2305.linuxPackages_latest;
in
nixpkgs.config.packageOverrides = pkgs: {
    linuxPackages_latest = pinnedKernelPackages;
    nvidia_x11 = nixos-stable-2305.nvidia_x11;
  };
  boot.kernelPackages = pinnedKernelPackages;

Buuut that isn’t really working; spits out syntax errors to high heaven. Given how old it is I’m not surprised, but I don’t know Nix well enough yet to really know what else to try. Trying to figure it out now, just… slow going.
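
For what it is worth, the syntax errors most likely come from the shape of the snippet rather than its age: `let ... in` has to be followed by exactly one expression, and here it is followed by two bare `name = value;` bindings. Below is a minimal, untested sketch of the same idea written as a standalone NixOS module. The file name `pin-kernel.nix` is hypothetical, the `system` and `config.allowUnfree` arguments are additions that assume pure flake evaluation (where `builtins.currentSystem` is unavailable) and an unfree driver, and the `packageOverrides` part is dropped on the assumption that the driver is already taken from `config.boot.kernelPackages`, as in the configuration above.

```nix
# pin-kernel.nix (hypothetical file name), listed in `modules` or `imports`
{ pkgs, ... }:
let
  nixos-stable-2305 = import (builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/refs/tags/23.05.tar.gz";
    sha256 = "10wn0l08j9lgqcw8177nh2ljrnxdrpri7bp0g7nvrsn9rkawvlbf";
  }) {
    # Assumption: flakes evaluate in pure mode, so the system is passed
    # explicitly, and unfree packages (the Nvidia driver) are allowed.
    system = pkgs.stdenv.hostPlatform.system;
    config.allowUnfree = true;
  };

  pinnedKernelPackages = nixos-stable-2305.linuxPackages_latest;
in
{
  # `let ... in` is followed by a single attribute set holding the options.
  boot.kernelPackages = pinnedKernelPackages;
}
```

With this shape, `hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.vulkan_beta;` would resolve against the pinned package set, so the driver version should follow the pinned release as well.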

`hardware.nvidia.package` has existed since NixOS 19.03. I read through the source, and as best I can tell, setting it should work to change the Nvidia driver version:

hardware.nvidia.package = let
  nixos-stable-2305 = import (builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/refs/tags/23.05.tar.gz";
    sha256 = "10wn0l08j9lgqcw8177nh2ljrnxdrpri7bp0g7nvrsn9rkawvlbf";
  }) { };
in
  # Or whatever kernel you use; kernel package sets live under
  # `linuxKernel.packages`, and the version must exist in the pinned release.
  nixos-stable-2305.linuxKernel.packages.linux_6_4.nvidia_x11;

The caveat is that it may be built for the wrong kernel; this particular bit of the NixOS module system is a bit complex. Perhaps you’d need something like:

hardware.nvidia.package = let
  nixos-stable-2305 = import (builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/refs/tags/23.05.tar.gz";
    sha256 = "10wn0l08j9lgqcw8177nh2ljrnxdrpri7bp0g7nvrsn9rkawvlbf";
  }) { };
in
  nixos-stable-2305.linuxKernel.packages.linux_6_4.nvidia_x11.override {
    # `config.boot.kernelPackages` is the whole package set;
    # its `kernel` attribute is the kernel derivation itself.
    kernel = config.boot.kernelPackages.kernel;
  };

I have not tested this, so I am not sure the override picks up the correct derivation. You may need to use the equivalent linuxKernel.kernels.* attribute instead.

Of course, you can also just replace the kernel fully, which is likely easiest:

boot.kernelPackages = let
  nixos-stable-2305 = import (builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/refs/tags/23.05.tar.gz";
    sha256 = "10wn0l08j9lgqcw8177nh2ljrnxdrpri7bp0g7nvrsn9rkawvlbf";
  }) { };
in
  # At least, this is the default according to the NixOS module description.
  # Since these attributes are hidden from the package search, I have no
  # idea what actually works here. *Really* hate the package hiding feature.
  #
  # Replace with `nixos-stable-2305.linuxKernel.packages.<version>` or
  # `nixos-stable-2305.linuxPackages_latest` as required.
  nixos-stable-2305.linuxPackages;
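
Since the original poster is on flakes, a second flake input may be an easier way to get a 23.05 package set than `fetchTarball`, because the lock file then pins the revision. Below is a rough, untested sketch. The input name `nixpkgs-stable` is my own choice rather than anything from the thread, and it relies on the `specialArgs = inputs;` line in the flake above to make the input available as a module argument. Under pure flake evaluation, `import ... { }` with empty arguments cannot fall back to `builtins.currentSystem` or a user-level nixpkgs config, so `system` and `allowUnfree` are passed explicitly.

```nix
# flake.nix, inside `inputs` (the name `nixpkgs-stable` is an assumption):
#   nixpkgs-stable.url = "github:NixOS/nixpkgs/nixos-23.05";

# configuration.nix
{ config, pkgs, nixpkgs-stable, ... }:
let
  # Instantiate the stable package set for the same platform as the system.
  stable = import nixpkgs-stable {
    system = pkgs.stdenv.hostPlatform.system;
    config.allowUnfree = true; # the Nvidia driver is unfree
  };
in
{
  # Kernel and kernel modules (including the Nvidia driver selected via
  # `config.boot.kernelPackages.nvidiaPackages.*`) come from stable.
  boot.kernelPackages = stable.linuxPackages;
}
```

If the frame rate recovers with the stable kernel and driver but the rest of the system on unstable, that would narrow the regression down to the kernel or driver update, which is useful detail for a bug report.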