Nvidia Drivers Not Loading

Hi everyone,

I have been trying to get my Nvidia card working, using offloading (as it seemed the simplest option).

I have the 4070 (laptop version).

I have pulled and put together all the configs from the Nvidia modules in the hardware repo and the items from the wiki. At the moment, I am not consuming the code from the hardware repo (my laptop model is not supported yet), and wanted all my Nvidia config in a single spot until I get this working.

Essentially, neither nvidia-smi nor nvidia-settings sees the card. I searched this site, and the other fixes do not seem to apply.

Config:

{ lib, pkgs, config, ... }:

{
  # NVIDIA drivers are unfree.
  nixpkgs.config.allowUnfree = lib.mkForce true;
  # Sets the default video driver for the X server and Wayland to "nvidia"
  services.xserver.videoDrivers = lib.mkDefault [ "nvidia" ];
  hardware = {
    opengl = {
      # Enables the graphics driver for OpenGL
      enable = true;
      # Enables Direct Rendering Infrastructure (DRI), which allows the graphics driver to directly render graphics, improving performance in OpenGL
      driSupport = true;
      # Enables 32-bit Direct Rendering Infrastructure (DRI) support, which allows the graphics driver to directly render graphics in 32-bit applications using OpenGL
      driSupport32Bit = true;
      # Adds the 'vaapiVdpau' package to the extra packages for OpenGL
      extraPackages = with pkgs; [ vaapiVdpau ];
    };
    nvidia = {
      # Modesetting is required.
      # Since NVIDIA does not support automatic KMS late loading, enabling DRM (Direct Rendering Manager) kernel mode setting is required to make Wayland compositors function properly, or to allow for Xorg#Rootless_Xorg.
      modesetting.enable = true;
      # Nvidia power management. Experimental, and can cause sleep/suspend to fail.
      # Enable this if you have graphical corruption issues or application crashes after waking
      # up from sleep. This fixes it by saving the entire VRAM memory to /tmp/ instead
      # of just the bare essentials.
      powerManagement.enable = false;
      # Fine-grained power management. Turns off GPU when not in use.
      # Experimental and only works on modern Nvidia GPUs (Turing or newer).
      powerManagement.finegrained = false;
      # Use the NVidia open source kernel module (not to be confused with the
      # independent third-party "nouveau" open source driver).
      # Support is limited to the Turing and later architectures. Full list of
      # supported GPUs is at:
      # https://github.com/NVIDIA/open-gpu-kernel-modules#compatible-gpus
      # Only available from driver 515.43.04+
      # Currently alpha-quality/buggy, so false is currently the recommended setting.
      ## Confirmed my card is in there.
      open = true;
      # Enable the Nvidia settings menu,
      # accessible via `nvidia-settings`.
      nvidiaSettings = true;
      # Optionally, you may need to select the appropriate driver version for your specific GPU.
      package = config.boot.kernelPackages.nvidiaPackages.stable;
      prime = {
        offload = {
          enable = true;
          enableOffloadCmd = true; # Provides `nvidia-offload` command.
        };
        # Bus ID of the Intel GPU. You can find it using lspci, either under 3D or VGA
        intelBusId = "PCI:0:2:0";
        # Bus ID of the NVIDIA GPU. You can find it using lspci, either under 3D or VGA
        nvidiaBusId = "PCI:1:0:0";
      };
    };
  };

}

#### NOTES
# Get BusID:
# lspci | rg "VGA|3D controller"
# it will be in hexadecimal format, convert it to decimal
# https://www.binaryhexconverter.com/hex-to-decimal-converter

Errors:

❯ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Depending on the revision of the GPU, it might not be supported by the driver on the stable branch, since that has been frozen since November 2023.

I recommend trying a newer driver by copying a custom driver definition from the nixos-unstable branch: https://github.com/NixOS/nixpkgs/issues/289292#issuecomment-1948782958

In addition, you should take a look at your kernel logs for errors from the nvidia kernel module: journalctl -k -b --grep "nvidia"
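As a rough sketch of that check (not an official procedure), the two helpers below widen the log filter slightly and test whether a module is loaded at all. The function names `filter_gpu_log` and `module_loaded` are made up for illustration. The pattern assumes the NVIDIA kernel module prefixes its messages with "NVRM", and that `/proc/modules` lists one loaded module per line with the name in the first column.

```shell
#!/bin/sh
# Keep only kernel log lines that mention the NVIDIA stack or nouveau.
# Reads log text on stdin, so it can be fed from journalctl or dmesg.
filter_gpu_log() {
  grep -iE 'nvidia|nvrm|nouveau' || true
}

# Succeed if a module name appears in a /proc/modules-style listing.
# $1 = module name, $2 = listing file (defaults to /proc/modules).
module_loaded() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Typical use on the affected machine:
#   journalctl -k -b | filter_gpu_log
#   module_loaded nvidia && echo "nvidia is loaded" || echo "nvidia is NOT loaded"
```

If the log only shows the kernel command line and `module_loaded nvidia` fails, the module was probably never loaded, rather than loaded and then failing on the hardware.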

Just for reference.

Mar 06 13:41:45 evo kernel: Command line: initrd=\EFI\nixos\kb22xxyjki0hfa4xryhjw3iqwm1i5nx9-initrd-linux-6.6.19-initrd.efi init=/nix/store/hja927fgi6r29dci1sl26a6mgyk273nw-nixos-system-evo-24.05.20240303.b8697e5/init loglevel=4 nvidia-drm.modeset=1 nvidia.NVreg_OpenRmEnableUnsupportedGpus=1
Mar 06 13:41:45 evo kernel: Kernel command line: initrd=\EFI\nixos\kb22xxyjki0hfa4xryhjw3iqwm1i5nx9-initrd-linux-6.6.19-initrd.efi init=/nix/store/hja927fgi6r29dci1sl26a6mgyk273nw-nixos-system-evo-24.05.20240303.b8697e5/init loglevel=4 nvidia-drm.modeset=1 nvidia.NVreg_OpenRmEnableUnsupportedGpus=1

I will check out the other link you posted and see if that helps.

OK, I was following the article until I noticed it is for the “SUPER” cards, which I do not believe this 4070 is (since it is a laptop GPU).

I also noticed this time around that the wiki highlights:

Note: As of early March 2024 the production driver has been updated from 535 to 550. This is a breaking change for some people, especially those on Wayland. To resolve this follow the steps under Running the new RTX SUPER on nixos stable.

So I moved over to package = config.boot.kernelPackages.nvidiaPackages.production; but unfortunately it did not help.

So, after the change, I checked journalctl again:

Mar 06 17:38:22 evo kernel: Command line: initrd=\EFI\nixos\ycyfh7ddcajd7gav8rz8gnl7mdwzgf84-initrd-linux-6.6.19-initrd.efi init=/nix/store/payip885f0jd2lmzfavpq419ihcs17y9-nixos-system-evo-24.05.20240303.b8697e5/init loglevel=4 nvidia-drm.modeset=1
Mar 06 17:38:22 evo kernel: Kernel command line: initrd=\EFI\nixos\ycyfh7ddcajd7gav8rz8gnl7mdwzgf84-initrd-linux-6.6.19-initrd.efi init=/nix/store/payip885f0jd2lmzfavpq419ihcs17y9-nixos-system-evo-24.05.20240303.b8697e5/init loglevel=4 nvidia-drm.modeset=1

OK, through trial and error, I think I had the wrong bus ID.

❯ lspci -s 0:0:2
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [Iris Xe Graphics] (rev 04)
❯ lspci -s 0:1:0
01:00.0 3D controller: NVIDIA Corporation AD106M [GeForce RTX 4070 Max-Q / Mobile] (rev a1)

I updated to the above, but I am still having the same issue.

le sigh

Side question: what is the best way to get the bus ID in the required decimal format on Wayland?
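One possible approach, sketched below, is to convert the `lspci` slot directly in the shell. As far as I can tell, `lspci` does not depend on the display server, so this should behave the same on Wayland and Xorg. The helper names `to_nix_bus_id` and `gpu_bus_ids` are hypothetical, and the sketch assumes the usual `bus:device.function` hexadecimal slot format, optionally prefixed by a PCI domain such as `0000:`.

```shell
#!/bin/sh
# Convert an lspci slot such as "01:00.0" or "0000:0e:00.0"
# into the "PCI:<bus>:<device>:<function>" decimal form.
to_nix_bus_id() {
  slot=$1
  # Drop an optional PCI domain prefix ("0000:").
  case "$slot" in
    *:*:*) slot=${slot#*:} ;;
  esac
  bus=${slot%%:*}   # e.g. "01"
  rest=${slot#*:}   # e.g. "00.0"
  dev=${rest%%.*}   # e.g. "00"
  fn=${rest#*.}     # e.g. "0"
  # printf parses the "0x" prefix as hexadecimal and prints decimal.
  printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

# Read lspci output on stdin and print the converted ID for each GPU line.
gpu_bus_ids() {
  grep -E 'VGA compatible controller|3D controller' |
  while read -r slot desc; do
    printf '%s  %s\n' "$(to_nix_bus_id "$slot")" "$desc"
  done
}

# Typical use:
#   lspci | gpu_bus_ids
#   to_nix_bus_id 0e:00.0   # prints PCI:14:0:0
```

For the two devices listed above, this gives `PCI:0:2:0` for the Intel GPU and `PCI:1:0:0` for the NVIDIA GPU.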

Just adding my most recent version of my config. Still having the same issues though.

  • Original can be seen here.
  • Code for easy reference below:
{ pkgs, config, lib, inputs, ... }:
let
  username = if builtins.getEnv "SUDO_USER" != "" then
    builtins.getEnv "SUDO_USER"
  else
    builtins.getEnv "USER";
in {

  # Enable OpenGL
  hardware.opengl = {
    enable = true;
    driSupport = true;
    driSupport32Bit = true;
    # https://github.com/NixOS/nixos-hardware/blob/5d48925b815fd202781bfae8fb6f45c07112fdb2/common/gpu/nvidia/default.nix
    extraPackages = with pkgs; [ vaapiVdpau ];
  };

  # Load nvidia driver for Xorg and Wayland
  services.xserver.videoDrivers = [ "nvidia" ];

  hardware.nvidia = {

    # Modesetting is required.
    modesetting.enable = true;

    # Nvidia power management. Experimental, and can cause sleep/suspend to fail.
    powerManagement.enable = false;
    # Fine-grained power management. Turns off GPU when not in use.
    # Experimental and only works on modern Nvidia GPUs (Turing or newer).
    powerManagement.finegrained = false;

    # Use the NVidia open source kernel module (not to be confused with the
    # independent third-party "nouveau" open source driver).
    # Support is limited to the Turing and later architectures. Full list of
    # supported GPUs is at:
    # https://github.com/NVIDIA/open-gpu-kernel-modules#compatible-gpus
    # Only available from driver 515.43.04+
    # Currently alpha-quality/buggy, so false is currently the recommended setting.
    open = false;

    # Enable the Nvidia settings menu,
    # accessible via `nvidia-settings`.
    nvidiaSettings = true;

    # Optionally, you may need to select the appropriate driver version for your specific GPU.
    # package = config.boot.kernelPackages.nvidiaPackages.production;
    # Can select driver here:
    # https://github.com/NixOS/nixpkgs/blob/979a311fbd179b86200e412a3ed266b64808df4e/pkgs/os-specific/linux/nvidia-x11/default.nix#L36
    package = let
      rcu_patch = pkgs.fetchpatch {
        url =
          "https://github.com/gentoo/gentoo/raw/c64caf53/x11-drivers/nvidia-drivers/files/nvidia-drivers-470.223.02-gpl-pfn_valid.patch";
        hash = "sha256-eZiQQp2S/asE7MfGvfe6dA/kdCvek9SYa/FFGp24dVg=";
      };
    in config.boot.kernelPackages.nvidiaPackages.mkDriver {
      # version = "535.154.05";
      # sha256_64bit = "sha256-fpUGXKprgt6SYRDxSCemGXLrEsIA6GOinp+0eGbqqJg=";
      # sha256_aarch64 = "sha256-G0/GiObf/BZMkzzET8HQjdIcvCSqB1uhsinro2HLK9k=";
      # openSha256 = "sha256-wvRdHguGLxS0mR06P5Qi++pDJBCF8pJ8hr4T8O6TJIo=";
      # settingsSha256 = "sha256-9wqoDEWY4I7weWW05F4igj1Gj9wjHsREFMztfEmqm10=";
      # persistencedSha256 =
      #   "sha256-d0Q3Lk80JqkS1B54Mahu2yY/WocOqFFbZVBh+ToGhaE=";

      version = "550.40.07";
      sha256_64bit = "sha256-KYk2xye37v7ZW7h+uNJM/u8fNf7KyGTZjiaU03dJpK0=";
      sha256_aarch64 = "sha256-AV7KgRXYaQGBFl7zuRcfnTGr8rS5n13nGUIe3mJTXb4=";
      openSha256 = "sha256-mRUTEWVsbjq+psVe+kAT6MjyZuLkG2yRDxCMvDJRL1I=";
      settingsSha256 = "sha256-c30AQa4g4a1EHmaEu1yc05oqY01y+IusbBuq+P6rMCs=";
      persistencedSha256 =
        "sha256-11tLSY8uUIl4X/roNnxf5yS2PQvHvoNjnd2CB67e870=";

      patches = [ rcu_patch ];
    };

    prime = {

      # offload = {
      #     enable = true;
      #     enableOffloadCmd = true;
      # };

      # OR

      sync.enable = true;

      # Make sure to use the correct Bus ID values for your system!
      # https://wiki.nixos.org/wiki/Nvidia#Configuring_Optimus_PRIME:_Bus_ID_Values_(Mandatory)
      # Bus ID of the Intel GPU. You can find it using lspci, either under 3D or VGA, and can cross-check it with: sudo lshw -c display
      # Note the two values under "bus info" above, which may differ from laptop to laptop. Our Nvidia Bus ID is 0e:00.0 and our Intel Bus ID is 00:02.0. Watch out for the formatting; convert them from hexadecimal to decimal, remove the padding (leading zeroes), replace the dot with a colon
      intelBusId = "PCI:0:2:0";
      nvidiaBusId = "PCI:1:0:0";
    };

  };

  home-manager.users."${username}" = {

    home.packages = with pkgs; [
      # Nvidia settings
      gnomeExtensions.gpu-profile-selector
      inputs.envycontrol.packages.x86_64-linux.default
    ];

    dconf.settings = with inputs.home-manager.lib.hm.gvariant; {
      # Gnome Extension - GPU Profile Selector
      "org/gnome/shell/extensions/GPU_profile_selector" = {
        rtd3 = true;
        force-composition-pipeline = true;
        coolbits = true;
        force-topbar-view = false;
      };
    };
  };
}

Current journalctl:

Apr 22 16:45:33 evo kernel: Command line: initrd=\EFI\nixos\nxndanw4cqi4k8y7sjdr67rfxsca620a-initrd-linux-6.6.28-initrd.efi init=/nix/store/aw0275438d3src1b5jb77xvyzprq51v8-nixos-system-evo-24.05.20240421.6143fc5/init loglevel=4 nvidia-drm.modeset=1
Apr 22 16:45:33 evo kernel: Kernel command line: initrd=\EFI\nixos\nxndanw4cqi4k8y7sjdr67rfxsca620a-initrd-linux-6.6.28-initrd.efi init=/nix/store/aw0275438d3src1b5jb77xvyzprq51v8-nixos-system-evo-24.05.20240421.6143fc5/init loglevel=4 nvidia-drm.modeset=1

Still fighting this.

Just appending some new info.

❯ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

❯ lsmod | grep nvidia

❯ sudo modprobe nvidia
modprobe: ERROR: could not find module by name='off'
modprobe: ERROR: could not insert 'off': Unknown symbol in module, or unknown parameter (see dmesg)

So that then led me to:

❯ bat /etc/modprobe.d/blacklist-nvidia.conf
# Automatically generated by EnvyControl

blacklist nouveau
blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_uvm
blacklist nvidia_modeset
blacklist nvidia_current
blacklist nvidia_current_drm
blacklist nvidia_current_uvm
blacklist nvidia_current_modeset
alias nouveau off
alias nvidia off
alias nvidia_drm off
alias nvidia_uvm off
alias nvidia_modeset off
alias nvidia_current off
alias nvidia_current_drm off
alias nvidia_current_uvm off
alias nvidia_current_modeset off
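As far as I can tell, the `alias nvidia off` lines are what produce the odd modprobe error: the alias redirects `nvidia` to a module literally named `off`, which does not exist. To look for any other leftover entries like this, a small scan such as the sketch below might help. The helper name `find_nvidia_blocks` is hypothetical, and the directories in the usage comment are only the usual modprobe.d locations, which may differ on a given system.

```shell
#!/bin/sh
# Print every blacklist/alias/install line that targets an nvidia* module
# in the given files or directories, as path:line-number:content.
find_nvidia_blocks() {
  grep -rnE '^[[:space:]]*(blacklist|alias|install)[[:space:]]+nvidia' "$@" 2>/dev/null || true
}

# Typical use:
#   find_nvidia_blocks /etc/modprobe.d /run/modprobe.d /lib/modprobe.d
```

Any hit in a file not generated by the NixOS configuration itself (such as the EnvyControl file above) is a candidate for what keeps the driver from loading.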

My current nvidia config looks like:

{ pkgs, config, lib, inputs, ... }: {
  imports = [
    inputs.nixos-hardware.nixosModules.common-gpu-nvidia

    # TODO: why do I get the below error?
    # error: The option `hardware.intelgpu.loadInInitrd' in `/nix/store/4mgg9mrh8g0qj4g3z9zvqhrniig10bsn-source/systems/evo/hardware/gpus.nix' is already declared in `/nix/store/75hvhrfigcnckibdlg877157bpwjmy85-source/common/gpu/intel'.
    # Where is the other coming from?
    # inputs.nixos-hardware.nixosModules.common-gpu-intel
  ];

  boot = {
    blacklistedKernelModules = lib.mkDefault [ "nouveau" ];
    kernelModules = [ "kvm-intel" "nvidia" ];
  };

  hardware = {
    opengl = {
      enable = true;
      driSupport = true;
      driSupport32Bit = true;
      extraPackages = [ pkgs.intel-media-driver ];
    };

    nvidia = {
      # Modesetting is required.
      modesetting.enable = true;

      # Optionally, you may need to select the appropriate driver version for your specific GPU.
      package = config.boot.kernelPackages.nvidiaPackages.production;

      # Nvidia power management. Experimental, and can cause sleep/suspend to fail.
      # Enable this if you have graphical corruption issues or application crashes after waking
      # up from sleep. This fixes it by saving the entire VRAM memory to /tmp/ instead
      # of just the bare essentials.
      powerManagement.enable = false;

      # Fine-grained power management. Turns off GPU when not in use.
      # Experimental and only works on modern Nvidia GPUs (Turing or newer).
      powerManagement.finegrained = false;

      # Use the NVidia open source kernel module (not to be confused with the
      # independent third-party "nouveau" open source driver).
      # Support is limited to the Turing and later architectures. Full list of
      # supported GPUs is at:
      # https://github.com/NVIDIA/open-gpu-kernel-modules#compatible-gpus
      # Only available from driver 515.43.04+
      # Currently alpha-quality/buggy, so false is currently the recommended setting.
      open = false;

      prime = {
        intelBusId = "PCI:0:2:0";
        nvidiaBusId = "PCI:1:0:0";
        # Make the Intel iGP default. The NVIDIA Quadro is for CUDA/NVENC
        reverseSync.enable = true;
        # sync.enable = true;
      };
      nvidiaSettings = true;
    };
  };

  services.xserver.videoDrivers = [ "nvidia" ];

}

Did I do something wrong in there that would cause my issues?

Thank you.