I have a recent install of NixOS. Both nvtop and nvidia-smi are unable to detect my Tesla-series Nvidia card.
Outputs:
$ nvtop
No GPU to monitor.
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
I have tried a few different options in my configuration.nix file.
Here is the output of lspci | grep -i vga:
04:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
05:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
0b:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
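For reference, here is a small sketch of how the addresses above would be written in the bus ID notation that the hardware.nvidia.prime options expect. It assumes the usual NixOS convention that lspci prints bus numbers in hexadecimal while the options take "PCI:bus:device:function" in decimal. The attribute names (teslaA, teslaB, aspeed) are hypothetical labels for illustration, not real NixOS options.

```nix
# Illustration only: lspci addresses (hex) rewritten in decimal bus ID form.
# The attribute names are made-up labels, not NixOS options.
{
  teslaA = "PCI:4:0:0";   # lspci 04:00.0 (first GM204GL GPU)
  teslaB = "PCI:5:0:0";   # lspci 05:00.0 (second GM204GL GPU)
  aspeed = "PCI:11:0:0";  # lspci 0b:00.0 (0x0b hex = 11 decimal)
}
```

Under that assumption, a value such as "PCI:0b:00:0" would not be in the expected form, because 0b is still hexadecimal.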
I believe these to be the relevant config settings:
nvidia.nix
{ config, pkgs, ... }:
{
  nix.settings = {
    substituters = [ "https://cuda-maintainers.cachix.org" ];
    # ?
    # unsure if this is needed
    trusted-public-keys = [
      "cuda-maintainers.cachix.org-1:0dq3bujKpuEPMCX6U4WylrUDZ9JyUG0VpVZa7CNfq5E="
    ];
  };

  environment.systemPackages = with pkgs; [
    cudatoolkit
    linuxPackages.nvidia_x11
    libGLU libGL
  ];

  # OpenGL
  hardware.graphics = {
    enable = true;
    enable32Bit = true;
  };

  hardware.nvidia = {
    modesetting.enable = true; # I have also set this to false based on some suggestions I've seen
    powerManagement.enable = true; # this has been both true and false
    powerManagement.finegrained = false;
    open = false; # this has been both true and false
    nvidiaSettings = true;
    package = config.boot.kernelPackages.nvidiaPackages.production;

    # from https://discourse.nixos.org/t/nvidia-drivers-not-loading/40913/12
    #?
    forceFullCompositionPipeline = true;

    #?
    prime = {
      offload.enable = true;
      reverseSync.enable = true;
      #nvidiaBusId = "PCI:05:00:0";
      intelBusId = "PCI:0b:00:0";
      nvidiaBusId = "PCI:04:00:0";
    };

    # enabling this fails to build
    # datacenter.enable = true;
  };

  # uncommenting this leaves the screen at an uninteractable screen
  #services.xserver.videoDrivers = [ "nvidia" ];

  # boot.initrd.kernelModules = [ "nvidia" "i915" "nvidia_modeset" "nvidia_uvm" "nvidia_drm" ];
  boot.kernelParams = [ "nvidia_drm.fbdev=1" ];
}
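For comparison, here is a minimal sketch of a stripped-down variant of the config above. It is untested on this hardware and rests on several assumptions: that the NixOS nvidia module only installs and loads the kernel driver when "nvidia" is listed in services.xserver.videoDrivers (which is commented out above), that unfree packages are already allowed elsewhere in the configuration, and that the PRIME block is not needed on a machine without a hybrid laptop GPU. It does not address the unresponsive screen that appears when videoDrivers is enabled.

```nix
# Sketch only: a reduced nvidia.nix for comparison, not a verified fix.
{ config, pkgs, ... }:
{
  # Assumption: unfree packages are permitted elsewhere, e.g. via
  # nixpkgs.config.allowUnfree = true;

  hardware.graphics.enable = true;

  # Assumption: without this entry the hardware.nvidia options below
  # have no effect, so the kernel module is never built or loaded.
  services.xserver.videoDrivers = [ "nvidia" ];

  hardware.nvidia = {
    # GM204 (Maxwell) predates the GPU generations supported by the
    # open kernel modules, so the proprietary module is kept.
    open = false;
    modesetting.enable = true;
    nvidiaSettings = false;
    package = config.boot.kernelPackages.nvidiaPackages.production;
    # No prime block: PRIME offload targets hybrid-graphics setups.
  };
}
```

Keeping the fragment this small makes it easier to tell whether the driver loads at all before reintroducing the other options one at a time.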
There are (obviously) other configuration settings, but they are spread across multiple *.nix files.
One important note: I am using Xfce as a desktop environment on this server.
I’ve never used a DE on a server before this, so I’m not sure whether that is relevant.
Overall, I’m having a lot of trouble finding consistent information related to CUDA drivers, and almost no information related to server GPUs specifically.
Nvidia’s site indicates that the proper driver for my card should be the 550 driver, which seems to be the same one available in the production package.
I’m also having trouble finding any information on GPUs that show up as multiple PCI devices (the M60 is listed at both 04:00.0 and 05:00.0).
I’m fairly new to NixOS, so please let me know if there’s anything obvious that I’m missing, or anything that may be breaking this that I’m not accounting for.