Nvidia Docker container runtime doesn't detect my GPU

nvidia.nix config (for nvidia drivers):

{ config, lib, pkgs, inputs, ... }:

{
        # NixOS stable (24.05) still calls the graphics toggle option "OpenGL"
        hardware.opengl = {
                enable = true;
        };

        #For nixos-unstable, they renamed it
        #hardware.graphics.enable = true;

        services.xserver.enable = true;
        services.xserver.videoDrivers = ["nvidia"];

        hardware.nvidia = {
                modesetting.enable = true;
                powerManagement.enable = false;
                powerManagement.finegrained = false;

                open = true;

                nvidiaSettings = false;

#               package = pkgs.linuxPackages_6_10.nvidiaPackages.beta;
#               package = config.boot.kernelPackages.nvidiaPackages.latest;

        };
}

docker.nix (to enable docker and the nvidia runtime):

{ config, lib, pkgs, ... }:
{
  virtualisation.docker = {
      enable = true;
      enableOnBoot = true;
      # Nvidia Docker (deprecated)
      #enableNvidia = true;
  };

  hardware.nvidia-container-toolkit.enable = true;
  # libnvidia-container does not support cgroups v2 (prior to 1.8.0)
  # https://github.com/NVIDIA/nvidia-docker/issues/1447
  #systemd.enableUnifiedCgroupHierarchy = false;
}

When I run:

$ docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Thanks for the help! Not gonna lie, I’ve been stuck on this for a while…

Hello @SpidFightFR!

With hardware.nvidia-container-toolkit.enable = true;, the Container Device Interface (CDI) is used instead of the nvidia runtime wrappers.

With CDI you have to specify the devices with the --device argument instead of the --gpus one, like this:

$ docker run --rm --device nvidia.com/gpu=all ubuntu:latest nvidia-smi

Also, on NixOS 24.05 you will need at least Docker 25:

virtualisation.docker.package = pkgs.docker_25;

We are going to update the documentation to make this clearer and less error-prone. Please let us know what would have helped you identify the difference in your case; it will certainly help other users.


Hey @ereslibre , hope you’re doing well.

Thank you so much for your quick answer!
Indeed, it works now, thank you!

$ docker run --rm --device nvidia.com/gpu=all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
Fri Aug 30 17:30:47 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:00:10.0 Off |                  N/A |
|  0%   38C    P8              8W /  170W |       2MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Final docker config:
docker.nix:

{ config, lib, pkgs, ... }:
{
  virtualisation.docker = {
      enable = true;
      enableOnBoot = true;
      package = pkgs.docker_25;
      # Nvidia Docker (deprecated)
      #enableNvidia = true;
  };

  hardware.nvidia-container-toolkit.enable = true;
  # libnvidia-container does not support cgroups v2 (prior to 1.8.0)
  # https://github.com/NVIDIA/nvidia-docker/issues/1447
  #systemd.enableUnifiedCgroupHierarchy = false;
}

Final nvidia.nix:

{ config, lib, pkgs, inputs, ... }:

{
        # NixOS stable (24.05) still calls the graphics toggle option "OpenGL"
        hardware.opengl = {
                enable = true;
                driSupport = true;
                driSupport32Bit = true;
        };

        #For nixos-unstable, they renamed it
        #hardware.graphics.enable = true;

        services.xserver.enable = true;
        services.xserver.videoDrivers = ["nvidia"];

        hardware.nvidia = {
                modesetting.enable = true;
                powerManagement.enable = false;
                powerManagement.finegrained = false;

                open = true;

                nvidiaSettings = false;

#               package = pkgs.linuxPackages_6_10.nvidiaPackages.beta;
#               package = config.boot.kernelPackages.nvidiaPackages.latest;

        };
}

By the way, speaking of Nvidia: I’m aiming for the most minimal server possible…
Do you think it’s possible to load the Nvidia drivers without an Xorg session?

So far I use services.xserver.videoDrivers = ["nvidia"];, but if I could ditch the Xorg session while still loading the module, that would be lovely!
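For what it’s worth, here is a minimal sketch of what I imagine a headless setup could look like, assuming the hardware.nvidia module only requires "nvidia" to be listed in videoDrivers (not a verified setup; behavior on 24.05 is an assumption worth testing):

services.xserver.enable = false;               # no Xorg session on the server
services.xserver.videoDrivers = [ "nvidia" ];  # still activates hardware.nvidia and loads the kernel module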

Best regards, Spid