Running docker with nvidia complains it can't find `libnvidia-ml.so.1`

When I try to run a Docker container that uses an NVIDIA GPU, I get:

nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

Example of what I run: docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

This is running on a headless system, so I would rather not install X11.

I’ve added docker support like this:

  virtualisation = {
    docker = {
      enable = true;
      enableNvidia = true;
    };
  };

And I set up OpenGL and the NVIDIA hardware like this:

  hardware.opengl = {
    enable = true;
    driSupport32Bit = true;
    setLdLibraryPath = true;
  };
                                                      
  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.stable;
  hardware.nvidia.nvidiaSettings = true;
  hardware.nvidia.powerManagement.enable = true;

Any ideas would be great!


Hmm, no ideas so far, but in case it helps you rule anything out: I ran your docker run command and got the expected nvidia-smi output.

❯ nix-info -m

  • system: "x86_64-linux"
  • host os: Linux 6.1.19, NixOS, 23.05 (Stoat), 23.05.20230314.7067edc
  • multi-user?: yes
  • sandbox: yes
  • version: nix-env (Nix) 2.10.3
  • channels(ss): "nixgl"
  • channels(root): "nixgl, nixos-21.05.2132.733682c3292"
  • nixpkgs: /etc/nixpkgs/channels/nixpkgs
❯ nix eval $(readlink -f /etc/nixos)#nixosConfigurations.$(hostname).config.hardware.opengl --json --read-only --apply 'x: with builtins; let isDerivation = y: (y.outPath or "") == "derivation"; fmt = y: let r = tryEval (if isDerivation y then y.outPath else if isAttrs y then mapAttrs (_: fmt) y else if isList y then map fmt y else if isFunction y then "<FUNCTION>" else (toString y)); in if r.success then r.value else "<ERROR>"; in fmt x' | jq
{
  "driSupport": "1",
  "driSupport32Bit": "1",
  "enable": "1",
  "extraPackages": [
    "/nix/store/v23silmnf6b650crz2g2l06yd314g9hh-nvidia-x11-525.89.02-6.1.19",
    "/nix/store/4vk2nczjlkjcpyi27mgv502x56g16wk2-nvidia-vaapi-driver-0.0.8"
  ],
  "extraPackages32": [
    "/nix/store/b00bqq064wyiv7pckxgjxvgjn5fgk0sj-nvidia-x11-525.89.02-6.1.19-lib32",
    "/nix/store/6jcbwama0gcr7l0nh46lps281r62vcmp-nvidia-vaapi-driver-0.0.8"
  ],
  "package": "/nix/store/j5j4r8waw956z2xslbngiyx20kzcn6lj-mesa-22.3.5-drivers",
  "package32": "/nix/store/r723f7y07dbb8k66ig0kaq9z7c6gaaf5-mesa-22.3.5-drivers",
  "s3tcSupport": "<ERROR>",
  "setLdLibraryPath": ""
}
❯ nix eval $(readlink -f /etc/nixos)#nixosConfigurations.$(hostname).config.virtualisation.docker --json --read-only --apply 'x: with builtins; let isDerivation = y: (y.outPath or "") == "derivation"; fmt = y: let r = tryEval (if isDerivation y then y.outPath else if isAttrs y then mapAttrs (_: fmt) y else if isList y then map fmt y else if isFunction y then "<FUNCTION>" else (toString y)); in if r.success then r.value else "<ERROR>"; in fmt x' | jq
{
  "autoPrune": {
    "dates": "weekly",
    "enable": "",
    "flags": []
  },
  "daemon": {
    "settings": {
      "group": "docker",
      "hosts": [
        "fd://"
      ],
      "live-restore": "1",
      "log-driver": "journald",
      "runtimes": {
        "nvidia": {
          "path": "/nix/store/z0f68lrs2kvwljws8ppr5fqw38myrqax-nvidia-docker/bin/nvidia-container-runtime"
        }
      }
    }
  },
  "enable": "1",
  "enableNvidia": "1",
  "enableOnBoot": "1",
  "extraOptions": "",
  "listenOptions": [
    "/run/docker.sock"
  ],
  "liveRestore": "1",
  "logDriver": "journald",
  "package": "/nix/store/wlz43wbfk2zvsngh1arw46ypb88gr14d-docker-20.10.23",
  "rootless": {
    "daemon": {
      "settings": {}
    },
    "enable": "",
    "package": "/nix/store/wlz43wbfk2zvsngh1arw46ypb88gr14d-docker-20.10.23",
    "setSocketVariable": ""
  },
  "socketActivation": "<ERROR>",
  "storageDriver": ""
}

Thank you for testing! Makes me at least confident that I can solve it somehow on my setup.

Are you running x11 with the nvidia card?

Yes, I am


I also have this line in my config, though I don’t remember the context: systemd.enableUnifiedCgroupHierarchy = false; # otherwise nvidia-docker fails

I finally got it working after adding services.xserver.enable = true; to the nixos configuration.


Hm, could it be because of this line?

  config = mkIf cfg.enable {
    ...
    services.xserver.videoDrivers = mkIf (cfg.videoDriver != null) [ cfg.videoDriver ];
    ...
  }

We should have started by checking whether /run/opengl-driver/lib/libnvidia-ml.so.1 actually existed in the no-xserver configuration :man_facepalming:
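
For anyone landing here later, a minimal sketch of that check. The function name check_nvidia_ml is made up for illustration; the default path is just the one mentioned in this thread, and you can pass a different path as the first argument. It only tells you whether the file is there, not why it is missing.

```shell
#!/bin/sh
# Hypothetical helper (not part of NixOS or nvidia-docker): report whether
# the NVIDIA management library exists at the expected location.
check_nvidia_ml() {
    # Default path is the one named in this thread; override via $1.
    lib="${1:-/run/opengl-driver/lib/libnvidia-ml.so.1}"
    if [ -e "$lib" ]; then
        # Resolve symlinks so the underlying store path is visible.
        echo "found: $lib -> $(readlink -f "$lib")"
        return 0
    else
        echo "missing: $lib"
        return 1
    fi
}
```

Running check_nvidia_ml with no arguments after a rebuild should show fairly quickly whether the container runtime has any chance of loading the library.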


For me,

  hardware.opengl.enable = true;  # needed for nvidia-docker

was enough to get the /run/opengl-driver/lib/libnvidia-ml.so.1 file and even to start nvidia-docker containers!