botto
March 25, 2023, 12:14am
1
When I try to run a Docker container that uses an NVIDIA GPU, I get:
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
Example of what I run: docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
This is a headless system, so I would rather not install X11.
I’ve added docker support like this:
virtualisation = {
  docker = {
    enable = true;
    enableNvidia = true;
  };
};
And set up OpenGL and the NVIDIA hardware like this:
hardware.opengl = {
  enable = true;
  driSupport32Bit = true;
  setLdLibraryPath = true;
};
hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.stable;
hardware.nvidia.nvidiaSettings = true;
hardware.nvidia.powerManagement.enable = true;
Any ideas would be great!
SergeK
March 25, 2023, 11:50am
2
Hmm, no ideas so far, but if it helps you rule anything out: I ran your docker run command and got the expected nvidia-smi output.
❯ nix-info -m
system: "x86_64-linux"
host os: Linux 6.1.19, NixOS, 23.05 (Stoat), 23.05.20230314.7067edc
multi-user?: yes
sandbox: yes
version: nix-env (Nix) 2.10.3
channels(ss): "nixgl"
channels(root): "nixgl, nixos-21.05.2132.733682c3292"
nixpkgs: /etc/nixpkgs/channels/nixpkgs
❯ nix eval $(readlink -f /etc/nixos)#nixosConfigurations.$(hostname).config.hardware.opengl --json --read-only --apply 'x: with builtins; let isDerivation = y: (y.outPath or "") == "derivation"; fmt = y: let r = tryEval (if isDerivation y then y.outPath else if isAttrs y then mapAttrs (_: fmt) y else if isList y then map fmt y else if isFunction y then "<FUNCTION>" else (toString y)); in if r.success then r.value else "<ERROR>"; in fmt x' | jq
{
"driSupport": "1",
"driSupport32Bit": "1",
"enable": "1",
"extraPackages": [
"/nix/store/v23silmnf6b650crz2g2l06yd314g9hh-nvidia-x11-525.89.02-6.1.19",
"/nix/store/4vk2nczjlkjcpyi27mgv502x56g16wk2-nvidia-vaapi-driver-0.0.8"
],
"extraPackages32": [
"/nix/store/b00bqq064wyiv7pckxgjxvgjn5fgk0sj-nvidia-x11-525.89.02-6.1.19-lib32",
"/nix/store/6jcbwama0gcr7l0nh46lps281r62vcmp-nvidia-vaapi-driver-0.0.8"
],
"package": "/nix/store/j5j4r8waw956z2xslbngiyx20kzcn6lj-mesa-22.3.5-drivers",
"package32": "/nix/store/r723f7y07dbb8k66ig0kaq9z7c6gaaf5-mesa-22.3.5-drivers",
"s3tcSupport": "<ERROR>",
"setLdLibraryPath": ""
}
❯ nix eval $(readlink -f /etc/nixos)#nixosConfigurations.$(hostname).config.virtualisation.docker --json --read-only --apply 'x: with builtins; let isDerivation = y: (y.outPath or "") == "derivation"; fmt = y: let r = tryEval (if isDerivation y then y.outPath else if isAttrs y then mapAttrs (_: fmt) y else if isList y then map fmt y else if isFunction y then "<FUNCTION>" else (toString y)); in if r.success then r.value else "<ERROR>"; in fmt x' | jq
{
"autoPrune": {
"dates": "weekly",
"enable": "",
"flags": []
},
"daemon": {
"settings": {
"group": "docker",
"hosts": [
"fd://"
],
"live-restore": "1",
"log-driver": "journald",
"runtimes": {
"nvidia": {
"path": "/nix/store/z0f68lrs2kvwljws8ppr5fqw38myrqax-nvidia-docker/bin/nvidia-container-runtime"
}
}
}
},
"enable": "1",
"enableNvidia": "1",
"enableOnBoot": "1",
"extraOptions": "",
"listenOptions": [
"/run/docker.sock"
],
"liveRestore": "1",
"logDriver": "journald",
"package": "/nix/store/wlz43wbfk2zvsngh1arw46ypb88gr14d-docker-20.10.23",
"rootless": {
"daemon": {
"settings": {}
},
"enable": "",
"package": "/nix/store/wlz43wbfk2zvsngh1arw46ypb88gr14d-docker-20.10.23",
"setSocketVariable": ""
},
"socketActivation": "<ERROR>",
"storageDriver": ""
}
botto
March 25, 2023, 12:04pm
3
Thank you for testing! Makes me at least confident that I can solve it somehow on my setup.
Are you running x11 with the nvidia card?
SergeK
March 25, 2023, 1:06pm
5
I also have this line in my config, though I don’t remember the context:
systemd.enableUnifiedCgroupHierarchy = false; # otherwise nvidia-docker fails
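To see whether that setting matters on a given machine, one can check which cgroup hierarchy is currently mounted. This is a minimal sketch, assuming a Linux host with GNU `stat`. `cgroup_mode` is a hypothetical helper name, and the mapping from filesystem type to hierarchy (`cgroup2fs` meaning unified) is the commonly documented convention, not something stated in this thread.

```shell
# Classify a filesystem type name as reported for /sys/fs/cgroup.
# (cgroup_mode is a hypothetical helper, not part of any tool mentioned here.)
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "unified (cgroup v2)" ;;
        tmpfs)     echo "legacy or hybrid (cgroup v1)" ;;
        *)         echo "unknown" ;;
    esac
}

# Ask GNU stat for the filesystem type; fall back to "unknown" if the path is missing.
fs_type="$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)"
echo "cgroup hierarchy: $(cgroup_mode "$fs_type")"
```

If this reports the unified hierarchy and nvidia-docker fails with cgroup-related errors, the option above may be worth trying.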
botto
March 25, 2023, 9:48pm
6
I finally got it working after adding services.xserver.enable = true; to the NixOS configuration.
SergeK
March 26, 2023, 1:09am
7
Hm, could it be because of this line?
config = mkIf cfg.enable {
  ...
  services.xserver.videoDrivers = mkIf (cfg.videoDriver != null) [ cfg.videoDriver ];
  ...
}
We should have started by checking whether /run/opengl-driver/lib/libnvidia-ml.so.1 actually existed in the no-xserver configuration.
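The check suggested above can be sketched as a small shell function. This is only a sketch, not a definitive diagnostic: `check_nvml` is a hypothetical helper name, and the default directory is simply the path quoted in this thread.

```shell
# Report whether libnvidia-ml.so.1 exists in the given directory.
# Defaults to the path quoted in this thread; pass another directory to override.
# (check_nvml is a hypothetical helper name.)
check_nvml() {
    lib_dir="${1:-/run/opengl-driver/lib}"
    lib="$lib_dir/libnvidia-ml.so.1"
    # -e follows symlinks, so a dangling link counts as missing.
    if [ -e "$lib" ]; then
        echo "found: $lib"
    else
        echo "missing: $lib"
        return 1
    fi
}

check_nvml || echo "hint: driver libraries are not linked; check the hardware.opengl settings"
```

Running it before and after a configuration change shows whether the change is what makes the library appear.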
2 Likes
For me,
hardware.opengl.enable = true; # needed for nvidia-docker
was enough to get the /run/opengl-driver/lib/libnvidia-ml.so.1 file, and even to start nvidia-docker containers!