Docker rootless with nvidia support

I’m about to start looking into docker and how runtimes are set up, but I was wondering if anyone has any insight into rootless docker and nvidia on NixOS. After reading around I landed on something that works with sudo:

	virtualisation.docker = {
		enable = true;
		enableOnBoot = true;
		enableNvidia = true;
	};

	virtualisation.docker.rootless = {
		enable = true;
		setSocketVariable = true;
	};

and then my docker compose application with:

services:
  librarian:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

…for root/sudo seems to work fine.

With root/sudo the “nvidia” runtime is available; for my regular user it is not (I also get a warning like: WARNING: No cpuset support).
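
A quick way to compare what the two daemons see (any CUDA image with nvidia-smi will do, the tag below is just an example):

$ sudo docker info | grep -iA4 runtimes    # what the root daemon knows
$ docker info | grep -iA4 runtimes         # what the rootless daemon knows
$ sudo docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi
$ docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi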

I’ve come across suggestions for cgroups like:

systemd.enableUnifiedCgroupHierarchy = false;

But none of the things I’ve seen suggested work, and later comments seem to suggest this was fixed some time ago.
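
If you want to check whether the unified cgroup hierarchy (cgroups v2) is even in play on your box, something like this should tell you:

$ stat -fc %T /sys/fs/cgroup/
# "cgroup2fs" means the unified hierarchy (cgroups v2); "tmpfs" indicates v1
$ docker info | grep -i cgroup
# shows the cgroup driver and version the daemon is actually using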

After reading through nixpkgs I thought maybe I could fix it with some tweaks to the docker rootless settings:

	virtualisation.docker = {
		enable = true;
		enableOnBoot = false;
		enableNvidia = true;

		rootless = {
			enable = true;
			setSocketVariable = true;
			daemon.settings = {
				runtimes = {
					nvidia = {
						path = "${pkgs.nvidia-docker}/bin/nvidia-container-runtime";
					};
				};
			};
		};
	};

… without luck. Now “docker info” looks right, but attempts to run containers rootlessly generate:

Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]

It seems like there’s a permission breakdown somewhere but I’m not sure where to look.

From reading about the error message you are getting, it might actually be related to some Nvidia driver issue, which unfortunately happens every once in a while.

Sadly I do not have deeper insight into this exact combination. But if I didn’t get it to work with docker, I’d give podman a try. There even appears to be a virtualisation.podman.enableNvidia option just like the docker one.

I am pretty sure it is a permissions thing because I already tried that. Now I’m not sure where to look next.

Podman basically does the same thing: sudo containers do have GPU access but regular users do not. Interestingly, podman does not crash or give any indication anything is wrong, but any containers, once spun up, report no GPU access (pytorch, etc.).

After countless hours (I’m not sure where I got the idea to set this in the first place), my issue seemed to be setting:

setSocketVariable = true;

With that set, it points your local docker CLI at the root-only daemon (I think), and then you don’t get the GPUs in docker unless you are root/sudo. The default is false, so removing it should fix it, should anyone else come across this.


Hello @ryanbalch, doesn’t this mean that your containers will then be executed by default as root?
This seems to contradict the whole point of enabling rootless docker, no?

To clarify: I incorrectly set setSocketVariable to true; I don’t remember why I did this initially. This seemed to work except when I wanted to use cuda/torch stuff. When I wanted a container to have access to the GPU, I had to root/sudo to get it. I ended up with this.

After deleting it or setting it to false, I don’t need root/sudo to run containers with the GPU attached.

I think overall docker does most of what it does as root regardless. I don’t think you can get away from that, but when you’re doing remote debugging, etc., it becomes a whole thing if everyone has to be root to see and interact with the running containers.

I let others that I know personally use my box. I’m not worried about them using docker to do evil, but I don’t want to require everyone to be running around as root and messing with each other’s running containers, etc.

Podman actually has the same issue for me (only I’ve never gotten it working). I never found a mix of configs that let me get GPU-enabled containers working as non-root.

At some point I’ll revisit podman to see if I can get it working, but for my purposes docker is fine, everyone knows it, and it works.

When using rootless docker, an additional docker daemon is run as your user, i.e. (on my box):

del        10546  0.0  0.5 3310356 85100 ?       Sl   10:10   0:01 dockerd --config-file=/nix/store/nw3scihk5ddz35p80p6bi8psscyjb1sj-daemon.json

Where del is my local user name.

When you activate rootless docker on NixOS, you basically have multiple docker daemons running, i.e. (again on my box):

root        2415  0.0  0.4 3016516 73324 ?       Ssl  08:56   0:01 /nix/store/iyydng31g66rb3qrg04jnnx2ih7dy0py-moby-24.0.5/libexec/docker/dockerd --config-file=/nix/store/fl680xga4p6s1zxvbpk381swbcv547wj-daemon.json
del        10546  0.0  0.5 3310356 85100 ?       Sl   10:10   0:01 dockerd --config-file=/nix/store/nw3scihk5ddz35p80p6bi8psscyjb1sj-daemon.json

So here, there is one docker daemon run by root and another by my local user.

What setSocketVariable = true; means is that it tells your docker CLI to use the rootless docker daemon instance instead of the root one (by setting the DOCKER_HOST environment variable in your user session).

If you do not set this variable to true and do not add a --host argument to your docker CLI command, the CLI will talk with the root daemon instance and not the rootless one (and the containers are then run as root).

Hence you can use the GPUs: your root docker instance is configured with the nvidia runtime, which is not the case for the rootless one.
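
A quick way to check which daemon your CLI is currently talking to, and to target each one explicitly (assuming the default rootless socket path under /run/user/<uid>):

$ echo $DOCKER_HOST       # unset usually means the default /var/run/docker.sock (root daemon)
$ docker context ls       # the context marked with * is the one the CLI uses
$ docker -H unix:///var/run/docker.sock info              # ask the root daemon explicitly
$ docker -H unix:///run/user/$(id -u)/docker.sock info    # ask the rootless daemon explicitly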

My bad, the rootless docker daemon of @ryanbalch was also configured to know the nvidia runtime per its additional settings.
As @ryanbalch already pointed out, the issue is probably related to permissions.

My guess is that it is related to the permissions of the /dev/nvidia* devices, which all belong to the root group.
Is there a simple way on NixOS to change the permission group of these devices to something like video?
What is the standard way in NixOS to configure udev?

This is where I got stumped - wasn’t sure where to go next.

For me at least the /dev/nvidia* devices are owned by root, but the perms are the same for everyone (666) so I assumed that was not it.

Ah yes, you’re right, I didn’t even realize that everyone has rw access to all nvidia devices:

❯ ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Jan 11 08:55 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jan 11 08:55 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Jan 11 08:55 /dev/nvidia-modeset
crw-rw-rw- 1 root root 239,   0 Jan 11 08:55 /dev/nvidia-uvm
crw-rw-rw- 1 root root 239,   1 Jan 11 08:55 /dev/nvidia-uvm-tools

Well, I am now as lost as you are :thinking:.

I am able to successfully run rootless docker with nvidia GPU support on NixOS.
I think it’s a little messy and not NixOS style, but (hey!) “it just works”!

The steps to reproduce:

  1. add your user to the docker group:

users.users.<USER-NAME>.extraGroups = [ "docker" ];

  2. find your rootless docker.service script
    Run this in a console:

$ systemctl --user status docker.service

check the output; you’ll find something like “Loaded: loaded (/etc/systemd/user/docker.service; enabled; preset: enabled)”
copy the file to your local systemd directory to use as a template for your future modifications:

$ cp /etc/systemd/user/docker.service ~/.config/systemd/user/docker-mod.service

  3. find your sudo/root docker daemon config

$ ps aux | grep docker

on my system it’s
/nix/store/<hash…>-moby-24.0.5/libexec/docker/dockerd --config-file=/nix/store/<hash…>-daemon.json
copy it to change it later:

$ cp /nix/store/<hash…>-daemon.json ~/.config/systemd/user/daemon.json

  4. change your service script and config file
    I only changed fd:// to unix:// in daemon.json:
...
  "hosts": [
    "unix://"
  ],
...

the full text of daemon.json after that change:

{
  "group": "docker",
  "hosts": [
    "unix://"
  ],
  "live-restore": true,
  "log-driver": "journald",
  "runtimes": {
    "nvidia": {
      "path": "/nix/store/<hash...>-nvidia-docker/bin/nvidia-container-runtime"
    }
  }
}

then I changed docker-mod.service; the only change is the path, pointing it at the modified daemon.json:

...
ExecStart=/nix/store/<hash...>-docker-24.0.5/bin/dockerd-rootless --config-file=/home/<USER-NAME>/.config/systemd/user/daemon.json
...
  5. start your new service via systemd

stop the old rootless service

$ systemctl --user stop docker.service

start the new rootless service

$ systemctl --user start docker-mod.service

check status of the new service

$ systemctl --user status docker-mod.service

you will see something like “API listen on /var/run/docker.sock”

  6. run docker to test the setup

$ DOCKER_HOST=unix:///var/run/docker.sock docker run --gpus 'all' pytorch/pytorch:2.1.2-cuda11.8-cudnn8-runtime nvidia-smi

or you can use --runtime instead

$ DOCKER_HOST=unix:///var/run/docker.sock docker run --runtime=nvidia --rm -ti nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi
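
To double-check that the daemon answering on that socket is really the rootless one, the data root it reports should sit under your home directory (rootless docker defaults to ~/.local/share/docker rather than /var/lib/docker):

$ DOCKER_HOST=unix:///var/run/docker.sock docker info | grep -i 'docker root dir'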

Hello @sergei, thanks for your input :-).

Can you tell us what you intended to change in the current rootless docker installation made natively by nixpkgs that you couldn’t do in a nix way?

What setting(s)/configuration makes your custom install work that the current nixpkgs install does not?

“do in a nix way”

I have conducted some research and it appears that altering the file configuration.nix
could achieve the same result.

...
  environment.sessionVariables = {
     DOCKER_HOST="unix:///var/run/docker.sock";
  };

...
  virtualisation.docker.rootless = {
    enable = true;
    setSocketVariable = true;
    daemon.settings = {
         default-runtime = "nvidia";
         runtimes.nvidia.path = "${pkgs.nvidia-docker}/bin/nvidia-container-runtime";
    };
  };
...

However, I am unsure about how to modify the “#no-cgroups = false” line in config.toml (provided by nvidia-container-runtime). Do you have any ideas?

Hello @sergei,

Looking at the configuration.nix you’ve provided, I guess that the environment.sessionVariables will override the effect of setSocketVariable.
This means that the docker client will use the docker daemon in root mode and not the one started in rootless mode by systemd on behalf of the user.

If you are using the rootless docker daemon, something close to this should be displayed:

❯ echo $DOCKER_HOST
unix:///run/user/1000/docker.sock

If you are using the docker daemon in root mode, this will be displayed:

❯ echo $DOCKER_HOST
unix:///var/run/docker.sock

This last value would mean that it currently works on your side because the docker daemon in root mode is in fact able to use the nvidia card, while the rootless one would still be unable to do so.

Regarding the override of config.toml, I have started another discussion but there are no answers for now:

Hi, @gautaz
Thanks, your post was crucial for me to rewrite the overlay in the proper way!
I think I could finally run docker in rootless mode with an Nvidia GPU.

The steps to reproduce:

configuration.nix

{
...
    nixpkgs.overlays = [ (final: prev: {                                                          
        nvidia-docker = prev.pkgs.mkNvidiaContainerPkg {                                          
            name = "nvidia-docker";
            containerRuntimePath = "runc"; 
            configTemplate = <YOUR_PATH>/config.toml ;
            additionalPaths = [(prev.pkgs.callPackage <YOUR_PATH>/nixpkgs/pkgs/applications/virtualization/nvidia-docker {})];
        };
 
    })];
...

  virtualisation.docker.rootless = {
    enable = true;
    setSocketVariable = true;
    daemon.settings = {
        default-runtime = "nvidia";
        runtimes.nvidia.path = "${pkgs.nvidia-docker}/bin/nvidia-container-runtime";
        exec-opts = ["native.cgroupdriver=cgroupfs"];
    };
  };
...
}

where
1.
<YOUR_PATH>/nixpkgs/pkgs/applications/virtualization/nvidia-docker
is the local directory copied from nixpkgs (for my channel - 23.11)

2.
<YOUR_PATH>/config.toml

disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-runtime-hook.log"
ldcache = "/tmp/ld.so.cache"
load-kmods = true
no-cgroups = true
#user = "root:video"
ldconfig = "@@glibcbin@/bin/ldconfig"

3.
as for
exec-opts = ["native.cgroupdriver=cgroupfs"];
see

Tests

$ sudo docker run --runtime=nvidia --privileged --rm -ti nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi

$ docker run --runtime=nvidia --rm -ti nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi

I think it’s possible to just reference the nixpkgs directory (using its URL) and to change config.toml with a nix rewrite rule, but I don’t know how…

I was able to simplify the setup described above. No need for local files or directories, everything is in configuration.nix.

...
    nixpkgs.overlays = [ (final: prev: 
    let 
        my-config-toml = prev.pkgs.writeText "config.toml" ''
                disable-require = false
                #swarm-resource = "DOCKER_RESOURCE_GPU"

                [nvidia-container-cli]
                #root = "/run/nvidia/driver"
                #path = "/usr/bin/nvidia-container-cli"
                environment = []
                #debug = "/var/log/nvidia-container-runtime-hook.log"
                ldcache = "/tmp/ld.so.cache"
                load-kmods = true
                no-cgroups = true
                #user = "root:video"
                ldconfig = "@@glibcbin@/bin/ldconfig"
        '';
    in
    {
        nvidia-docker = prev.pkgs.mkNvidiaContainerPkg {
            name = "nvidia-docker";
            containerRuntimePath = "runc"; 
            configTemplate = my-config-toml ;
            additionalPaths = [(prev.pkgs.callPackage <nixpkgs/pkgs/applications/virtualization/nvidia-docker> {})];
        };
     })];

...

  virtualisation.docker.rootless = {
    enable = true;
    setSocketVariable = true;
    daemon.settings = {
        default-runtime = "nvidia";
        runtimes.nvidia.path = "${pkgs.nvidia-docker}/bin/nvidia-container-runtime";
        exec-opts = ["native.cgroupdriver=cgroupfs"];
    };
  };
...


Sorry @sergei for the late reply.
Thanks for sharing your configuration!
I have not tested it yet because I lack time right now and I need to adapt the solution to my flake-based configuration.
I’ll share more news as soon as I have tested this on my side.

I merely skimmed the thread, but note that nvidia-container-toolkit: 1.9.0 -> 1.15.0-rc.3 by aaronmondal · Pull Request #278969 · NixOS/nixpkgs · GitHub might resolve some of your issues.

Hello @SergeK,

Thanks for your insight.

I have just updated my flake dependencies and my system is now using version 1.15.0-rc.3 of nvidia-container-toolkit:

❯ find /nix/store/ -type d -iname "*container-toolkit-container-toolkit*"
/nix/store/lbyvily2r163pxc880p4qx9kxjyfs3c9-container-toolkit-container-toolkit-1.15.0-rc.3

Still no improvement on using the docker nvidia runtime with rootless docker off the shelf:

❯ echo $DOCKER_HOST
unix:///run/user/1000/docker.sock

❯ cat /nix/store/al5nrv1vcc57wf7bqcpkf7bl2qf9zfgx-daemon.json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/nix/store/r8zbx7dsp42mlk5c8m2hg7da91by1v8j-nvidia-docker/bin/nvidia-container-runtime"
    }
  }
}

❯ docker run --gpus=all --rm nvcr.io/nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

❯ docker run --gpus=all --runtime=nvidia --rm nvcr.io/nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Would you mind detailing a bit more why version 1.15.0-rc.3 of nvidia-container-toolkit would resolve part of this issue?

Does it provide additional knobs to modify the config.toml configuration file related to the toolkit?