Nvidia Container Runtime

mydigitaldomain · March 19, 2024, 3:21am

Hello!

I have an application I am interested in deploying on my nixos machine using docker.

The application is called viseron, and I want to leverage their cuda image to utilize the Nvidia GPU in my machine. Instructions are here:

So for some reason, trying to deploy this container gives me the following error:

$ docker-compose -f ~/docker/viseron/docker-compose-viseron.yaml up -d
[+] Running 28/1
 ✔ viseron 27 layers [⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿]      0B/0B      Pulled                                                     98.3s 
[+] Running 0/1
 ⠧ Container viseron  Starting                                                                                                 1.7s 
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/2b948c145f1fefce1b54ef4bda0c96c56777a35ac9ff3552784a95ed057fb98f/log.json: no such file or directory): /nix/store/p1w9c7s3438515ksyap4z24b828ciqnw-nvidia-docker/bin/nvidia-container-runtime did not terminate successfully: exit status 125: unknown

I noticed that this error is specific to running with the Nvidia docker runtime. I am not sure what exactly is going on here, or whether any of y’all on nixos have had this issue.
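One way to narrow this down is to run a minimal container through the same runtime, outside of Viseron and docker-compose. The sketch below only builds and prints such a command unless `RUN=1` is set. It assumes the runtime is registered with Docker under the name `nvidia`; the `ubuntu` image, the `--gpus all` request, and the helper name `smoke_test_cmd` are illustrative choices, not anything taken from the Viseron setup.

```shell
#!/bin/sh
# Sketch: exercise the nvidia runtime with a minimal container, to see whether
# the failure comes from the runtime itself rather than from Viseron.
# Assumptions: runtime name "nvidia", the "ubuntu" image, and "--gpus all".

# Build the docker command for a given runtime name (hypothetical helper).
smoke_test_cmd() {
  printf 'docker run --rm --runtime=%s --gpus all ubuntu nvidia-smi\n' "$1"
}

# Print the command by default; set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
  sh -c "$(smoke_test_cmd nvidia)"
else
  smoke_test_cmd nvidia
fi
```

If this fails with the same exit status 125, the problem is likely in the runtime setup rather than in the Viseron image.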

As for my nixos configuration, here is the most relevant configuration:

$ cat flake/docker.nix 
{ config, pkgs, ... }:

{

  environment.systemPackages = with pkgs; [    
    docker-compose
  ];

  virtualisation.docker = {
    enable = true;
    extraOptions = "--experimental";
    storageDriver = "overlay2";
  };

}

$ cat flake/nvidia.nix 
{ config, pkgs, ... }:

{
  nixpkgs.config.allowUnfree = true;

  hardware.opengl = {
    enable = true;
    driSupport = true;
    driSupport32Bit = true;
  };

  services.xserver = {
    enable = true;
    videoDrivers = ["nvidia"];
  };


  hardware.nvidia = {

    modesetting.enable = true;

    powerManagement = {
      enable = false;
      finegrained = false;
    };

    open = false;

    nvidiaSettings = true;
    
    package = config.boot.kernelPackages.nvidiaPackages.beta;
  };

  virtualisation.docker = {
    enableNvidia = true;
  };

}


mydigitaldomain · March 19, 2024, 3:23am

I would just like to note here that I have another open source solution, immich, working with my nvidia GPU. However, as far as I can tell, the immich containers do not use the nvidia runtime, and they do not get the same error.

thoth · June 17, 2024, 4:41pm

I am running into similar issues after upgrading to 24.05. Did you resolve this?

s3l4h · June 27, 2024, 2:07pm

I am in the same situation. I have a jupyterhub service that used to deploy GPU-powered containers successfully, but since migrating my infrastructure to NixOS 24.05, the nvidia-container runtime keeps failing.

Have you found a solution in the last 10 days?