Using nvidia-container-runtime with containerd on NixOS

I’m trying to deploy a k3s cluster on NixOS that will run GPU-enabled pods. I successfully installed the NVIDIA driver, and nvidia-smi works fine from the shell.

I first followed common sense and added a config similar to what NVIDIA suggests to my configuration.nix.

       plugins."io.containerd.grpc.v1.cri".containerd = {
         default_runtime_name = "nvidia";
         runtimes.runc = {
           runtime_type = "io.containerd.runc.v2";
         };
         runtimes.nvidia = {
           privileged_without_host_devices = false;
           runtime_type = "io.containerd.runc.v2";
           options = {
             BinaryName = "${pkgs.nvidia-docker}/bin/nvidia-container-runtime";
             SystemdCgroup = false;
           };
         };
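
(For context, the attrset above is an excerpt; it would typically sit under an enclosing option such as the one sketched below. The exact option is an assumption, since the surrounding config isn’t shown.)

  virtualisation.containerd = {
    enable = true;
    settings = {
      # The containerd config from the snippet above goes here, i.e.
      # plugins."io.containerd.grpc.v1.cri".containerd = { ... };
    };
  };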

This wasn’t enough. nvidia-container-runtime was run, but things it depends on (like nvidia-container-runtime-hook) were not in PATH, so both scheduling pods in k3s and running a container with ctr failed. So I did this:

  systemd.services.containerd.path = with pkgs; [
    containerd
    runc
    iptables
    nvidia-docker
  ];

OK… now we’re getting somewhere. I was now able to run containers successfully (with ctr as well as with k3s), but they didn’t see nvidia-smi.

And now I’m stuck, since I don’t know what’s missing in the paths or elsewhere. One extra note: maybe I’d be able to progress a bit further, but I couldn’t figure out how to enable logging by supplying an alternative config.toml for nvidia-container-runtime.
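
For the logging part, one untested possibility (assuming nvidia-container-runtime still reads the upstream default path /etc/nvidia-container-runtime/config.toml, rather than a path baked in by the Nix package) would be to provide that file via environment.etc:

  # Untested sketch: assumes the runtime reads the upstream default config path;
  # the nixpkgs package may hard-code a different one.
  environment.etc."nvidia-container-runtime/config.toml".text = ''
    [nvidia-container-cli]
    debug = "/var/log/nvidia-container-toolkit.log"

    [nvidia-container-runtime]
    debug = "/var/log/nvidia-container-runtime.log"
    log-level = "debug"
  '';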


You will have to manually create a cache for libnvidia-container to parse, or modify the libnvidia-container derivation, for it to work.

I managed to run containers directly with nvidia-container-runtime and they didn’t contain nvidia-smi either (following this).

I don’t know what you mean by “create a cache for libnvidia-container to parse”. I do understand what you mean by modifying the derivation, but could you shed more light on how I should change it?

Hm, does nvidia-smi get mounted if you just docker run --rm -it --gpus all nvidia/cuda:12.1.1-runtime-ubi8 without k3s?

It does with Docker, but I’m trying to make it work with containerd. And to understand what’s going on, I was trying to make it work with runc first (via nvidia-container-runtime).


libnvidia-container indexes the required libraries through the cache created by ldconfig. I don’t have time to list everything I did step by step, or to sift through my configuration to find exactly what I changed, but here are some snippets.

  nixpkgs.overlays = [ (final: prev: {

    nvidia-k3s = with final.pkgs; mkNvidiaContainerPkg {
      name = "nvidia-k3s";
      containerRuntimePath = "runc";
      configTemplate = ./config.toml;
    };

    libnvidia-container = prev.libnvidia-container.overrideAttrs (oldAttrs: {
      version = flakes.libnvidia-container.version;
      src = flakes.libnvidia-container.path;

      patches = [
        ./libnvidia-container.patch
        ./libnvidia-container-ldcache.patch
        (flakes.nixpkgs.path + "/pkgs/applications/virtualization/libnvidia-container/inline-c-struct.patch")
      ];

      postPatch = (oldAttrs.postPatch or "") + ''
        sed -i "s@/etc/ld.so.cache@/tmp/ld.so.cache@" src/common.h
      '';
    });

    nvidia-container-toolkit = prev.nvidia-container-toolkit.overrideAttrs (oldAttrs: {
      version = flakes.nvidia-container-toolkit.version;
      src = flakes.nvidia-container-toolkit.path;

      postPatch = (oldAttrs.postPatch or "") + ''
        sed -i "s@/etc/ld.so.cache@/tmp/ld.so.cache@" internal/ldcache/ldcache.go
      '';
    });
  }) ];
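
The overlay references a flakes attrset (flakes.libnvidia-container.path/.version, flakes.nixpkgs.path) and a mkNvidiaContainerPkg helper that are not shown here. Purely as an illustration of the shape the overlay expects (hypothetical, not the real definition), the inputs could look something like:

  # Hypothetical sketch only; the actual `flakes` wiring is not part of this post.
  {
    inputs = {
      nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
      libnvidia-container = {
        url = "github:NVIDIA/libnvidia-container";
        flake = false;
      };
      nvidia-container-toolkit = {
        url = "github:NVIDIA/nvidia-container-toolkit";
        flake = false;
      };
    };

    outputs = { nixpkgs, libnvidia-container, nvidia-container-toolkit, ... }: {
      # These sources could then be passed to the NixOS module as `flakes`, e.g.
      #   flakes.libnvidia-container = { path = libnvidia-container; version = "1.13.1"; };
      #   flakes.nvidia-container-toolkit = { path = nvidia-container-toolkit; version = "1.13.1"; };
      #   flakes.nixpkgs.path = nixpkgs;
    };
  }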

libnvidia-container-ldcache.patch

diff --git a/src/nvc_ldcache.c b/src/nvc_ldcache.c
index db3b2f6..360fd23 100644
--- a/src/nvc_ldcache.c
+++ b/src/nvc_ldcache.c
@@ -367,7 +367,7 @@ nvc_ldcache_update(struct nvc_context *ctx, const struct nvc_container *cnt)
         if (validate_args(ctx, cnt != NULL) < 0)
                 return (-1);
 
-        argv = (char * []){cnt->cfg.ldconfig, "-f", "/etc/ld.so.conf", "-C", "/etc/ld.so.cache", cnt->cfg.libs_dir, cnt->cfg.libs32_dir, NULL};
+        argv = (char * []){cnt->cfg.ldconfig, "-C", "/tmp/ld.so.cache", cnt->cfg.libs_dir, cnt->cfg.libs32_dir, NULL};
         if (*argv[0] == '@') {
                 /*
                  * We treat this path specially to be relative to the host filesystem.

config.toml

disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false

[nvidia-container-cli]
#root = "/run/nvidia/driver"
path = "@nvidia-container-cli@"
environment = []
debug = "/var/log/nvidia-container-toolkit.log"
ldcache = "/tmp/ld.so.cache"
load-kmods = true
#no-cgroups = false
#user = "root:video"
ldconfig = "@@glibcbin@/bin/ldconfig"

[nvidia-container-runtime]
debug = "/var/log/nvidia-container-runtime.log"
log-level = "debug"

# Specify the runtimes to consider. This list is processed in order and the PATH
# searched for matching executables unless the entry is an absolute path.
runtimes = [
    "@containerRuntimePath@",
]

mode = "auto"

    [nvidia-container-runtime.modes.csv]
    mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

  systemd.services.k3s.after = lib.mkForce [];
  systemd.services.k3s.wants = lib.mkForce [];
  systemd.services.k3s.serviceConfig.KillMode = lib.mkForce "control-group";
  systemd.services.k3s.path = with pkgs; [
    glibc
    # NVIDIA Container Support
    nvidia-k3s
    # Expose NVIDIA binaries to PATH
    (config.hardware.nvidia.package.overrideAttrs (oldAttrs:
      {
        builder = ./nvidia-builder.sh;
      }))
  ];
  systemd.services.k3s.serviceConfig.PrivateTmp = true;
  systemd.services.k3s.preStart = ''
    # ldconfig wants to generate symlinks
    rm -rf /tmp/nvidia-libs
    mkdir -p /tmp/nvidia-libs
    for thing in ${config.hardware.nvidia.package.overrideAttrs (oldAttrs: {
      builder = ./nvidia-builder.sh;
    })}/lib/*;
    do
      ln -s $(readlink -f $thing) /tmp/nvidia-libs/$(basename $thing)
    done

    echo "Initializing cache with directory"
    ldconfig -C /tmp/ld.so.cache /tmp/nvidia-libs

    echo "Printing ld cache contents"
    ldconfig -C /tmp/ld.so.cache --print-cache
  '';

nvidia-builder.sh is just a copy of the driver’s builder script with the patchelf step removed, since patched libraries would break when loaded inside non-NixOS distributions.


Cool. Is the goal for ldconfig to spit out /run/opengl-driver/lib on NixOS, or is it not to output FHS paths on non-NixOS?

This somewhat resembles the Singularity/Apptainer situation (it discovers libcuda via ldconfig), and AFAIU no fixes have been upstreamed into nixpkgs for either Singularity or containerd.

Thank you. This should get me up to speed!

@wokalski Have you had any luck with this configuration? I was able to get nvidia-smi running directly with k3s ctr on docker.io/nvidia/cuda:11.4.0-base-ubuntu20.04 with a much hackier version of @eadwu’s config, but had no success with actually launching pods in k3s. Now with both my config and @eadwu’s, nvidia-container-cli -k -d /dev/tty info finds the libraries in /tmp/ld.so.cache, but ctr crashes with the same, rather unhelpful message on the host and in k3s:

ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: exec failed: no such file or directory: unknown

There’s no hint in /var/log/nvidia-container-runtime.log as to what the missing file could be:

2023/05/16 17:47:07 Using bundle directory: /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/cuda
2023/05/16 17:47:07 Using OCI specification file path: /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/cuda/config.json
2023/05/16 17:47:07 Looking for runtime binary 'docker-runc'
2023/05/16 17:47:07 Runtime binary 'docker-runc' not found: exec: "docker-runc": executable file not found in $PATH
2023/05/16 17:47:07 Looking for runtime binary 'runc'
2023/05/16 17:47:07 Found runtime binary '/var/lib/rancher/k3s/data/7f541dce8a067205b29b8bfd0f332683637eca4a9921987c4ec09c59fdd5d695/bin/runc'
2023/05/16 17:47:07 Running /nix/store/kr3460f2i13yg5c06gqchb8kj9yxg3bg-nvidia-k3s/bin/nvidia-container-runtime

2023/05/16 17:47:07 No modification required
2023/05/16 17:47:07 Forwarding command to runtime

Thank you!

That looks like an error from an old version. I believe mine should work with 0.12.1.

I stopped toying with this (I will get back to it soon). However, when I had this error, it meant that some binaries from the nvidia-docker package were not in PATH.

Did you have any luck with this?

edit: Have you considered building a containerized nvidia driver?

Unfortunately no, I have now tried with versions 1.9.0, 1.13.1, and 1.12.1.

nvidia-smi works with a k3s ctr run --gpus 0, but the nvidia-container-runtime binary is now (in v1.12.1 and v1.13.1) failing to load libcuda, which results in a missing symbol error.

k3s ctr run --rm -t --gpus 0 --runc-binary=nvidia-container-runtime docker.io/nvidia/cuda:11.4.0-base-ubuntu20.04 cuda

ctr: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/cuda23/log.json: no such file or directory): nvidia-container-runtime did not terminate successfully: exit status 127: /nix/store/qm28zv7kyl60pxhf8xyp33c1m5dr6jzz-nvidia-k3s/bin/nvidia-container-runtime: symbol lookup error: /nix/store/qm28zv7kyl60pxhf8xyp33c1m5dr6jzz-nvidia-k3s/bin/nvidia-container-runtime: undefined symbol: cuDriverGetVersion: unknown

Very puzzling, since libcuda is in /tmp/ld.so.cache, and 1.9.0 did not have this issue.

ldconfig -C /tmp/ld.so.cache --print-cache | grep cuda
        libcudadebugger.so.1 (libc6,x86-64) => /tmp/nvidia-libs/libcudadebugger.so.1
        libcudadebugger.so (libc6,x86-64) => /tmp/nvidia-libs/libcudadebugger.so
        libcuda.so.1 (libc6,x86-64) => /tmp/nvidia-libs/libcuda.so.1
        libcuda.so (libc6,x86-64) => /tmp/nvidia-libs/libcuda.so

And nvidia-container-cli is having no issues

nvidia-container-cli -k -d log
cat log | grep libcuda

I0519 22:42:56.134093 3346551 nvc_info.c:174] selecting /nix/store/30x7mhkxv6ghf8893d6lhd5jiplxh897-nvidia-x11-525.89.02-5.15.96/lib/libcudadebugger.so.525.89.02
I0519 22:42:56.134210 3346551 nvc_info.c:174] selecting /nix/store/30x7mhkxv6ghf8893d6lhd5jiplxh897-nvidia-x11-525.89.02-5.15.96/lib/libcuda.so.525.89.02
W0519 22:42:56.134862 3346551 nvc_info.c:404] missing compat32 library libcuda.so
W0519 22:42:56.134869 3346551 nvc_info.c:404] missing compat32 library libcudadebugger.so

But there is no load of libcuda occurring

$ LD_DEBUG=libs nvidia-container-runtime 2>&1 | grep "find library"
   3375337:     find library=libdl.so.2 [0]; searching
   3375337:     find library=libc.so.6 [0]; searching
   3375344:     find library=libdl.so.2 [0]; searching
   3375344:     find library=libc.so.6 [0]; searching
   3375337:     find library=libdl.so.2 [0]; searching
   3375337:     find library=libpthread.so.0 [0]; searching
   3375337:     find library=libc.so.6 [0]; searching

$ strace nvidia-container-runtime 2>&1 | rg 'openat\(.*, "/nix/store/(.*)",.*' -r '$1'
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/glibc-hwcaps/x86-64-v3/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/glibc-hwcaps/x86-64-v2/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/haswell/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/haswell/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/haswell/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/haswell/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/libc.so.6
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/gconv/gconv-modules.cache
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/gconv/gconv-modules
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/gconv/gconv-modules.d
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/gconv/gconv-modules.d/gconv-modules-extra.conf
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/glibc-hwcaps/x86-64-v3/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/glibc-hwcaps/x86-64-v2/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/tls/haswell/x86_64/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/tls/haswell/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/tls/x86_64/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/tls/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/haswell/x86_64/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/haswell/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/x86_64/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/glibc-hwcaps/x86-64-v3/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/glibc-hwcaps/x86-64-v2/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/haswell/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/haswell/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/haswell/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/haswell/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/libpthread.so.0
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/libc.so.6

This is with the following patch on nvidia-container-toolkit:

preBuild = ''
  substituteInPlace go/src/github.com/NVIDIA/nvidia-container-toolkit/internal/config/config.go \
    --replace '/usr/bin' '${placeholder "out"}/bin'

  sed -i -e "s@/etc/ld.so.cache@/tmp/ld.so.cache@" -e "s@/etc/ld.so.conf@/tmp/ld.so.conf@" \
      go/src/github.com/NVIDIA/nvidia-container-toolkit/internal/ldcache/ldcache.go \
      go/src/github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook/update-ldcache/update-ldcache.go
'';

Thanks for the tip on NVIDIA’s containerized driver solution. I was considering using Kata Containers, but that would be a pivot away from NixOS, which is what I’m trying to avoid :confused:

OK, so I took the very un-Nix hammer approach and it worked! I have the current nixpkgs-unstable nvidia-container-toolkit derivation in my overlay and statically linked libcuda and libnvidia-ml into the Go binaries. This is with v1.12.1.

  ldflags = [ "-s" "-w" "-extldflags" "'-L${unpatched-nvidia-driver}/lib -lcuda -lnvidia-ml'" ];

Here unpatched-nvidia-driver is @eadwu’s builder swap, though I’m not sure that matters in this case, since these libs don’t link against any other NVIDIA libs, and the ones they do link against should be present in whatever container is running.
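
For context, a rough (untested) sketch of how that ldflags line could be wired in as a plain overrideAttrs in an overlay; in reality the whole nixpkgs-unstable derivation is vendored, and unpatched-nvidia-driver stands in for whatever driver build you use:

  nixpkgs.overlays = [
    (final: prev:
      let
        # Assumption: a driver build with the patchelf step disabled, per
        # eadwu's builder swap; substitute your own driver derivation.
        unpatched-nvidia-driver = final.linuxPackages.nvidia_x11.overrideAttrs (_: {
          builder = ./nvidia-builder.sh;
        });
      in {
        nvidia-container-toolkit = prev.nvidia-container-toolkit.overrideAttrs (old: {
          # Link libcuda/libnvidia-ml from the driver into the Go binaries.
          ldflags = (old.ldflags or [ ]) ++ [
            "-s"
            "-w"
            "-extldflags"
            "'-L${unpatched-nvidia-driver}/lib -lcuda -lnvidia-ml'"
          ];
        });
      })
  ];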

ldd /nix/store/30x7mhkxv6ghf8893d6lhd5jiplxh897-nvidia-x11-525.89.02-5.15.96/lib/libcuda.so.525.89.02
        linux-vdso.so.1 (0x00007ffd5fc4e000)
        libm.so.6 => /nix/store/xnk2z26fqy86xahiz3q797dzqx96sidk-glibc-2.37-8/lib/libm.so.6 (0x00007f3405733000)
        libc.so.6 => /nix/store/xnk2z26fqy86xahiz3q797dzqx96sidk-glibc-2.37-8/lib/libc.so.6 (0x00007f340554d000)
        libdl.so.2 => /nix/store/xnk2z26fqy86xahiz3q797dzqx96sidk-glibc-2.37-8/lib/libdl.so.2 (0x00007f3405548000)
        libpthread.so.0 => /nix/store/xnk2z26fqy86xahiz3q797dzqx96sidk-glibc-2.37-8/lib/libpthread.so.0 (0x00007f3405543000)
        librt.so.1 => /nix/store/xnk2z26fqy86xahiz3q797dzqx96sidk-glibc-2.37-8/lib/librt.so.1 (0x00007f340553e000)
        /nix/store/xnk2z26fqy86xahiz3q797dzqx96sidk-glibc-2.37-8/lib64/ld-linux-x86-64.so.2 (0x00007f34074f5000)

ldd /nix/store/30x7mhkxv6ghf8893d6lhd5jiplxh897-nvidia-x11-525.89.02-5.15.96/lib/libnvidia-ml.so.1
        linux-vdso.so.1 (0x00007ffdde5fd000)
        libpthread.so.0 => /nix/store/xnk2z26fqy86xahiz3q797dzqx96sidk-glibc-2.37-8/lib/libpthread.so.0 (0x00007faa7a5b8000)
        libm.so.6 => /nix/store/xnk2z26fqy86xahiz3q797dzqx96sidk-glibc-2.37-8/lib/libm.so.6 (0x00007faa79520000)
        libdl.so.2 => /nix/store/xnk2z26fqy86xahiz3q797dzqx96sidk-glibc-2.37-8/lib/libdl.so.2 (0x00007faa7a5b3000)
        libc.so.6 => /nix/store/xnk2z26fqy86xahiz3q797dzqx96sidk-glibc-2.37-8/lib/libc.so.6 (0x00007faa7933a000)
        /nix/store/xnk2z26fqy86xahiz3q797dzqx96sidk-glibc-2.37-8/lib64/ld-linux-x86-64.so.2 (0x00007faa7a5bf000)

Furthermore, I had to twiddle with /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl quite a bit and landed on this:

[plugins.opt]
  path = "{{ .NodeConfig.Containerd.Opt }}"

[plugins.cri]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"

  # ---- added for gpu
  enable_selinux = {{ .NodeConfig.SELinux }}
  enable_unprivileged_ports = true
  enable_unprivileged_icmp = true
  # end added for gpu

{{- if .IsRunningInUserNS }}
  disable_cgroup = true
  disable_apparmor = true
  restrict_oom_score_adj = true
{{end}}

{{- if .NodeConfig.AgentConfig.PauseImage }}
  sandbox_image = "{{ .NodeConfig.AgentConfig.PauseImage }}"
{{end}}

{{- if not .NodeConfig.NoFlannel }}
[plugins.cri.cni]
  bin_dir = "{{ .NodeConfig.AgentConfig.CNIBinDir }}"
  conf_dir = "{{ .NodeConfig.AgentConfig.CNIConfDir }}"
{{end}}

[plugins.cri.containerd]
  default_runtime_name = "runc"

  # ---- added for GPU support
  # https://github.com/k3s-io/k3s/issues/4391#issuecomment-1202986597
  snapshotter = "overlayfs"
  disable_snapshot_annotations = true

[plugins.cri.containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

# ---- added for GPU support
[plugins.cri.containerd.runtimes.nvidia]
  runtime_type = "io.containerd.runc.v2"
  runtime_root = ""
  runtime_engine = ""
  privileged_without_host_devices = false

[plugins.cri.containerd.runtimes.nvidia.options]
  BinaryName = "@nvidia-container-runtime@"
  SystemdCgroup = true

{{ if .PrivateRegistryConfig }}
{{ if .PrivateRegistryConfig.Mirrors }}
[plugins.cri.registry.mirrors]{{end}}
{{range $k, $v := .PrivateRegistryConfig.Mirrors }}
[plugins.cri.registry.mirrors."{{$k}}"]
  endpoint = [{{range $i, $j := $v.Endpoints}}{{if $i}}, {{end}}{{printf "%q" .}}{{end}}]
{{end}}

{{range $k, $v := .PrivateRegistryConfig.Configs }}
{{ if $v.Auth }}
[plugins.cri.registry.configs."{{$k}}".auth]
  {{ if $v.Auth.Username }}username = "{{ $v.Auth.Username }}"{{end}}
  {{ if $v.Auth.Password }}password = "{{ $v.Auth.Password }}"{{end}}
  {{ if $v.Auth.Auth }}auth = "{{ $v.Auth.Auth }}"{{end}}
  {{ if $v.Auth.IdentityToken }}identitytoken = "{{ $v.Auth.IdentityToken }}"{{end}}
{{end}}
{{ if $v.TLS }}
[plugins.cri.registry.configs."{{$k}}".tls]
  {{ if $v.TLS.CAFile }}ca_file = "{{ $v.TLS.CAFile }}"{{end}}
  {{ if $v.TLS.CertFile }}cert_file = "{{ $v.TLS.CertFile }}"{{end}}
  {{ if $v.TLS.KeyFile }}key_file = "{{ $v.TLS.KeyFile }}"{{end}}
{{end}}
{{end}}
{{end}}

Here "@nvidia-container-runtime@" is substituted at rebuild time with the full /nix/store path.
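
One way such a substitution can be done (an illustration only, not necessarily the exact mechanism used here; nvidia-k3s stands in for whatever package provides the wrapped runtime) is to render the template at build time and copy it into place before k3s starts:

  # Illustration: render config.toml.tmpl with the runtime's store path and
  # drop it where k3s looks for a containerd config template.
  systemd.services.k3s.preStart = lib.mkAfter ''
    mkdir -p /var/lib/rancher/k3s/agent/etc/containerd
    cp ${pkgs.runCommand "containerd-config-tmpl" { } ''
      substitute ${./config.toml.tmpl} "$out" \
        --subst-var-by nvidia-container-runtime \
        "${pkgs.nvidia-k3s}/bin/nvidia-container-runtime"
    ''} /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
  '';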

Now I’m at a stage where nvidia-device-plugin runs stably and shows GPUs on the node, but only libcuda.so.525.89.02 is being mounted into pod containers with runtimeClassName: nvidia, instead of libcuda.so.1, which is what nvidia-smi and friends are looking for. I’m looking to patch that with some kind of admission webhook policy, since one run of ldconfig in the container adds the libcuda.so.1 symlink and gets everything working, as opposed to reversing the static library linking and crossing my fingers that nvidia-container-runtime mounts things appropriately.

Thanks for the help!


My final solution to the ldconfig issue inside the container was another patch to libnvidia-container/src/nvc_ldcache.c, essentially running ldconfig again, but with the /etc/ld.so.conf that is mounted into the containers, which contains the paths expected by nvidia-container-runtime.

$ kubectl exec -n kube-system -it nvidia-device-plugin-pz56s -- /bin/sh
$ cat /etc/ld.so.conf
include /etc/ld.so.conf.d/*.conf
$ cat /etc/ld.so.conf.d/*.conf
# libc default configuration
/usr/local/lib
/usr/local/nvidia/lib
/usr/local/nvidia/lib64
# Multiarch support
/usr/local/lib/x86_64-linux-gnu
/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu

This patch works on top of @eadwu’s patch.
libnvc-ldcache-container-again.patch

diff --git a/src/nvc_ldcache.c b/src/nvc_ldcache.c
index db3b2f69..28e08d3b 100644
--- a/src/nvc_ldcache.c
+++ b/src/nvc_ldcache.c
@@ -356,6 +356,7 @@ int
 nvc_ldcache_update(struct nvc_context *ctx, const struct nvc_container *cnt)
 {
         char **argv;
+        char **argv_container;
         pid_t child;
         int status;
         bool drop_groups = true;
@@ -402,11 +403,18 @@ nvc_ldcache_update(struct nvc_context *ctx, const struct nvc_container *cnt)
                 if (limit_syscalls(&ctx->err) < 0)
                         goto fail;
 
+                argv_container = (char * []){argv[0], "-f", "/etc/ld.so.conf", "-C", "/etc/ld.so.cache", cnt->cfg.libs_dir, cnt->cfg.libs32_dir, NULL};
                 if (fd < 0)
                         execve(argv[0], argv, (char * const []){NULL});
                 else
                         fexecve(fd, argv, (char * const []){NULL});
                 error_set(&ctx->err, "process execution failed");
+                log_infof("executing %s again", argv_container[0]);
+                if (fd < 0)
+                        execve(argv_container[0], argv_container, (char * const []){NULL});
+                else
+                        fexecve(fd, argv_container, (char * const []){NULL});
+                error_set(&ctx->err, "process execution failed");
          fail:
                 log_errf("could not start %s: %s", argv[0], ctx->err.msg);
                 (ctx->err.code == ENOENT) ? _exit(EXIT_SUCCESS) : _exit(EXIT_FAILURE);

libnvidia-container.nix

  patches = [
    # eadwu's patch
    ./libnvidia-container-ldcache.patch
    # patch from above
    ./libnvc-ldcache-container-again.patch
    # patch from nixpkgs
    ./inline-nvcgo-struct.patch
  ];

Edit: the admission webhook solution did not work, because:

  • initContainers can only operate on files in a shared volumeMount, which could work in this case but complicates matters significantly, since you’d have to mount all of the library volumes and the /etc subpaths containing ld.so.{cache,conf}
  • lifecycle: postStart: exec: command: ["ldconfig"] just invites a race condition that fails often

I was wondering if anyone has a working example of this? I’m currently trying to integrate my NVIDIA GPU with k3s + containerd and I’m hitting a wall; I was hoping to find some guidance.

My problem (like an idiot) was linuxPackages_hardened.

Can someone post a working config? I am struggling to make it work.

I’ve received various requests over the past months for a working config, so I’ve decided to make the repo I have running this config public.

Please note the nixpkgs flake lock versions, as something may have changed and I have not needed to update the base config in some time.

Good luck!


Thanks! I’m not familiar with k3s, or runc, or nvidia-container-runtime… may I ask again, why do you have to link nvidia_x11 directly? Is it not an option to just take the userspace drivers from /run/opengl-driver/lib?