Unfortunately no, I have now tried with versions 1.9.0, 1.13.1, and 1.12.1.
`nvidia-smi` works with `k3s ctr run --gpus 0`, but the `nvidia-container-runtime` binary in v1.12.1 and v1.13.1 now fails to load `libcuda`, which results in an undefined symbol error (`cuDriverGetVersion`):
```
k3s ctr run --rm -t --gpus 0 --runc-binary=nvidia-container-runtime docker.io/nvidia/cuda:11.4.0-base-ubuntu20.04 cuda
ctr: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/cuda23/log.json: no such file or directory): nvidia-container-runtime did not terminate successfully: exit status 127: /nix/store/qm28zv7kyl60pxhf8xyp33c1m5dr6jzz-nvidia-k3s/bin/nvidia-container-runtime: symbol lookup error: /nix/store/qm28zv7kyl60pxhf8xyp33c1m5dr6jzz-nvidia-k3s/bin/nvidia-container-runtime: undefined symbol: cuDriverGetVersion: unknown
```
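To check outside the runtime whether the dynamic loader can resolve `cuDriverGetVersion` at all, a minimal Python sketch like the one below can `dlopen` `libcuda.so.1` through `ctypes` and call the symbol. It assumes the default loader search path of the current environment (so it is not the exact lookup `nvidia-container-runtime` performs), and the helper names `probe_symbol` and `supported_cuda_version` are hypothetical.

```python
import ctypes
import ctypes.util


def probe_symbol(library, symbol):
    """Return (loaded, has_symbol) for `library` using the default loader search path.

    Hypothetical helper: mirrors dlopen() followed by dlsym() in the current
    environment, not the exact lookup nvidia-container-runtime performs.
    """
    try:
        handle = ctypes.CDLL(library)  # dlopen(); OSError if it cannot be loaded
    except OSError:
        return (False, False)
    try:
        getattr(handle, symbol)  # dlsym(); AttributeError if the symbol is undefined
    except AttributeError:
        return (True, False)
    return (True, True)


def supported_cuda_version(library="libcuda.so.1"):
    """Call cuDriverGetVersion() and return the CUDA (major, minor) the driver supports, or None."""
    try:
        fn = ctypes.CDLL(library).cuDriverGetVersion
    except (OSError, AttributeError):
        return None
    fn.argtypes = [ctypes.POINTER(ctypes.c_int)]
    fn.restype = ctypes.c_int  # CUresult; 0 is CUDA_SUCCESS
    version = ctypes.c_int(0)
    if fn(ctypes.byref(version)) != 0:
        return None
    # The driver API encodes the version as 1000 * major + 10 * minor.
    return (version.value // 1000, (version.value % 1000) // 10)


if __name__ == "__main__":
    # Sanity check that the probe itself works, using libc and printf.
    print("libc printf:", probe_symbol(ctypes.util.find_library("c"), "printf"))
    print("libcuda probe:", probe_symbol("libcuda.so.1", "cuDriverGetVersion"))
    print("supported CUDA version:", supported_cuda_version())
```

If the probe reports `(True, True)` in the same environment where the runtime fails, that would suggest the problem is in how the runtime binary itself binds the symbol rather than in finding the library.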
This is very puzzling, since `libcuda` is in `/tmp/ld.so.cache`, and 1.9.0 did not have this issue:
```
ldconfig -C /tmp/ld.so.cache --print-cache | grep cuda
libcudadebugger.so.1 (libc6,x86-64) => /tmp/nvidia-libs/libcudadebugger.so.1
libcudadebugger.so (libc6,x86-64) => /tmp/nvidia-libs/libcudadebugger.so
libcuda.so.1 (libc6,x86-64) => /tmp/nvidia-libs/libcuda.so.1
libcuda.so (libc6,x86-64) => /tmp/nvidia-libs/libcuda.so
```
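Since `ldconfig` lists `libcuda` in `/tmp/ld.so.cache`, it can also help to look at the raw cache file, which is what a program reading the cache directly would see. Below is a minimal sketch of a reader for glibc's "new" cache format (magic `glibc-ld.so.cache`, version `1.1`). It assumes a little-endian file containing only the new format (the default written by recent glibc `ldconfig`); it is not the toolkit's own parser, and `read_ld_cache` is a hypothetical name.

```python
import struct

# Assumed layout of glibc's "new" ld.so.cache format (little-endian, new-format-only file).
MAGIC = b"glibc-ld.so.cache"
VERSION = b"1.1"
HEADER = struct.Struct("<17s3sII20x")  # magic, version, nlibs, len_strings, then 20 bytes skipped
ENTRY = struct.Struct("<iIIIQ")        # flags, key, value, osversion, hwcap: 24 bytes per entry


def _c_string(data, offset):
    """Decode the NUL-terminated string starting at `offset`."""
    end = data.index(b"\0", offset)
    return data[offset:end].decode()


def read_ld_cache(data):
    """Return [(soname, flags, path), ...] for every entry in a new-format cache image."""
    magic, version, nlibs, _len_strings = HEADER.unpack_from(data, 0)
    if magic != MAGIC or version != VERSION:
        raise ValueError("not a new-format-only ld.so.cache")
    entries = []
    for i in range(nlibs):
        flags, key, value, _osversion, _hwcap = ENTRY.unpack_from(
            data, HEADER.size + i * ENTRY.size
        )
        # key/value are byte offsets of the soname and path, counted from the header start.
        entries.append((_c_string(data, key), flags, _c_string(data, value)))
    return entries
```

Filtering `read_ld_cache(open("/tmp/ld.so.cache", "rb").read())` on names containing `libcuda` should list the same four entries as the `ldconfig` output, provided the cache really is in the new format.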
And `nvidia-container-cli` has no issues finding `libcuda`:
```
nvidia-container-cli -k -d log
cat log | grep libcuda
I0519 22:42:56.134093 3346551 nvc_info.c:174] selecting /nix/store/30x7mhkxv6ghf8893d6lhd5jiplxh897-nvidia-x11-525.89.02-5.15.96/lib/libcudadebugger.so.525.89.02
I0519 22:42:56.134210 3346551 nvc_info.c:174] selecting /nix/store/30x7mhkxv6ghf8893d6lhd5jiplxh897-nvidia-x11-525.89.02-5.15.96/lib/libcuda.so.525.89.02
W0519 22:42:56.134862 3346551 nvc_info.c:404] missing compat32 library libcuda.so
W0519 22:42:56.134869 3346551 nvc_info.c:404] missing compat32 library libcudadebugger.so
```
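The debug log lines share a glog-style prefix (severity letter, date, time, a numeric ID, then `file:line]`), so instead of grepping, a short sketch can pull out just the libraries `nvidia-container-cli` selected and the warnings it raised. The regular expression is inferred from the lines shown here, not from a documented format, and `summarize_nvc_log` is a hypothetical helper.

```python
import re

# Layout inferred from the sample lines: <L><mmdd> <hh:mm:ss.micros> <id> <file>:<line>] <message>
LOG_LINE = re.compile(r"^([IWEF])(\d{4}) (\S+)\s+(\d+) ([\w.]+):(\d+)\] (.*)$")


def summarize_nvc_log(lines):
    """Return (selected_paths, warnings) from nvidia-container-cli debug output."""
    selected, warnings = [], []
    for raw in lines:
        m = LOG_LINE.match(raw.strip())
        if not m:
            continue  # skip lines that do not follow the prefix layout
        severity, message = m.group(1), m.group(7)
        if severity == "I" and message.startswith("selecting "):
            selected.append(message[len("selecting "):])
        elif severity == "W":
            warnings.append(message)
    return selected, warnings
```

For example, `summarize_nvc_log(open("log"))` on the log file written above would return the selected driver paths separately from the `missing compat32 library` warnings.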
But `nvidia-container-runtime` never even attempts to load `libcuda`:
```
$ LD_DEBUG=libs nvidia-container-runtime 2>&1 | grep "find library"
3375337: find library=libdl.so.2 [0]; searching
3375337: find library=libc.so.6 [0]; searching
3375344: find library=libdl.so.2 [0]; searching
3375344: find library=libc.so.6 [0]; searching
3375337: find library=libdl.so.2 [0]; searching
3375337: find library=libpthread.so.0 [0]; searching
3375337: find library=libc.so.6 [0]; searching
```
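The same check can be scripted. This sketch collects every library name from the `find library=` lines and makes it easy to test whether any `libcuda` variant was requested. It assumes glibc's `LD_DEBUG=libs` line layout as shown above; `libraries_searched` and `ld_debug_trace` are hypothetical helpers.

```python
import os
import re
import subprocess

# glibc's LD_DEBUG=libs prints lines like "<pid>: find library=<name> [<n>]; searching"
FIND_LIBRARY = re.compile(r"find library=(\S+) \[\d+\]; searching")


def libraries_searched(ld_debug_output):
    """Return the set of library names the dynamic loader looked up."""
    return set(FIND_LIBRARY.findall(ld_debug_output))


def ld_debug_trace(argv):
    """Run argv with LD_DEBUG=libs and return the loader trace, which glibc writes to stderr."""
    env = dict(os.environ, LD_DEBUG="libs")
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=False)
    return result.stderr
```

For instance, `any(n.startswith("libcuda") for n in libraries_searched(ld_debug_trace(["nvidia-container-runtime"])))` would be `False` for the trace shown here.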
```
$ strace nvidia-container-runtime 2>&1 | rg 'openat\(.*, "/nix/store/(.*)",.*' -r '$1'
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/glibc-hwcaps/x86-64-v3/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/glibc-hwcaps/x86-64-v2/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/haswell/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/haswell/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/haswell/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/haswell/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/libc.so.6
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/gconv/gconv-modules.cache
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/gconv/gconv-modules
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/gconv/gconv-modules.d
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/gconv/gconv-modules.d/gconv-modules-extra.conf
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/glibc-hwcaps/x86-64-v3/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/glibc-hwcaps/x86-64-v2/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/tls/haswell/x86_64/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/tls/haswell/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/tls/x86_64/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/tls/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/haswell/x86_64/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/haswell/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/x86_64/libdl.so.2
4y4jdqg9s8sw4f56n7lqy59azi8lgp5z-container-toolkit-container-toolkit-1.12.1/lib/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/glibc-hwcaps/x86-64-v3/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/glibc-hwcaps/x86-64-v2/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/haswell/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/haswell/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/tls/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/haswell/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/haswell/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/x86_64/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/libdl.so.2
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/libpthread.so.0
76l4v99sk83ylfwkz8wmwrm4s8h73rhd-glibc-2.35-224/lib/libc.so.6
```
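The `rg` filter keeps only the paths, which hides whether each `openat` succeeded. A small sketch that parses raw `strace` lines and separates successful opens from failed probes (such as `ENOENT`) makes the loader's search order easier to read. It assumes strace's default `openat(dirfd, "path", flags) = result` formatting, optionally prefixed by `[pid N]` when `-f` is used; `classify_opens` is a hypothetical name.

```python
import re

# Assumed strace formatting: [pid N] openat(AT_FDCWD, "path", FLAGS[, mode]) = fd | -1 ERRNO (...)
OPENAT = re.compile(r'^(?:\[pid\s+\d+\] )?openat\([^,]+, "([^"]+)", [^)]*\) = (-?\d+)(?: (\w+))?')


def classify_opens(strace_lines):
    """Split openat() calls into (opened_paths, failed), where failed maps path -> errno name."""
    opened, failed = [], {}
    for line in strace_lines:
        m = OPENAT.match(line.strip())
        if not m:
            continue  # not an openat() line, or an unfinished/resumed call
        path, result, errname = m.group(1), int(m.group(2)), m.group(3)
        if result >= 0:
            opened.append(path)
        else:
            failed[path] = errname or "?"
    return opened, failed
```

Feeding it the lines from `strace -f -e trace=openat nvidia-container-runtime 2>&1` would show which of the runtime's search directories actually provided each library.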
This is with the following patch applied to `nvidia-container-toolkit`:
```nix
preBuild = ''
  substituteInPlace go/src/github.com/NVIDIA/nvidia-container-toolkit/internal/config/config.go \
    --replace '/usr/bin' '${placeholder "out"}/bin'
  sed -i -e "s@/etc/ld.so.cache@/tmp/ld.so.cache@" -e "s@/etc/ld.so.conf@/tmp/ld.so.conf@" \
    go/src/github.com/NVIDIA/nvidia-container-toolkit/internal/ldcache/ldcache.go \
    go/src/github.com/NVIDIA/nvidia-container-toolkit/cmd/nvidia-ctk/hook/update-ldcache/update-ldcache.go \
'';
```
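One way to confirm the `sed` substitution actually made it into the compiled binaries (rather than only the Go sources) is to search them for the old and new paths. This is a minimal sketch, similar in spirit to `strings | grep`; `cache_paths_in` is a hypothetical helper, and finding a string in a binary only shows that it is embedded, not which code path uses it.

```python
from pathlib import Path

NEEDLES = (b"/etc/ld.so.cache", b"/tmp/ld.so.cache", b"/etc/ld.so.conf", b"/tmp/ld.so.conf")


def cache_paths_in(binary, needles=NEEDLES):
    """Return the subset of `needles` occurring anywhere in the file's bytes.

    Plain substring match, so "/etc/ld.so.conf" also matches "/etc/ld.so.conf.d".
    """
    data = Path(binary).read_bytes()
    return {n.decode() for n in needles if n in data}
```

Running it on the `nvidia-container-runtime` binary from the error message, and on `nvidia-ctk`, would show whether either still embeds `/etc/ld.so.cache`.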
Thanks for the tip on NVIDIA’s containerized driver solution. I was considering Kata Containers, but that would be a pivot away from NixOS, which is what I’m trying to avoid.