Using nvidia-container-runtime with containerd on NixOS

I’m sorry, but I don’t think I understand the question. We’re patching the nvidia_x11 package (or rather, undoing its patchelf step) so that the drivers work in the FHS containers in k3s, which won’t have the /nix/store in them. The drivers need to be discovered twice via ldconfig: once by the nvidia container runtime on the host for mounting into the container, and once by the container itself. How would the userspace drivers inherit that property?

I can decompose the question into smaller ones:

  1. How do you decide, at build time, which nvidia_x11 to take? The container must mount userspace drivers (including libcuda.so and libnvidia-ml.so) that are compatible with the kernel module used by the host machine at runtime (i.e. precisely the same version, or at least newer).

  2. … which won’t have the /nix/store in them

    But the drivers are still mounted impurely from the host, so why not mount the /nix/store/xxxx...-nvidia_x11 etc. paths as well?

  3. The drivers need to be discovered twice via ldconfig

    This happens for some packages. If possible, the desirable way to approach this is to remove (or make optional) all of the references to ld.so.cache, and replace the “path inference” results with predictable values. For a container running on a NixOS host, this path would be /run/opengl-driver/lib, plus the targets of the symlinks therein. The drivers NixOS deploys in that location are known to be compatible with the kernel used at runtime.
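
    As a rough sketch of what “predictable values” could mean in practice (an illustration only, not something libnvidia-container does today): given a driver directory, print that directory plus the real directories its entries resolve to, which is the set of paths a runtime would have to mount. The function name driver_mount_paths is made up for this example, and readlink -f assumes GNU coreutils or busybox.

```shell
#!/bin/sh
# Hypothetical helper: list a driver directory together with the distinct
# real directories that its entries (usually symlinks) resolve to.
driver_mount_paths() {
    dir=$1
    {
        printf '%s\n' "$dir"
        for entry in "$dir"/*; do
            # Skip the unexpanded glob and dangling symlinks.
            [ -e "$entry" ] || continue
            # Follow the symlink chain to the real file, keep its directory.
            dirname "$(readlink -f "$entry")"
        done
    } | LC_ALL=C sort -u
}

# On a NixOS host one would call:
#   driver_mount_paths /run/opengl-driver/lib
```

    Every line of output is a candidate bind mount; whether mounting them at the same path inside the container is enough is a separate question.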

I’m asking because I think we should want to integrate the solution into nixpkgs. Thanks!

Ok! We’re bumping up against the edge of my knowledge here, but if I understand correctly:

  1. How do you decide, at build time, which nvidia_x11 to take?

    I assume you’re talking about this line here:

    unpatched-nvidia-driver = (super.pkgs.linuxKernel.packages.linux_5_15.nvidia_x11_production.overrideAttrs (oldAttrs: {
      builder = ../overlays/nvidia-builder.sh;
    }));
    

    And indeed I chose that through trial and error. Which package is the standard one to install that will be guaranteed to be compatible with the kernel?

  2. But the drivers are still mounted impurely from the host

    This is done by the nvidia-container-runtime. Where it chooses to mount the drivers in the container, regardless of how it finds them, is a bit deeper in the codebase than I looked. The discovery mechanism is via ldconfig, but currently we are pointing it at the drivers in /nix/store/xxx...-nvidia-x11 on the host here. Are you asking why /nix/store/xxx-nvidia-x11 is not mounted directly (at the same path) in the container?

  3. the desirable way to approach this is to remove/make optional all of the references to ld.so.cache , and replace the “path inference” results with the predictable values

    I believe this would require a different patch of libnvidia-container so that, while on the host, it searches /run/opengl-driver/lib instead of relying on ldconfig. I’m not sure how deep that would go, but it may be possible. The search for ld.so.cache is hardcoded into the library here, a path altered by this patch and in common.h by this patch. I think this would fundamentally change libnvidia-container’s discovery mechanism. The difficulty is that this code seems to run once on the host and once in the container, so getting it to use different discovery mechanisms depending on the context would take a considerably deeper understanding of the library, especially since containers will not adhere to the NixOS paths. Alternatively, getting libnvidia-container to mount the NixOS paths and then update the container’s own ld.so.cache to reference those paths is a different challenge. For example, I’m not sure whether the following search paths, provided by ld.so.conf.d in the nvidia-device-plugin container, are baked into the container or altered by libnvidia-container at runtime.

    $ kubectl exec -n kube-system -it nvidia-device-plugin-pz56s -- /bin/sh
    $ cat /etc/ld.so.conf
    include /etc/ld.so.conf.d/*.conf
    $ cat /etc/ld.so.conf.d/*.conf
    # libc default configuration
    /usr/local/lib
    /usr/local/nvidia/lib
    /usr/local/nvidia/lib64
    # Multiarch support
    /usr/local/lib/x86_64-linux-gnu
    /lib/x86_64-linux-gnu
    /usr/lib/x86_64-linux-gnu
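
    One way to narrow this down, sketched here under the assumption that the container has a POSIX shell, is to check which of the listed search paths actually exist in the running container. The function name check_ld_search_paths is made up for this example. It only handles the simple one-directory-per-line form shown above and skips comments and include lines.

```shell
#!/bin/sh
# Hypothetical helper: print every directory listed in the given
# ld.so.conf-style files, tagged with whether it exists right now.
check_ld_search_paths() {
    for conf in "$@"; do
        [ -f "$conf" ] || continue
        while IFS= read -r line || [ -n "$line" ]; do
            case $line in
                ''|'#'*|'include '*) continue ;;  # blank, comment, include
            esac
            if [ -d "$line" ]; then
                printf 'present %s\n' "$line"
            else
                printf 'missing %s\n' "$line"
            fi
        done < "$conf"
    done
}

# Inside the container one would call:
#   check_ld_search_paths /etc/ld.so.conf.d/*.conf
```

    A path that is listed but missing (say /usr/local/nvidia/lib) would suggest that the entry came with the image rather than with a mount. This does not settle the question by itself; comparing against ldconfig -p in the same container shows what the cache actually resolves.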
    

Would love to see a pure integration in nixpkgs! My feeling is that some deeper C skills than I possess would be necessary here.

rusty-jules has pretty much said everything but I’ll add what I can remember. I’ve since abandoned my NVIDIA integration in k3s since there were a host of other issues.

libnvidia-container has a list of files it looks for https://github.com/NVIDIA/libnvidia-container/blob/1eb5a30a6ad0415550a9df632ac8832bf7e2bbba/src/nvc_info.c#L51

Which then finds the library paths using the ldcache https://github.com/NVIDIA/libnvidia-container/blob/1eb5a30a6ad0415550a9df632ac8832bf7e2bbba/src/nvc_info.c#L220

Ideally, we’d be using the NVIDIA/CUDA driver pod from NVIDIA, but my attempts have failed due to the layout of NixOS (I don’t quite remember the errors, but I’d imagine the classic problems with trying to use imperative programs).

2/3.
If you were to try to make it NixOS-friendly, a better alternative is to write it from scratch. By comparison, making do with the ldcache approach is a lot more prone to needing active maintenance as upstream changes.

fwiw, using the CDI integration with containerd seems much better. I have a working setup, though I’m still using the nvidia ldcache shenanigans to generate the CDI spec via nvidia-ctk cdi generate; this is patchable and/or writable from scratch. It seems to work more or less out of the box. I had to manually patch the output (with jq) to mount /run/opengl-driver into the sandbox to get a well-known LD_LIBRARY_PATH to work with, but the autogenerated spec (essentially nvidia-ctk cdi generate used on an ldconfig dump of /run/opengl-driver) did at least manage to resolve every symlink into the store, along with the relevant devices, on its own.
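
For anyone wanting to reproduce the jq step, here is a minimal sketch of what such a patch could look like. It is not the exact filter from my setup. It assumes the spec is in JSON form and follows the CDI layout, with a top-level containerEdits object holding mounts and env lists. The function name, the mount options, and the choice to set LD_LIBRARY_PATH in the spec itself are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read a CDI spec (JSON) on stdin, append a read-only
# bind mount of /run/opengl-driver and a matching LD_LIBRARY_PATH entry,
# and write the patched spec to stdout. Requires jq.
add_opengl_driver_mount() {
    jq '
      .containerEdits.mounts += [{
        hostPath: "/run/opengl-driver",
        containerPath: "/run/opengl-driver",
        options: ["ro", "nosuid", "nodev", "bind"]
      }]
      | .containerEdits.env += ["LD_LIBRARY_PATH=/run/opengl-driver/lib"]
    '
}

# Example (file names are placeholders):
#   add_opengl_driver_mount < nvidia.json > nvidia-patched.json
```

The filter only appends, so the mounts and device nodes that the generator already resolved into the store are left untouched.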

We definitely need to reach out to upstream and discuss a way for them to (1) stop relying on obscure glibc internals such as /etc/ld.so.cache, and (2) stop assuming that mounting the host system’s drivers is the right thing to do (generally speaking, it’s not, because the container image might come with a different libc).

The first step would be to introduce a static configuration option allowing the user to explicitly list the paths to the drivers, at build time or in a config file read at runtime. In principle, /etc/ld.so.cache already is that, except that writing one affects more than just the ctk.

I’m planning to open an issue/PR eventually but it’ll probably take me ages, and I’ll be really happy if somebody goes ahead before then

I’ll also add that this doesn’t only affect k3s, but also, for example, apptainer/singularity, which are important for HPC.