Python shared library (DLL) import

Hello,

On a non-NixOS Linux system, the Python installed via Nix cannot load a shared library (DLL) that the system Python loads without problems.

More precisely,

$ find /usr -name libnvidia-ml.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
/usr/lib/i386-linux-gnu/libnvidia-ml.so.1

$ /usr/bin/python -c "from ctypes import CDLL; CDLL('libnvidia-ml.so.1')"

$ /home/galepage/.nix-profile/bin/python -c "from ctypes import CDLL; CDLL('libnvidia-ml.so.1')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/nix/store/9srs642k875z3qdk8glapjycncf2pa51-python3-3.10.7/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvidia-ml.so.1: cannot open shared object file: No such file or directory

It seems that the two Pythons are not searching the same locations on the system…
Do you know why this happens?
The CDLL constructor relies on the system’s dlopen function, so I don’t understand why they behave differently.
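
One way to check whether the two interpreters really run with different loaders is to look at which libc and dynamic loader each process has mapped into memory. The sketch below uses a hypothetical helper, loaded_objects (not part of any library), that parses /proc/self/maps, so it is Linux-only. With the Nix Python you would expect the listed paths to sit under /nix/store rather than in the host’s library directories.

```python
import sys
from pathlib import Path


def loaded_objects(maps_text: str, needles: tuple[str, ...] = ("ld-linux", "libc.so")) -> list[str]:
    """Return the distinct file paths in a /proc/<pid>/maps dump whose
    file name contains one of `needles`, in order of first appearance."""
    found: list[str] = []
    for line in maps_text.splitlines():
        fields = line.split()
        # The sixth field is the backing file; anonymous mappings lack it.
        if len(fields) < 6:
            continue
        path = fields[5]
        name = path.rsplit("/", 1)[-1]
        if any(needle in name for needle in needles) and path not in found:
            found.append(path)
    return found


if __name__ == "__main__":
    maps_file = Path("/proc/self/maps")  # Linux-only
    print(sys.executable)
    if maps_file.exists():
        for path in loaded_objects(maps_file.read_text()):
            print("   ", path)
    else:
        print("    /proc/self/maps not available on this system")
```

Saved as a script, it can be run once with /usr/bin/python and once with ~/.nix-profile/bin/python to compare the two outputs side by side.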

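This makes sense: to increase reproducibility, Nix does not rely on /lib; everything must be in /nix/store/hash…. The Nix Python is also linked against Nix’s own glibc, so the dlopen it calls is not the system’s one. As explained here, CDLL uses dlopen to get the library, which itself searches various places for it, including the RPATH/RUNPATH of the calling binary and the directories listed in the LD_LIBRARY_PATH variable (not really recommended if it can be avoided); LD_PRELOAD can additionally force a given library to be loaded…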
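
To make the search order concrete, here is a simplified Python model of how glibc’s dynamic loader resolves a bare name such as libnvidia-ml.so.1, following the order documented in ld.so(8). It is only a sketch: resolve_soname is a hypothetical helper, and it ignores /etc/ld.so.cache, $ORIGIN expansion, hwcaps subdirectories and secure-execution mode. The point relevant to this thread is the final step: the cache and the default directories belong to whichever glibc the process runs with, and as far as I understand, Nixpkgs’ glibc does not consult the host’s /etc/ld.so.cache, which is one of the ways the host’s own loader finds /usr/lib/x86_64-linux-gnu.

```python
import os
from typing import Callable, Optional, Sequence


def resolve_soname(
    name: str,
    rpath: Sequence[str] = (),
    runpath: Sequence[str] = (),
    ld_library_path: str = "",
    default_dirs: Sequence[str] = ("/lib", "/usr/lib"),
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Simplified model of the glibc loader's lookup for `name`.

    Order (ld.so(8)): DT_RPATH (only if there is no DT_RUNPATH),
    LD_LIBRARY_PATH, DT_RUNPATH, then the default directories.
    The ld.so.cache step is deliberately left out.
    """
    if "/" in name:
        # A name containing a slash is used as a path directly, without searching.
        return name if exists(name) else None
    search: list[str] = []
    if not runpath:
        search.extend(rpath)
    # Empty entries (which the real loader treats as the current directory) are skipped.
    search.extend(d for d in ld_library_path.split(":") if d)
    search.extend(runpath)
    search.extend(default_dirs)
    for directory in search:
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return None
```

The exists parameter is only there so the logic can be exercised against a fake file system. On the machine from the question, default directories that include /usr/lib/x86_64-linux-gnu would find the driver library, whereas a loader whose only default directory is inside a Nix store glibc would return None, mirroring the two results above.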

I don’t have time to try, but my guess is that you should install the library inside python3.withPackages (ps: with ps; [ … ]). If your library is not packaged already, something like this may work:

{ pkgs ? import <nixpkgs> { } }:
let
  myLib = pkgs.stdenv.mkDerivation {
    pname = "my-lib"; # hypothetical name
    version = "0.1"; # hypothetical version
    src = ./.; # folder containing the .so files
    nativeBuildInputs = [
      pkgs.autoPatchelfHook
    ];
    installPhase = ''
      mkdir -p $out/lib
      cp libnvidia-ml.so.1 $out/lib
    '';
  };
in pkgs.python3.withPackages (ps: with ps; [ myLib ])
Thank you for your reply.

Ok, this makes sense.

To give a bit more context, I am trying to package nvitop, which loads the shared library libnvidia-ml.so.1 through the nvidia-ml-py Python library.
The catch is that this library is provided directly by the NVIDIA driver.
Hence, on a non-NixOS system, I guess that nvidia-ml-py should be able to load the system library, where it is already available.

In this setting, from what you say, changing $LD_LIBRARY_PATH seems to be the only way…
What do you think?

I can confirm that the following works:

$ LD_LIBRARY_PATH=/lib/x86_64-linux-gnu /home/galepage/.nix-profile/bin/python -c "from ctypes import CDLL; CDLL('libnvidia-ml.so.1')"

Argh… this may work as a quick and really dirty solution, but it can break in a thousand ways (for instance, Python will surely also pick up other libraries from this folder)… And it will not be portable to other OSes that put the library somewhere else.

For a clean solution: I’m not an NVIDIA expert (and I don’t even have an NVIDIA card to test), but I would first double-check whether this library can be packaged on its own, outside of a per-OS driver (Debian does have a .deb for it, so it might even work to just extract the .deb and copy its lib folder into your derivation). If not, you could take inspiration from cudatoolkit, which does something similar for CUDA, although I’m not sure how it works.

Ok, you are right.

I looked at the nccl package, which also relies on libnvidia-ml.so.1.

This led me to discover addOpenGLRunPath.

Do you think it could be a good solution to this problem?
Is it also suited to non-NixOS Linux platforms?

Thank you once again for your help!

The addOpenGLRunPath hook patches ELF binaries (adding /run/opengl-driver/lib to their RUNPATH) so that the dynamic loader looks libraries up in /run/opengl-driver/lib, unless overridden by LD_LIBRARY_PATH. You have hacked more or less similar behaviour into nvidia-ml-py by looking up libnvidia-ml.so at that exact (absolute) location. When the user sets LD_LIBRARY_PATH before running python, the first CDLL call (CDLL("libnvidia-ml.so.1"), with just the base name) succeeds and the NixOS-specific one is never executed.
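
The lookup order described here (bare name first, so that LD_LIBRARY_PATH and the normal search still win, then an absolute path under /run/opengl-driver/lib) can be sketched in Python as follows. This is an illustration using a hypothetical helper, load_with_fallback; it is not the actual patch applied to nvidia-ml-py in nixpkgs. The loader parameter exists only so the logic can be exercised without the real library.

```python
import ctypes
import os
from typing import Callable, Iterable


def load_with_fallback(
    soname: str,
    fallback_dirs: Iterable[str] = ("/run/opengl-driver/lib",),
    loader: Callable[[str], object] = ctypes.CDLL,
) -> object:
    """Load `soname` by bare name, then try absolute paths in `fallback_dirs`."""
    errors: list[str] = []
    try:
        # Bare name: the normal search order applies, so LD_LIBRARY_PATH
        # (and RPATH/RUNPATH) take precedence over the fallbacks below.
        return loader(soname)
    except OSError as exc:
        errors.append(str(exc))
    for directory in fallback_dirs:
        # Absolute path: dlopen opens this exact file, no search involved.
        path = os.path.join(directory, soname)
        try:
            return loader(path)
        except OSError as exc:
            errors.append(str(exc))
    raise OSError(f"could not load {soname}: " + "; ".join(errors))
```

Calling load_with_fallback("libnvidia-ml.so.1") would return a ctypes.CDLL handle from whichever attempt succeeds first, and raise OSError listing every failure otherwise.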

Setting LD_LIBRARY_PATH=/lib (or similar) is “dangerous” in that python will then look for all of its shared libraries in /lib first (uncontrolled, random revisions) rather than using the exact versions from /nix/store. You might want to point LD_LIBRARY_PATH to a separate directory containing symlinks to individual libraries only (like libnvidia-ml.so.1 or libcuda.so).
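
A minimal sketch of that suggestion in Python, assuming you launch the Nix-built program yourself: make_lib_shim and env_with_shim are hypothetical helpers that create a directory holding symlinks to only the libraries you name, and build a copy of the environment whose LD_LIBRARY_PATH points at that directory.

```python
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional


def make_lib_shim(shim_dir: str, libraries: Iterable[str]) -> Path:
    """Create `shim_dir` containing one symlink per selected library file."""
    shim = Path(shim_dir)
    shim.mkdir(parents=True, exist_ok=True)
    for lib in libraries:
        # Keep the soname symlink chain intact: link to the path as given.
        target = Path(lib).absolute()
        link = shim / target.name
        if link.is_symlink() or link.exists():
            link.unlink()  # refresh a stale link from a previous run
        link.symlink_to(target)
    return shim


def env_with_shim(shim: Path, base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy of the environment with only the shim prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = str(shim) + (":" + existing if existing else "")
    return env
```

The result would then be passed to subprocess.run(..., env=env_with_shim(shim)), with the shim built from, for example, /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1. Only the symlinked files become visible to the loader, so the rest of /lib stays out of the search path; the libc-compatibility caveat below still applies.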

Also note that python can crash if the libraries in /lib were built against a different version of libc than the one used in your revision of nixpkgs. In that case you probably can’t really avoid using GitHub - guibou/nixGL: A wrapper tool for nix OpenGL application

Yes, this is unsupported, and if it worked like this it would be considered impure and be patched out or removed again.

@SergeK So how do packages like Blender (which needs CUDA) handle this on non-NixOS systems? As far as I know, they don’t ask you to create a separate folder, symlink libraries from /lib into it, and override LD_LIBRARY_PATH.

I thought that cudatoolkit somehow provided a “fake” library that would check whether /run/opengl-driver/lib exists and otherwise look for the library in /lib… but I guess my understanding is wrong.

Indeed, I would need a little bit of help to properly handle this at the nvidia-ml-py package level.

Maybe what they do with nccl can help.

I suspect that people use nixGL? :thinking:
