Using CUDA-enabled packages on non-NixOS systems?

samuela · February 19, 2022, 10:28pm

How does one use NVIDIA/CUDA-enabled packages on non-NixOS systems? Because of the issues discussed in the thread "How to use NVIDIA V100/A100 GPUs?", I've had to give up on NixOS and have instead resorted to an Ubuntu system with NVIDIA drivers. However, packages like tensorflowWithCuda/jaxlibWithCuda look for cudatoolkit and driver libraries in /var/run/opengl-driver/lib instead of wherever they live on the host OS (on Ubuntu, the driver library appears to be /usr/lib/x86_64-linux-gnu/libcuda.so.1). I tried linking libcuda.so.1 and friends into /var/run/opengl-driver/lib, but that only got me new problems: "RuntimeError: UNKNOWN: PTX JIT compiler library not found" (google/jax issue #9644 on GitHub).
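The symlinking attempt described above amounts to roughly the following. This is a hedged sketch, not the exact commands from the post: link_driver_libs is a hypothetical helper name, and it assumes the Ubuntu driver libraries sit in /usr/lib/x86_64-linux-gnu and that you have permission (typically root) to write to /var/run/opengl-driver/lib.

```shell
# Hypothetical bash helper sketching the workaround described above:
# symlink the host's NVIDIA driver libraries (libcuda* and libnvidia-*)
# into the directory that nixpkgs' CUDA packages search.
# Usage (typically as root):
#   link_driver_libs /usr/lib/x86_64-linux-gnu /var/run/opengl-driver/lib
link_driver_libs() {
  local src=$1 dst=$2 lib
  mkdir -p "$dst" || return 1
  for lib in "$src"/libcuda.so* "$src"/libnvidia-*.so*; do
    # An unmatched glob is left as a literal pattern, so skip non-existent paths.
    [ -e "$lib" ] || continue
    # ${lib##*/} strips the directory part, i.e. the basename.
    ln -sf "$lib" "$dst/${lib##*/}"
  done
}
```

On Ubuntu, /var/run is normally a symlink to the tmpfs /run, so links created this way would not survive a reboot. As the post notes, this alone was not enough to get JAX working.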

So how does this work? How does one use CUDA-enabled software in nixpkgs on non-NixOS machines?

markuskowa · February 20, 2022, 11:07am

I was successful with nixGL on CentOS with A100 GPUs.

See also:

SergeK · February 20, 2022, 11:25am

Rather random questions, but:

Do you only symlink libraries that come from cudatoolkit, or do you also link things from the NVIDIA X11 driver (like libnvidia-ml.so or similar)?
Have you tried running with LD_DEBUG=libs and observing which libraries jax is trying to search for at runtime?
Is there ptxas in PATH?
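The checks suggested above can be scripted roughly as follows. This is a hedged sketch: check_ptxas and trace_lib are hypothetical helper names, LD_DEBUG=libs is a tracing option of glibc's dynamic loader, and myscript.py stands in for whatever program is failing.

```shell
# Hypothetical bash helpers for the checks suggested above.

# Print where ptxas (the PTX assembler shipped with cudatoolkit) resolves
# on PATH, or fail if it cannot be found.
check_ptxas() {
  local p
  if p=$(command -v ptxas); then
    printf 'ptxas: %s\n' "$p"
  else
    echo "ptxas: not on PATH" >&2
    return 1
  fi
}

# Run a command with the loader's library tracing enabled and keep only the
# lines that mention one library. The trace is written to stderr, so stderr
# is merged into stdout before filtering.
# Usage: trace_lib libcublas.so.11 python myscript.py
trace_lib() {
  local lib=$1
  shift
  LD_DEBUG=libs "$@" 2>&1 | grep -F -- "$lib"
}
```

Filtering on the library name keeps the output short; running without the grep shows every search the loader performs.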

samuela · February 20, 2022, 11:59pm

I haven’t had success with nixGL:

[nix-shell:~/dev/research/lottery]$ nixGLNvidia-510.47.03 python cifar10_convnet_run.py --test
tests took 1.25754 seconds
Could not load library libcudnn_ops_infer.so.8. Error: libcublas.so.11: cannot open shared object file: No such file or directory
Please make sure libcudnn_ops_infer.so.8 is in your library path!
/home/ubuntu/.nix-profile/bin/nixGLNvidia-510.47.03: line 6: 11823 Aborted                 (core dumped) "$@"

Am I doing something wrong?
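In the error above, libcudnn_ops_infer.so.8 itself is found, but it fails to load because its own dependency libcublas.so.11 is not. One way to list unresolved dependencies of a given library directly is to ask ldd. A minimal sketch, assuming glibc's ldd output format (unresolved entries are printed as `name => not found`); missing_deps is a hypothetical helper name:

```shell
# Hypothetical bash helper: print the shared-library dependencies of a file
# that ldd reports as "=> not found", one name per line.
# Usage: missing_deps /path/to/libcudnn_ops_infer.so.8
missing_deps() {
  ldd "$1" | awk '/=> not found/ { print $1 }'
}
```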

samuela · February 21, 2022, 12:14am

These are good questions!

I was symlinking in everything, including libnvidia-*.so. I believe everything was present, but of course anything’s possible since it wasn’t working.
I wasn’t aware of LD_DEBUG=libs; that’s very handy! It looks like it’s trying and failing to find libcublas.so.11 in a few places (full logs here). IIRC libcublas.so.11 is part of cudatoolkit, so I’m not sure why that’s failing… EDIT: yes, it lives at /nix/store/9lv0wxqkbqw2438wrhllcyf3sx644i5z-cudatoolkit-11.5.0/lib/libcublas.so.11
Yes, ptxas is in PATH.
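The log suggests the loader never looks in the cudatoolkit store directory. A quick way to check whether any directory on a colon-separated search path such as LD_LIBRARY_PATH actually contains a library is a helper like the one below. This is a sketch with a hypothetical name (find_on_path); it only inspects the list you pass in, not the RUNPATH embedded in binaries or the system loader cache, which ld.so also consults.

```shell
# Hypothetical bash helper: print every directory on a colon-separated path
# list that contains the given library; return non-zero if none does.
# Usage: find_on_path libcublas.so.11 "$LD_LIBRARY_PATH"
find_on_path() {
  local lib=$1 path=$2 dir found=1
  local IFS=:   # split the path list on ':' only
  for dir in $path; do
    # Skip empty entries (e.g. from "a::b").
    [ -n "$dir" ] || continue
    if [ -e "$dir/$lib" ]; then
      printf '%s\n' "$dir/$lib"
      found=0
    fi
  done
  return "$found"
}
```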

samuela · February 21, 2022, 12:33am

Related thread:

samuela · February 22, 2022, 12:01am

For future reference, my solution is to add the following to my shell.nix:

  shellHook = ''
    export LD_LIBRARY_PATH=${pkgs.cudatoolkit_11_5}/lib
  '';

and then run things with nixGL:

$ nixGLNvidia-510.47.03 python myscript.py
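The shellHook above replaces LD_LIBRARY_PATH outright. If the shell already sets that variable, a variant that prepends instead keeps the existing entries. A minimal sketch in plain shell, using a hypothetical helper name (prepend_ld_path); this is an alternative, not what the post used:

```shell
# Hypothetical bash helper: prepend a directory to LD_LIBRARY_PATH, keeping any
# existing entries. ${VAR:+...} expands only when VAR is set and non-empty, so
# no stray ':' is added when it is unset (an empty entry would make the loader
# also search the current directory).
prepend_ld_path() {
  LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export LD_LIBRARY_PATH
}
```

Written directly inside the shellHook's '' string, the shell's `${` has to be escaped as `''${` so that Nix does not try to interpolate it, e.g. `export LD_LIBRARY_PATH=${pkgs.cudatoolkit_11_5}/lib''${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}`.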

samuela · March 15, 2022, 1:02am

This should be fixed (no need to export LD_LIBRARY_PATH) now that NixOS/nixpkgs pull request #158218 by samuela ("cudnn: init cudnn_8_3 at 8.3.0") has landed.