How does one use NVIDIA/CUDA-enabled packages on non-NixOS systems?
Due to the problem described in "How to use NVIDIA V100/A100 GPUs?", I’ve had to give up on NixOS and have instead resorted to an Ubuntu system with NVIDIA drivers. However, packages like tensorflowWithCuda/jaxlibWithCuda look for cudatoolkit and driver libraries in /var/run/opengl-driver/lib instead of wherever they live on the non-NixOS OS (on Ubuntu the driver library appears to be at /usr/lib/x86_64-linux-gnu/libcuda.so.1). I tried linking libcuda.so.1 and friends into /var/run/opengl-driver/lib, but that only gets me new problems: see "RuntimeError: UNKNOWN: PTX JIT compiler library not found" (Issue #9644, google/jax on GitHub).
So how does this work? How does one use CUDA-enabled software in nixpkgs on non-NixOS machines?
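For anyone trying to reproduce this, here is a minimal sketch for checking which of the two locations mentioned above actually contains a given driver library, and what each entry resolves to. The helper name `find_library` is hypothetical, and `libnvidia-ptxjitcompiler.so.1` is my assumption about which library the PTX JIT error refers to; only `libcuda.so.1` and the two directories come from the post.

```python
from pathlib import Path

# Directories named in the post; adjust for your own system.
CANDIDATE_DIRS = [
    "/var/run/opengl-driver/lib",  # where the nixpkgs CUDA packages look
    "/usr/lib/x86_64-linux-gnu",   # where Ubuntu installs the driver libraries
]


def find_library(name, dirs=CANDIDATE_DIRS):
    """Return (path, resolved_target) for every directory that contains `name`.

    Path.exists() follows symlinks, so a dangling symlink is reported
    as missing rather than as a hit.
    """
    hits = []
    for d in dirs:
        candidate = Path(d) / name
        if candidate.exists():
            hits.append((str(candidate), str(candidate.resolve())))
    return hits


if __name__ == "__main__":
    # The second name is an assumption, not something confirmed in the post.
    for lib in ("libcuda.so.1", "libnvidia-ptxjitcompiler.so.1"):
        hits = find_library(lib)
        if not hits:
            print(f"{lib}: not found in any candidate directory")
        for path, target in hits:
            print(f"{lib}: {path} -> {target}")
```

This only reports where the files are; it says nothing about whether the dynamic loader will actually pick them up at runtime.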
[nix-shell:~/dev/research/lottery]$ nixGLNvidia-510.47.03 python cifar10_convnet_run.py --test
tests took 1.25754 seconds
Could not load library libcudnn_ops_infer.so.8. Error: libcublas.so.11: cannot open shared object file: No such file or directory
Please make sure libcudnn_ops_infer.so.8 is in your library path!
/home/ubuntu/.nix-profile/bin/nixGLNvidia-510.47.03: line 6: 11823 Aborted (core dumped) "$@"
I was symlinking in everything, including libnvidia-*.so. I believe everything was present but ofc anything’s possible since it wasn’t working.
I wasn’t aware of LD_DEBUG=libs, that’s very handy! It looks like it’s trying and failing to find libcublas.so.11 in a few places. (full logs here) IIRC libcublas.so.11 is in cudatoolkit, so I’m not sure why that’s failing… EDIT: yes, it lives at /nix/store/9lv0wxqkbqw2438wrhllcyf3sx644i5z-cudatoolkit-11.5.0/lib/libcublas.so.11
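To narrow this down without running the whole training script, a small loader check along these lines may help. It is only a sketch: `try_load` is a hypothetical helper, and the library names are the ones from the error output above. It asks the dynamic loader (via `ctypes.CDLL`) to open each library by name and records the error message for any that fail.

```python
import ctypes


def try_load(names):
    """Try to dlopen each library by name.

    Returns a dict mapping each name to None on success,
    or to the loader's error message on failure.
    """
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Library names taken from the log output earlier in this thread.
    libs = ["libcuda.so.1", "libcublas.so.11", "libcudnn_ops_infer.so.8"]
    for name, err in try_load(libs).items():
        status = "ok" if err is None else f"FAILED: {err}"
        print(f"{name}: {status}")
```

Running it inside the same nix-shell and under the same nixGLNvidia wrapper as the real script, optionally with LD_DEBUG=libs set, should show whether the loader's search path includes the cudatoolkit lib directory at all.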