I’m hoping to use Nix to manage data science environments on an NVIDIA GPU-equipped system running Ubuntu, a task I’ve been doing with conda up to now. Specifically, I’m trying to get torch.cuda.is_available() to return True. While troubleshooting, I’ve identified the following issues:
In nixpkgs, CUDA support is enabled in a few ad hoc ways: a global config option (cudaSupport), override keys passed to individual packages, and distinguished packages like torchWithCuda.
Prebuilt binaries with CUDA support come from a community cache, which I’ve set up to avoid compiling e.g. torch myself.
Because I’m on a non-NixOS system, the graphics driver is not part of the nix environment, which leads to the use of tools like nixGL and to finagling e.g. LD_LIBRARY_PATH to include the system OpenGL paths and any nix-installed CUDA toolchains.
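To see which driver libraries a given LD_LIBRARY_PATH actually exposes, something like the following can help. It is only a sketch: the helper name find_driver_libs is made up for illustration, and it assumes the driver library follows the usual libcuda.so* naming on Linux. It only lists files in the directories on the path and does not emulate the dynamic loader's full search order.

```python
import os
from pathlib import Path


def find_driver_libs(search_path, pattern="libcuda.so*"):
    """List files matching `pattern` in each directory of a colon-separated path.

    `find_driver_libs` is a hypothetical helper; the default pattern assumes
    the NVIDIA driver library uses its usual libcuda.so* naming.
    """
    hits = []
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue  # ignore empty components, e.g. from a trailing ':'
        directory = Path(entry)
        if directory.is_dir():
            hits.extend(sorted(str(p) for p in directory.glob(pattern)))
    return hits


if __name__ == "__main__":
    # Print whatever the current shell's LD_LIBRARY_PATH makes visible.
    for lib in find_driver_libs(os.environ.get("LD_LIBRARY_PATH", "")):
        print(lib)
```

Running this inside the nix shell and getting no output suggests the driver library is not on the path the shell sets up.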
Most advice I’ve seen boils down to declaring a shell more or less like this one, but torch.cuda.is_available() still doesn’t return True no matter what I try.
I guess I’ve got two questions:
Is the flake linked above not really the right recipe and should I be trying something else? The point was to use something idiomatic and share the approach with some colleagues, so using nix to deploy venvs doesn’t qualify; I can already do that without nix. Also, I don’t really know what the underlying issue even is or how to begin troubleshooting it.
Is this generally just a waste of time? Recent discussions are short on praise, with experienced users admitting that nix is an obstacle more often than not and that they just use conda or venvs for ad hoc package use. nixGL seems to be abandoned (and doesn’t solve my problem), and I haven’t seen more recent initiatives to address the issue of running graphics-accelerated applications on non-NixOS systems.
Nix on NixOS is fine, but nix on other distros has historically required schemes like nixGL, which just don’t work well. Allegedly there are some fixes coming that make nixGL unnecessary:
But I’ve not tried this in years, so I can’t comment on the recent changes.
Other options are discussed elsewhere:
Best of luck. You might also want to post your actual code if you want feedback on it.
Your article was very helpful, thank you for writing and sharing it. I guess there’s no substitute (in this case) for understanding the underlying dynamic linking picture.
I was under the impression that the GL driver was correctly linked in the shell definition I linked, but of course nothing is present at /run/opengl-driver on non-NixOS, which I should have checked beforehand. Setting LD_PRELOAD as your article suggests seems to have solved the problem.
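For anyone retracing this, a check along these lines would have caught it earlier. It is a rough sketch with some assumptions: the function name probe_cuda_driver is made up, the driver library is assumed to be libcuda.so.1, and the directory checked is /run/opengl-driver/lib (the lib subdirectory is my assumption about the layout under the path mentioned above). A successful load here only shows that the loader can find the driver from this Python process; it does not guarantee torch will find it the same way.

```python
import ctypes
from pathlib import Path


def probe_cuda_driver(libname="libcuda.so.1", nixos_dir="/run/opengl-driver/lib"):
    """Report whether the driver directory exists and whether the driver can be loaded.

    Hypothetical helper: both defaults are assumptions about a typical setup.
    """
    report = {
        "nixos_dir_exists": Path(nixos_dir).is_dir(),
        "loadable": False,
        "error": None,
    }
    try:
        # ctypes.CDLL goes through the system loader, so LD_LIBRARY_PATH
        # and LD_PRELOAD from the current environment are taken into account.
        ctypes.CDLL(libname)
        report["loadable"] = True
    except OSError as exc:
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in probe_cuda_driver().items():
        print(f"{key}: {value}")
```

If nixos_dir_exists is False and loadable is False, the shell is pointing at a driver location that does not exist on this system, which matches what I ran into.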
I’m wondering why nixGL didn’t do this for me automatically. I’ll post a flake for posterity when I settle on one.