Any progress? I’ve basically given up on compiling PyTorch with CUDA locally; it just isn’t time-feasible without leaving it running overnight. The base expression takes <40 min, but with just CUDA support enabled I was only at ~63% after 3.5 hours.
Luckily, our machines have plenty of cores, so it does not take that long. But it is long enough to be annoying when we bump nixpkgs
and something in PyTorch’s closure is updated. So instead I have started to just use the upstream binaries and patchelf them.
I know it’s not as nice as source builds, but ‘builds’ finish in seconds. Still, it would be nice to have a (semi-)official binary cache for source builds.
(I primarily use libtorch, so this is only a derivation for that, but the same could be done for the Python module.)
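For the curious, the general shape of such a derivation is roughly the following. This is an untested sketch, not the actual expression: the URL, version, and hash are placeholders, and for the CUDA build you would add cudatoolkit/cudnn to the inputs.

```nix
{ lib, stdenv, fetchzip, autoPatchelfHook }:

stdenv.mkDerivation {
  pname = "libtorch-bin";
  version = "1.x.y";  # placeholder

  # Upstream libtorch zip; the real URL and hash are omitted here.
  src = fetchzip {
    url = "https://download.pytorch.org/libtorch/...";
    sha256 = lib.fakeSha256;
  };

  # autoPatchelfHook rewrites the RPATHs of the bundled .so files so they
  # resolve against the store paths in buildInputs instead of FHS locations.
  nativeBuildInputs = [ autoPatchelfHook ];
  buildInputs = [ stdenv.cc.cc.lib ];  # libstdc++ etc.; add cudatoolkit/cudnn for the CUDA build

  installPhase = ''
    mkdir -p $out
    cp -r include lib share $out/
  '';
}
```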
Last time I heard, stites was working on a Hydra for the GPU libs.
For now you can use his pytorch-world repo (GitHub: stites/pytorch-world, nix scripts for PyTorch-related libraries) with the Cachix binary cache.
@stites: how close are you to a working automated binary cache for PyTorch?
That’s great! I think it would be useful to say why it’s better/worse than the officially recommended conda installation.
The README is also missing a “Getting started” section for those unfamiliar with Nix.
Made quite a bit of progress: we build these jobsets against MKL and CUDA in nix-community/nix-data-science (standard set of packages and overlays for data scientists, maintainer @tbenst). The build results are available here: https://hydra.nix-data.org/project/nix-data.
So if you use the pinned nixpkgs on 20.03 and the same overlay, you should at least have some guarantee that the long build will succeed.
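For example, pinning could look roughly like this. It’s an illustrative sketch: the overlay path is a placeholder, and for real reproducibility you would pin a specific commit rather than the branch.

```nix
let
  # Pin the 20.03 branch (for real use, pin a specific commit hash instead).
  nixpkgs = fetchTarball "https://github.com/NixOS/nixpkgs/archive/nixos-20.03.tar.gz";
  pkgs = import nixpkgs {
    config.allowUnfree = true;              # MKL and CUDA are unfree
    overlays = [ (import ./overlay.nix) ];  # placeholder: the same overlay the jobset uses
  };
in
  pkgs.python3.withPackages (ps: [ ps.pytorch ])
```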
For caching, in practice we just need to integrate with Cachix to upload the binaries and we’ll be good-to-go.
The only thing holding us back is uncertainty around the licensing situation. I just sent nvidia another email. As pointed out by @xbreak, I think it is reasonable to conclude that we do not currently modify the object code of the binaries, but rather are modifying the library metadata.
That is so sweet to hear! That will definitely have a huge positive impact.
Thanks so much for your work!
Not sure those are going to be required anymore because of the recent updates (see tbenst’s answer).
Thanks for the effort! Are you planning to add an overlay for R with MKL instead of OpenBLAS? We are trying to create one (or update R in nixpkgs to have an option to use MKL). MKL is the only thing that keeps my team from abandoning MRO. Microsoft seems to have lost interest in R, and MRO is stuck at version 3.5.3.
Great idea! We are currently building the tidyverse and a few other R packages.
Care to make a pull request adding an R overlay? If not, I’ll get around to it eventually.
Right now it’s just two jobs (one to build an R environment and one to build RStudio), but I’ve been meaning to do separate jobs for each R package.
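For reference, with the BLAS/LAPACK switching mechanism such an overlay could be roughly as follows. This is an untested sketch; R (and everything else built against the generic blas/lapack) should then pick up MKL transitively.

```nix
# Untested sketch: point the generic BLAS/LAPACK providers at MKL.
self: super: {
  blas = super.blas.override { blasProvider = super.mkl; };
  lapack = super.lapack.override { lapackProvider = super.mkl; };
}
```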
Patching the binaries wasn’t as bad as I thought. I’m not sure if everything was patched, but CUDA support is distributed with the binary from PyPI.
The closure can probably be reduced, but for my purposes it works and is far faster than attempting to compile from source.
Nix expressions:
https://paste.sr.ht/~eadwu/3559ec6647fbe79e57b4b0b9b67ddd0d9130ffae
In case anyone comes across this: I’m not sure how strict a dependency this is, but it seems to prefer CUDA 10.1 (or at least one of the executables links against a CUDA 10.1 library):
cudnn = pkgs.cudnn_cudatoolkit_10_1;
cudatoolkit = pkgs.cudatoolkit_10_1;
They are now including CUDA in the prebuilt binaries, which makes it even easier to package. Only downside is that libtorch_cuda.so is now a 709 MiB binary ;).
We now have a derivation python3Packages.pytorch-bin with CUDA support:
https://github.com/NixOS/nixpkgs/pull/96669
It should help those who want to avoid the heavy build of python3Packages.pytorch. I also did a PR for libtorch-bin for the C++ API (which is also used by e.g. the Rust tch crate), so hopefully we’ll have that soon as well.
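If you just want to try the Python binary package, a minimal shell along these lines should work (assuming a nixpkgs that already has the pytorch-bin attribute; allowUnfree is needed because the wheel bundles CUDA and MKL):

```nix
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:

pkgs.mkShell {
  buildInputs = [
    # Binary wheel instead of the long source build of python3Packages.pytorch.
    (pkgs.python3.withPackages (ps: [ ps.pytorch-bin ]))
  ];
}
```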
Forgot to add: the upstream builds use MKL as their BLAS library. This should generally give better performance than multi-threaded OpenBLAS, which is our default system-wide BLAS and is thus what python3Packages.pytorch uses. Multi-threaded OpenBLAS also does not work correctly if your application uses any kind of threading.
Unfortunately, on AMD Ryzen CPUs, MKL will use slower SSE kernels. With the MKL version that libtorch/PyTorch use, you can force the AVX2 kernels with export MKL_DEBUG_CPU_TYPE=5.
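One way to avoid forgetting the workaround is to bake it into a dev shell, e.g. (a sketch; only relevant on Ryzen and only for MKL versions that still honor this variable):

```nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.mkShell {
  shellHook = ''
    # Force MKL's AVX2 kernels on AMD Ryzen CPUs.
    export MKL_DEBUG_CPU_TYPE=5
  '';
}
```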
I’d assume this is probably where most of the people affected would be.
If you’re on 5.9, you’ll need to circumvent the GPL-condom to use nvidia-uvm for CUDA. I spent more time than I’d like debugging the wrong places.
Thanks for the heads-up! I was wondering why I was getting CUDA error: unknown error errors. Some strace-ing revealed that /dev/nvidia-uvm could not be opened. Manually modprobe-ing showed an error that reminded me of your comment.
It’s annoying to run into these Linux ↔ NVIDIA licensing shenanigans when you are just trying to get work done.
I haven’t tested it, but there is this patch https://github.com/Frogging-Family/nvidia-all/blob/f1d3c6cf024945e7a477ed306bd173fa6b81d72d/patches/kernel-5.9.patch
Officially, we need to wait about a month for Nvidia to fix it: “NVIDIA Doesn’t Expect To Have Linux 5.9 Driver Support For Another Month” (Phoronix).
Luckily NixOS makes it so easy to switch kernels, so it’s not a real problem to stick to a slightly older kernel.
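E.g., something like this in configuration.nix (the exact linuxPackages attribute is just an example; pick whatever pre-5.9 kernel your nixpkgs provides):

```nix
{ pkgs, ... }: {
  # Stay on a pre-5.9 kernel until the NVIDIA driver catches up.
  boot.kernelPackages = pkgs.linuxPackages_5_8;
}
```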
I have tried to use the new BLAS/LAPACK infrastructure to build R with MKL, and the resulting R produces incorrect results for matrix multiplication (see my comment on the “Add BLAS/LAPACK switching mechanism” PR #83888 by matthewbauer). I have opened issue #104026, “R built with MKL computes incorrect results”, which has the code I used to build R.
It is. I switched to 5.9 a while ago (maybe two weeks?) and have been using CUDA a lot with Torch.