If you can get Cachix working, that's how the Hercules agent gets derivations and outputs; nothing goes through our central server.
If you need compute resources, we have a 16-core build box in the nix-community project. It would be nice to see it running with more CPU utilization.
That'd be amazing! I'll shoot you a DM.
Any progress? I've basically given up on compiling PyTorch with CUDA locally; it just isn't time-feasible without leaving it on overnight. The base expression takes <40 min, but with CUDA support enabled I was only at ~63% after 3.5 hours.
Luckily, our machines have plenty of cores, so it does not take that long. But it is long enough to be annoying when we bump nixpkgs and something in PyTorch's closure is updated. So, instead, I have started to just use upstream binaries and patchelf them.
I know it's not as nice as building from source, but "builds" finish in seconds. Still, it would be nice to have a (semi-)official binary cache for source builds.
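A minimal sketch of what the patchelf approach can look like, using nixpkgs' autoPatchelfHook; the version, URL, hash, and install layout below are all placeholders, not the actual derivation:

```nix
# Sketch only: version, URL, and sha256 are placeholders.
{ stdenv, fetchzip, autoPatchelfHook }:

stdenv.mkDerivation {
  pname = "libtorch-bin";
  version = "0.0.0";                           # placeholder
  src = fetchzip {
    url = "https://example.com/libtorch.zip";  # placeholder URL
    sha256 = "0000000000000000000000000000000000000000000000000000";
  };
  # autoPatchelfHook rewrites the ELF interpreter and rpaths so the
  # prebuilt shared objects find their dependencies in the Nix store.
  nativeBuildInputs = [ autoPatchelfHook ];
  installPhase = ''
    mkdir -p $out
    cp -r lib include $out/
  '';
}
```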
(I primarily use libtorch, so this is only a derivation for that, but the same could be done for the Python module.)
Last time I heard, stites was working on a Hydra for the GPU libs.
For now you can use his stites/pytorch-world repo (nix scripts for PyTorch-related libraries) with the Cachix binary cache.
@stites: how close are you to a working automated binary cache for PyTorch?
That's great! I think it would be useful to say why it's better/worse than the officially recommended conda installation.
The README is also missing a "Getting started" section for those unfamiliar with Nix.
Made quite a bit of progress: we build these jobsets against MKL and CUDA in nix-community/nix-data-science (standard set of packages and overlays for data scientists, maintainer=@tbenst). The build results are available here: https://hydra.nix-data.org/project/nix-data.
So if you use the pinned nixpkgs on 20.03 and the same overlay, you should at least have some guarantee that the long build will succeed.
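Pinning could look roughly like this; the tarball channel URL is illustrative, the hash is omitted on purpose, and the overlay path is an assumption:

```nix
# Sketch: pin a nixpkgs 20.03 snapshot and apply an overlay.
let
  pkgs = import (builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/nixos-20.03.tar.gz";
    # add a sha256 here for full reproducibility
  }) {
    config.allowUnfree = true;              # MKL and CUDA are unfree
    overlays = [ (import ./overlay.nix) ];  # hypothetical overlay file
  };
in
pkgs.python3.withPackages (ps: [ ps.pytorch ])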
For caching, in practice we just need to integrate with Cachix to upload the binaries and we'll be good to go.
The only thing holding us back is uncertainty around the licensing situation. I just sent NVIDIA another email. As pointed out by @xbreak, I think it is reasonable to conclude that we do not currently modify the object code of the binaries, but rather are modifying the library metadata.
That is so sweet to hear! That will definitely have a huge positive impact.
Thanks so much for your work.
Not sure that those are going to be required anymore because of recent updates (see tbenst's answer).
Thanks for the effort! Are you planning to add an overlay for R with MKL instead of OpenBLAS? We are trying to create one (or update R in nixpkgs to have an option to use MKL). MKL is the only thing that keeps my team from abandoning MRO. Microsoft seems to have lost interest in R, and MRO is stuck at version 3.5.3.
Great idea! We currently are building the tidyverse and a few other R packages.
Care to make a pull request adding an R overlay? If not, I'll get around to it eventually.
Right now it's just two jobs (one to build an R environment and one to build RStudio), but I've been meaning to do separate jobs for each R package.
Patching the binaries wasn't as bad as I thought. Not sure if everything was patched, but CUDA support is distributed with the binary from PyPI.
The closure can probably be reduced, but for my purposes it works and is far faster than attempting to compile it from source.
Nix expressions:
https://paste.sr.ht/~eadwu/3559ec6647fbe79e57b4b0b9b67ddd0d9130ffae
In case anyone comes across this: I'm not sure how strict a dependency this is, but it seems to prefer CUDA 10.1 (or at least one of the executables links to a CUDA 10.1 library).
```nix
cudnn = pkgs.cudnn_cudatoolkit_10_1;
cudatoolkit = pkgs.cudatoolkit_10_1;
```
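One way such a pin might be applied is through an overlay; the package name and override arguments below are assumptions for illustration, not the exact attributes of the expression above:

```nix
# Hypothetical overlay pinning the CUDA toolkit the binary expects.
self: super: {
  pytorch-bin = super.pytorch-bin.override {
    cudatoolkit = super.cudatoolkit_10_1;
    cudnn = super.cudnn_cudatoolkit_10_1;
  };
}
```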
They are now including CUDA in the prebuilt binaries, which makes it even easier to package. The only downside is that libtorch_cuda.so is now a 709 MiB binary ;).
We now have a derivation python3Packages.pytorch-bin with CUDA support:
https://github.com/NixOS/nixpkgs/pull/96669
It should help those who want to avoid the heavy build of python3Packages.pytorch. I also did a PR for libtorch-bin, for the C++ API (which is also used by e.g. the Rust tch crate), so hopefully we'll have that soon as well:
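Once it is available, using the binary derivation in a dev shell could look something like this; the assumption is a nixpkgs checkout that already contains the PR above:

```nix
# shell.nix — assumes a nixpkgs revision that includes pytorch-bin.
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:

pkgs.mkShell {
  buildInputs = [
    (pkgs.python3.withPackages (ps: [ ps.pytorch-bin ]))
  ];
}
```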
Forgot to add: the upstream builds use MKL as their BLAS library. This should generally give better performance than multi-threaded OpenBLAS, which we use as the system-wide BLAS and which python3Packages.pytorch therefore uses by default. Multi-threaded OpenBLAS also does not work correctly if your application uses any kind of threading.
Unfortunately, on AMD Ryzen CPUs, MKL will use slower SSE kernels. You can force the use of AVX2 kernels with the MKL version that libtorch/PyTorch use by setting MKL_DEBUG_CPU_TYPE=5.
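In practice that is just an environment variable exported before launching the workload; the script name in the comment is a hypothetical example:

```shell
# Force MKL's AVX2 kernels on AMD Ryzen (works with the MKL versions
# bundled with libtorch/PyTorch that still honor this debug variable).
export MKL_DEBUG_CPU_TYPE=5
echo "MKL_DEBUG_CPU_TYPE=$MKL_DEBUG_CPU_TYPE"
# python my_training_script.py   # hypothetical workload
```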
I'd assume that's where most of the affected users are.
If you're on kernel 5.9, you'll need to circumvent the GPL "condom" to use nvidia-uvm for CUDA. I spent more time than I'd like debugging the wrong places.
Thanks for the heads-up! I was wondering why I was getting CUDA error: unknown error errors. Some strace-ing revealed that /dev/nvidia-uvm could not be opened. Manual modprobe-ing showed an error that reminded me of your comment.
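A quick way to check for the same symptom without a full strace is to look for the device node CUDA needs; this snippet prints one line either way:

```shell
# Check whether the nvidia-uvm device node exists (CUDA needs it).
if [ -e /dev/nvidia-uvm ]; then
  echo "nvidia-uvm: present"
else
  echo "nvidia-uvm: missing"
fi
```

If it is missing, `modprobe nvidia-uvm` (as root) and its error output are the next place to look.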
It's annoying to run into these Linux/NVIDIA licensing shenanigans when you are just trying to get work done.
I haven't tested it, but there is this patch: https://github.com/Frogging-Family/nvidia-all/blob/f1d3c6cf024945e7a477ed306bd173fa6b81d72d/patches/kernel-5.9.patch
Officially, we need to wait a month for NVIDIA to fix it: "NVIDIA Doesn't Expect To Have Linux 5.9 Driver Support For Another Month" (Phoronix).
Luckily, NixOS makes it so easy to switch kernels, so it's not a real problem to stick to a slightly older kernel.