Improving NixOS data science infrastructure: CI for MKL & CUDA

If you can get Cachix working, that's how the Hercules agent gets derivations and outputs; nothing goes through our central server.

If you need compute resources, we have a 16-core build box in the nix-community project. It would be nice to see it running with more CPU utilization :wink:

That'd be amazing! I'll shoot you a DM.

Any progress? I've basically given up on compiling PyTorch with CUDA locally; it just isn't feasible time-wise without leaving it running overnight. The base expression takes under 40 minutes, but with CUDA support enabled I was only at ~63% after 3.5 hours.

1 Like

Luckily, our machines have plenty of cores, so it does not take that long. But it is long enough to be annoying when we bump nixpkgs and something in PyTorch's closure is updated. So instead I have started to just use the upstream binaries and patchelf them.

I know it's not as nice as a source build, but 'builds' finish in seconds. Still, it would be nice to have a (semi-)official binary cache for source builds.

(I primarily use libtorch, so this is only a derivation for that, but the same could be done for the Python module.)
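
For reference, a patchelf-style derivation for libtorch could look roughly like the sketch below. This is not my exact expression; the URL, hash, and dependency list are illustrative placeholders, and a CUDA archive would additionally want cudatoolkit and cudnn in buildInputs.

    { lib, stdenv, fetchzip, autoPatchelfHook }:

    stdenv.mkDerivation rec {
      pname = "libtorch-bin";
      version = "1.6.0";

      # Placeholder URL and hash; point these at the upstream libtorch
      # archive you actually want and fill in the real sha256.
      src = fetchzip {
        url = "https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-${version}.zip";
        sha256 = lib.fakeSha256;
      };

      # autoPatchelfHook rewrites the RPATHs of the prebuilt ELF files so
      # they resolve their dependencies from the Nix store.
      nativeBuildInputs = [ autoPatchelfHook ];
      buildInputs = [ stdenv.cc.cc.lib ];

      installPhase = ''
        mkdir -p $out
        cp -r include lib share $out/
      '';
    }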

4 Likes

Last time I heard, stites was working on a Hydra for the GPU libs.

For now you can use his stites/pytorch-world repo on GitHub (nix scripts for pytorch-related libraries) together with the Cachix binary cache.
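
If you go that route on NixOS, wiring the cache into your configuration looks roughly like the sketch below; the cache name and public key are placeholders, so take the real values from the repo's README (or let `cachix use` write them for you).

    {
      nix.binaryCaches = [
        "https://cache.nixos.org/"
        # Hypothetical cache name; copy the real one from the README.
        "https://example-pytorch.cachix.org"
      ];
      nix.binaryCachePublicKeys = [
        # Hypothetical key; copy the real one from the README or `cachix use`.
        "example-pytorch.cachix.org-1:0000000000000000000000000000000000000000000="
      ];
    }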

@stites: how close are you to a working automated binary cache for PyTorch? :slight_smile:

1 Like

That's great :slight_smile: I think it would be useful to say why it's better/worse than the officially recommended conda installation.

The README is also missing a "Getting started" section for those unfamiliar with Nix.

Made quite a bit of progress: we build these jobsets against MKL and CUDA in nix-community/nix-data-science (a standard set of packages and overlays for data scientists, maintainer @tbenst). The build results are available at https://hydra.nix-data.org/project/nix-data.

So if you use the pinned nixpkgs on 20.03 and the same overlay, you should at least have some guarantee that the long build will succeed.
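
A minimal sketch of consuming that, assuming you take the overlay from the nix-data-science repo (the tarball URL and overlay path are illustrative):

    let
      # Pin nixpkgs to the same 20.03 revision as the Hydra jobsets
      # (illustrative URL; use the exact commit for a bit-for-bit match).
      nixpkgs = builtins.fetchTarball
        "https://github.com/NixOS/nixpkgs/archive/nixos-20.03.tar.gz";

      # Illustrative path: the overlay as shipped in nix-data-science.
      dataScienceOverlay = import ./overlay.nix;
    in
      import nixpkgs {
        config.allowUnfree = true;   # MKL and CUDA are unfree
        overlays = [ dataScienceOverlay ];
      }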

For caching, in practice we just need to integrate with Cachix to upload the binaries and we'll be good to go.

The only thing holding us back is uncertainty around the licensing situation; I just sent Nvidia another email. As pointed out by @xbreak, I think it is reasonable to conclude that we do not currently modify the object code of the binaries, but rather only the library metadata.

3 Likes

That is so sweet to hear! That will definitely have a huge positive impact :slightly_smiling_face:

Thanks so much for your work

Not sure those are going to be required anymore, given the recent updates (see @tbenst's answer).

Thanks for the effort! Are you planning to add an overlay for R with MKL instead of OpenBLAS? We are trying to create one (or to update R in nixpkgs to have an option to use MKL). MKL is the only thing that keeps my team from abandoning MRO; Microsoft seems to have lost interest in R, and MRO is stuck at version 3.5.3.

Great idea! We are currently building the tidyverse and a few other R packages.

Care to make a pull request adding an R overlay? If not, I'll get around to it eventually.

Right now it's just two jobs (one to build an R environment and one to build RStudio), but I've been meaning to set up separate jobs for each R package.
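
In the meantime, a minimal sketch of an overlay that should get R onto MKL, assuming a nixpkgs new enough to have the switchable BLAS/LAPACK providers (MKL is unfree, so allowUnfree is needed as well):

    # Swap the default BLAS/LAPACK implementation for MKL; R (and anything
    # else built against pkgs.blas/pkgs.lapack) then links MKL instead of
    # OpenBLAS.
    self: super: {
      blas = super.blas.override { blasProvider = self.mkl; };
      lapack = super.lapack.override { lapackProvider = self.mkl; };
    }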

Patching the binaries wasn't as bad as I thought. I'm not sure if everything was patched, but CUDA support is distributed with the binary from PyPI.
The closure could probably be reduced, but for my purposes it works and is far faster than attempting to compile from source.

Nix expressions:
https://paste.sr.ht/~eadwu/3559ec6647fbe79e57b4b0b9b67ddd0d9130ffae

In case anyone comes across this: I'm not sure how strict a dependency this is, but it seems to prefer CUDA 10.1 (or at least one of the executables links against a CUDA 10.1 library):

    cudnn = pkgs.cudnn_cudatoolkit_10_1;
    cudatoolkit = pkgs.cudatoolkit_10_1;

They are now including CUDA in the prebuilt binaries, which makes it even easier to package. The only downside is that libtorch_cuda.so is now a 709 MiB binary ;).

1 Like

We now have a derivation python3Packages.pytorch-bin with CUDA support:

https://github.com/NixOS/nixpkgs/pull/96669

It should help those who want to avoid the heavy build of python3Packages.pytorch. I also opened a PR for libtorch-bin for the C++ API (which is also used by, e.g., the Rust tch crate), so hopefully we'll have that soon as well:

https://github.com/NixOS/nixpkgs/pull/96488
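
For example, a throwaway shell with the binary package could look like this (assuming your nixpkgs checkout already contains the merged PR; pytorch-bin is unfree because of the bundled CUDA bits):

    with import <nixpkgs> { config.allowUnfree = true; };

    mkShell {
      buildInputs = [
        (python3.withPackages (ps: [ ps.pytorch-bin ]))
      ];
    }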

10 Likes

Forgot to add: the upstream builds use MKL as their BLAS library. This should generally give better performance than multi-threaded OpenBLAS, which we use as the system-wide BLAS by default and which python3Packages.pytorch therefore uses by default. Multi-threaded OpenBLAS also does not work correctly if your application uses any kind of threading.

Unfortunately, on AMD Ryzen CPUs, MKL will use slower SSE kernels. You can force the use of AVX2 kernels with the MKL version that libtorch/PyTorch uses by setting export MKL_DEBUG_CPU_TYPE=5.
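
If you would rather not export that by hand, here is a sketch of baking it into a dev shell (untested on Ryzen myself; mkShell simply exports plain attributes as environment variables):

    with import <nixpkgs> {};

    mkShell {
      # Forces MKL onto its AVX2 code paths on AMD CPUs; only honoured by
      # MKL versions that still support this debug variable.
      MKL_DEBUG_CPU_TYPE = "5";
    }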

I'd assume this is probably where most of the people affected would be found.

If you're on kernel 5.9, you'll need to circumvent the GPL-condom to use nvidia-uvm for CUDA. I spent more time than I'd like debugging the wrong places.

1 Like

Thanks for the heads-up! I was wondering why I was getting CUDA error: unknown error errors. Some stracing revealed that /dev/nvidia-uvm could not be opened, and manually modprobing the module showed an error that reminded me of your comment.

It's annoying to run into these Linux ↔ NVIDIA licensing shenanigans when you are just trying to get work done :frowning:.

1 Like

I haven't tested it, but there is this patch: https://github.com/Frogging-Family/nvidia-all/blob/f1d3c6cf024945e7a477ed306bd173fa6b81d72d/patches/kernel-5.9.patch

Officially, we need to wait a month for Nvidia to fix it: NVIDIA Doesn't Expect To Have Linux 5.9 Driver Support For Another Month - Phoronix

Luckily, NixOS makes it so easy to switch kernels :slight_smile:, so it's not a real problem to stick with a slightly older kernel.
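
For example, in configuration.nix (5.8 being an illustrative "slightly older" choice; the exact attribute depends on which kernels your nixpkgs still carries):

    { pkgs, ... }:

    {
      # Stay on a pre-5.9 kernel until the NVIDIA driver catches up.
      boot.kernelPackages = pkgs.linuxPackages_5_8;
    }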

1 Like