Improving NixOS data science infrastructure: CI for MKL & CUDA

An alternative would be to have every PR go through staging. We could make an exception for security-critical ones of course.

Then we’d merge staging-next if and only if Hydra is all green. That way, everybody who wants to get something into master is motivated to monitor the staging-next status. We should encourage people to use reverts as the first measure for fixing breakages; otherwise people would be incentivized to just push to staging and let others deal with the fallout.

Maybe we could even get hydra to do git-bisect somewhat automatically. That would be a lot cheaper than building every PR.

So the flow would then be: I want to update a package. I open a PR against staging, do the usual quality checks that are common today (does it still work? do some reverse dependencies build?) and then get it merged.

At some point, staging gets promoted to staging-next. I see that there are several breakages in staging-next, none of which are caused by my update. It’s pretty obvious that failure 1 was caused by PR X, so I open a new PR to revert those changes and ping the author of the original PR. Some other failures are not quite as obvious, so I (or someone else) run a git-bisect. Eventually everything builds, staging-next gets merged, and the next staging gets promoted.

2 Likes

Noob question here: is it possible to install TensorFlow with GPU support from PyPI on NixOS?

Not out of the box, since the dynamic linker and library paths will be incorrect for NixOS. However, the tensorflow-bin derivation uses the PyPI package and patches up the library dependencies.
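
For illustration, a minimal shell.nix sketch along those lines; whether tensorflow-bin takes a cudaSupport override, and its exact attribute name, depends on the nixpkgs version you pin:

    # shell.nix sketch -- assumes tensorflow-bin accepts a cudaSupport flag;
    # adjust the attribute names to your nixpkgs version.
    let
      pkgs = import <nixpkgs> { config.allowUnfree = true; };
      python = pkgs.python3.withPackages (ps: [
        (ps.tensorflow-bin.override { cudaSupport = true; })
      ]);
    in pkgs.mkShell { buildInputs = [ python ]; }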

1 Like

I now have Hydra running thanks to the help of many on GitHub and IRC. Here’s the NixOps deploy: GitHub - tbenst/nix-data-hydra: Hydra deployment for Nix Data. The next challenge is to figure out how to distribute only items that are some form of unfreeRedistributable.

Here are two approaches that come to mind:

  • Fork Nixpkgs and patch nixpkgs/lib/licenses.nix to manually remove free = false for licenses that can be redistributed, e.g. unfreeRedistributable, issl, nvidia_cuda, and nvidia_cudnn [1].
  • Add a new attribute to licenses called e.g. redistributable, and create a system-wide allowRedistributable flag.

The former we can do on our own. The latter is IMHO the better solution, but I’m not sure how to go about it, both in terms of the code base and politically with the maintainers; I’m not sure this idea would be well received.

[1] We need to update the license for Nvidia to be more precise than unfree. I made a pull request here.

Edit: I realized the manual has a nice section on this. A third option is to handle it with a well-crafted overlay: Nixpkgs 23.11 manual | Nix & NixOS
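
For concreteness, a rough sketch of what that filtering could look like on the config side, assuming the allowUnfreePredicate route and that meta.license is a single license set (it can also be a list):

    # ~/.config/nixpkgs/config.nix sketch: only allow unfree licenses we
    # believe are redistributable. Assumes meta.license is a single license
    # set rather than a list of them.
    {
      allowUnfreePredicate = pkg:
        builtins.elem (pkg.meta.license or null)
          (with (import <nixpkgs/lib>).licenses; [ unfreeRedistributable issl ]);
    }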

2 Likes

Quick update: my Hydra is broken and no jobs de-queue; see hydra-queue-runner gets stuck while there are items in the queue · Issue #366 · NixOS/hydra · GitHub. If anyone can help troubleshoot, let me know; I’m happy to give SSH access!

Also, if anyone has the bandwidth to create a new Nvidia derivation that aims to be redistributable, that would be awesome. I presume the build could be modified to copy only these specific files to $out; a rough sketch follows the lists below. My understanding is that the output should include only the following files (per my reading of the license, and this is what Anaconda distributes).

cuDNN:

cudnn.h
libcudnn.so

CUDA:

lib/libcublas.so
lib/libcublasLt.so
lib/libcudart.so
lib/libcufft.so
lib/libcufftw.so
lib/libcurand.so
lib/libcusolver.so
lib/libcusparse.so
lib/libnppc.so
lib/libnppial.so
lib/libnppicc.so
lib/libnppicom.so
lib/libnppidei.so
lib/libnppif.so
lib/libnppig.so
lib/libnppim.so
lib/libnppist.so
lib/libnppisu.so
lib/libnppitc.so
lib/libnpps.so
lib/libnvToolsExt.so
lib/libnvblas.so
lib/libnvgraph.so
lib/libnvjpeg.so
lib/libnvrtc-builtins.so
lib/libnvrtc.so
lib/libnvvm.so
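
If anyone wants to pick this up, here is a hypothetical overlay sketch of the idea for cuDNN; the attribute names, the lib/ vs. lib64/ layout, and whether this file set actually satisfies the license are all assumptions on my part:

    # Hypothetical overlay: repackage only the cuDNN files listed above
    # into a new store path, leaving the original derivation untouched.
    self: super: {
      cudnnRedist = super.runCommand "cudnn-redistributable" { } ''
        mkdir -p $out/include $out/lib
        cp ${super.cudnn_cudatoolkit_10_1}/include/cudnn.h $out/include/
        # -P keeps the libcudnn.so -> libcudnn.so.X symlink chain intact
        cp -P ${super.cudnn_cudatoolkit_10_1}/lib/libcudnn.so* $out/lib/
      '';
    }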

My offer to help you get hercules-agent running still stands :slight_smile:

1 Like

Thanks! I think I finally understand the name… Hercules, slayer of Hydra :laughing:

@tomberek and I were able to get Hydra running, although there are some definite pain points in Nix with large files like cuda.run (3 GB), and in Hydra with large derivations like PyTorch (12 GB! We had to disable store-uri compression). It also took a fair bit of effort to figure out distributed builds; we didn’t realize that we needed an SSH key for hydra-queue-runner.
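
For anyone hitting the same wall, the moving parts looked roughly like this (a sketch, not our exact config; the host name, key path, and features are placeholders, and the queue runner needs read access to the key):

    # NixOS sketch for the Hydra master; all values are placeholders.
    # nix.buildMachines generates /etc/nix/machines, which the Hydra
    # queue runner reads by default on NixOS.
    {
      nix.buildMachines = [{
        hostName = "builder1.example.org";
        system = "x86_64-linux";
        maxJobs = 4;
        speedFactor = 2;
        supportedFeatures = [ "big-parallel" ];
        sshUser = "nixbuild";
        # must be readable by the hydra-queue-runner user
        sshKey = "/var/lib/hydra/queue-runner/.ssh/id_ed25519";
      }];
    }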

Would love to chat with you about Cachix, though; I’ll drop you a DM.

1 Like

If you can get Cachix working, that’s how the Hercules agent gets derivations and outputs; nothing goes through our central server.

If you need compute resources, we have a 16-core build box in the nix-community project. It would be nice to see it running with more CPU utilization :wink:

That’d be amazing! I’ll shoot you a DM.

Any progress? I’ve basically given up on compiling PyTorch with CUDA locally; it just isn’t feasible time-wise without leaving it running overnight. The base expression takes under 40 minutes, but with CUDA support enabled I was only at ~63% after 3.5 hours.

1 Like

Luckily, our machines have plenty of cores, so it does not take that long. But it is long enough to be annoying when we bump nixpkgs and something in PyTorch’s closure is updated. So instead, I have started to just use the upstream binaries and patchelf them.

I know it’s not as nice as source builds, but ‘builds’ finish in seconds. Still, it would be nice to have a (semi-)official binary cache for source builds.

(I primarily use libtorch, so this is only a derivation for that, but the same could be done for the Python module.)
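
Roughly, the approach looks like this (a hypothetical sketch, not my exact expression; the URL, version, hash, and CUDA attributes are placeholders):

    # Hypothetical libtorch-bin: fetch the upstream archive and let
    # autoPatchelfHook rewrite the library paths. Placeholder URL/hash.
    { stdenv, lib, fetchzip, autoPatchelfHook
    , cudatoolkit_10_1, cudnn_cudatoolkit_10_1 }:
    stdenv.mkDerivation rec {
      pname = "libtorch-bin";
      version = "1.4.0";
      src = fetchzip {
        url = "https://download.pytorch.org/libtorch/<cuda-variant>/libtorch-shared-with-deps-${version}.zip";
        sha256 = lib.fakeSha256;  # replace after the first build attempt
      };
      nativeBuildInputs = [ autoPatchelfHook ];
      # autoPatchelfHook resolves needed libraries from these inputs;
      # libcuda.so itself comes from the driver at runtime.
      buildInputs = [ stdenv.cc.cc.lib cudatoolkit_10_1 cudnn_cudatoolkit_10_1 ];
      installPhase = ''
        mkdir -p $out
        cp -r include lib share $out/
      '';
    }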

4 Likes

Last time I heard, stites was working on a Hydra for the GPU libs.

For now you can use his repo (GitHub - stites/pytorch-world: nix scripts for pytorch-related libraries) with the Cachix binary cache.

@stites: how close are you to a working automated binary cache for PyTorch? :slight_smile:

1 Like

That’s great :slight_smile: I think it would be useful to say why it’s better or worse than the officially recommended conda installation.

The README is also missing a “Getting started” section for those unfamiliar with Nix.

Made quite a bit of progress. We now build these jobsets against MKL and CUDA: GitHub - nix-community/nix-data-science: Standard set of packages and overlays for data-scientists [maintainer=@tbenst]. The build results are available at https://hydra.nix-data.org/project/nix-data.

So if you use the pinned nixpkgs on 20.03 and the same overlays, you should at least have some assurance that the long builds will succeed.
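
In other words, something along these lines (a sketch: the rev, hash, and overlay path are placeholders; check the repo for the actual pin and overlay entry point):

    # Hypothetical pin: replace <rev> and <sha256> with the values the
    # jobsets use, and the overlay path with whatever the repo exports.
    let
      nixpkgs = builtins.fetchTarball {
        url = "https://github.com/NixOS/nixpkgs/archive/<20.03-rev>.tar.gz";
        sha256 = "<sha256>";
      };
      nix-data = builtins.fetchTarball
        "https://github.com/nix-community/nix-data-science/archive/master.tar.gz";
    in import nixpkgs {
      config.allowUnfree = true;
      overlays = [ (import "${nix-data}/overlay.nix") ];
    }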

For caching, in practice we just need to integrate with Cachix to upload the binaries and we’ll be good to go.

The only thing holding us back is uncertainty around the licensing situation. I just sent Nvidia another email. As pointed out by @xbreak, I think it is reasonable to conclude that we do not currently modify the object code of the binaries, but only the library metadata.

3 Likes

That is so sweet to hear! That will definitely have a huge positive impact :slightly_smiling_face:

Thanks so much for your work

I’m not sure those are going to be required anymore because of the recent updates (see tbenst’s answer).

Thanks for the effort! Are you planning to add an overlay for R with MKL instead of OpenBLAS? We are trying to create one (or update R in nixpkgs to have an option to use MKL). MKL is the only thing that keeps my team from abandoning MRO; Microsoft seems to have lost interest in R, and MRO is stuck at version 3.5.3.

Great idea! We are currently building the tidyverse and a few other R packages.

Care to make a pull request adding an R overlay? If not, I’ll get around to it eventually.

Right now it’s just two jobs (one to build an R environment and one to build RStudio), but I’ve been meaning to do separate jobs for each R package.
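
For the overlay itself, the switchable BLAS/LAPACK mechanism in newer nixpkgs keeps it small (a sketch; older releases may need R’s blas/lapack inputs overridden directly, and MKL is unfree, so allowUnfree is needed too):

    # Overlay sketch: point the default BLAS/LAPACK at MKL so that R (and
    # anything else using the generic blas/lapack) picks it up.
    self: super: {
      blas = super.blas.override { blasProvider = super.mkl; };
      lapack = super.lapack.override { lapackProvider = super.mkl; };
    }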

Patching the binaries wasn’t as bad as I thought. I’m not sure whether everything was patched, but CUDA support is distributed with the binary from PyPI.
The closure can probably be reduced, but for my purposes it works and is far faster than attempting to compile it from source.

Nix expressions:
https://paste.sr.ht/~eadwu/3559ec6647fbe79e57b4b0b9b67ddd0d9130ffae

In case anyone comes across this: I’m not sure how strict a dependency this is, but it seems to prefer CUDA 10.1 (or at least one of the executables links against a CUDA 10.1 library):

    cudnn = pkgs.cudnn_cudatoolkit_10_1;
    cudatoolkit = pkgs.cudatoolkit_10_1;