Installing pytorch 1.0rc1


#1

I’m trying to install pytorch 1.0rc1 because I need pytorch to run on CUDA 10, and pytorch 0.4.1 doesn’t build against CUDA 10. I tried changing only rev and version and building in a Python 3.6 environment (changed lines pasted below):

in buildPythonPackage rec {
  #version = "0.4.1";
  version = "1.0rc1";
  pname = "pytorch-1.0rc1";

  src = fetchFromGitHub {
    owner  = "pytorch";
    repo   = "pytorch";
    rev    = "v${version}";
    fetchSubmodules = true;
    #sha256 = "1cr8h47jxgfar5bamyvlayvqymnb2qvp7rr0ka2d2d4rdldf9lrp";
    sha256 = "1rd4l52bj2dcczrw50zkgalg4hlim7y87hv0rcm4a4a6d1vhc0lf";
  };

I copied pytorch.nix from the newest git checkout of nixpkgs unstable and am just importing it, as below:

mypytorch =  import ./pytorch.nix {
   inherit fetchFromGitHub linkFarm utillinux symlinkJoin lib;
   cudaSupport=true;
   buildPythonPackage=python36Packages.buildPythonPackage;
   pythonOlder = python36Packages.pythonOlder;
   numpy = python36Packages.numpy;
   pyyaml = python36Packages.pyyaml;
   cffi = python36Packages.cffi;
   typing = python36Packages.typing;
   cmake = pkgs.cmake;
   which = pkgs.which;
   cudatoolkit = pkgs.cudatoolkit_10; #mycudatoolkit10;
   };

It fails because an RPATH contains a forbidden reference to /build, similar to this bug:

stripping (with command strip and flags -S) in /nix/store/gcayaq662v8pfp4y05al7kv6gczb7fy7-python3.6-pytorch-1.0rc1-1.0rc1/lib  /nix/store/gcayaq662v8pfp4y05al7kv6gczb7fy7-python3.6-pytorch-1.0rc1-1.0rc1/bin
patching script interpreter paths in /nix/store/gcayaq662v8pfp4y05al7kv6gczb7fy7-python3.6-pytorch-1.0rc1-1.0rc1
checking for references to /build in /nix/store/gcayaq662v8pfp4y05al7kv6gczb7fy7-python3.6-pytorch-1.0rc1-1.0rc1...
RPATH of binary /nix/store/gcayaq662v8pfp4y05al7kv6gczb7fy7-python3.6-pytorch-1.0rc1-1.0rc1/lib/python3.6/site-packages/torch/lib/libtorch.so contains a forbidden reference to /build
builder for '/nix/store/l7nmcwmbflybadr9ibiarx68crg16fk0-python3.6-pytorch-1.0rc1-1.0rc1.drv' failed with exit code 1
cannot build derivation '/nix/store/mw9yicbpicfgnj8g76cpjqbc7iyp9gbc-python3-3.6.6-env.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/i2cnf08f55pqm9nzf3316pdf35yfi144-eyeserver.drv': 1 dependencies couldn't be built
error: build of '/nix/store/i2cnf08f55pqm9nzf3316pdf35yfi144-eyeserver.drv' failed

Running ldd on the offending binary yields:

[henry@watson:~/Projects/eyeserver/nixshell]$ ldd /nix/store/gcayaq662v8pfp4y05al7kv6gczb7fy7-python3.6-pytorch-1.0rc1-1.0rc1/lib/python3.6/site-packages/torch/lib/libtorch.so
	linux-vdso.so.1 (0x00007ffc82977000)
	libnvToolsExt.so.1 => /nix/store/irgsnzls1sxiz55csna3k80rbvvjik5a-cudatoolkit-10.0.130-unsplit/lib/libnvToolsExt.so.1 (0x00007f0acbd77000)
	libdl.so.2 => /nix/store/g2yk54hifqlsjiha3szr4q3ccmdzyrdv-glibc-2.27/lib/libdl.so.2 (0x00007f0acbb73000)
	librt.so.1 => /nix/store/g2yk54hifqlsjiha3szr4q3ccmdzyrdv-glibc-2.27/lib/librt.so.1 (0x00007f0acb96b000)
	libcaffe2.so => /nix/store/gcayaq662v8pfp4y05al7kv6gczb7fy7-python3.6-pytorch-1.0rc1-1.0rc1/lib/python3.6/site-packages/torch/lib/libcaffe2.so (0x00007f0ac96c8000)
	libcaffe2_gpu.so => /nix/store/gcayaq662v8pfp4y05al7kv6gczb7fy7-python3.6-pytorch-1.0rc1-1.0rc1/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so (0x00007f0a731a5000)
	libc10.so => /nix/store/gcayaq662v8pfp4y05al7kv6gczb7fy7-python3.6-pytorch-1.0rc1-1.0rc1/lib/python3.6/site-packages/torch/lib/libc10.so (0x00007f0a72fa2000)
	libpthread.so.0 => /nix/store/g2yk54hifqlsjiha3szr4q3ccmdzyrdv-glibc-2.27/lib/libpthread.so.0 (0x00007f0a72d83000)
	libstdc++.so.6 => /nix/store/x7bvplbn6ss80ngdcn753i9nvrlvym5r-gcc-7.3.0-lib/lib/libstdc++.so.6 (0x00007f0a729fc000)
	libm.so.6 => /nix/store/g2yk54hifqlsjiha3szr4q3ccmdzyrdv-glibc-2.27/lib/libm.so.6 (0x00007f0a72667000)
	libgcc_s.so.1 => /nix/store/g2yk54hifqlsjiha3szr4q3ccmdzyrdv-glibc-2.27/lib/libgcc_s.so.1 (0x00007f0a72451000)
	libc.so.6 => /nix/store/g2yk54hifqlsjiha3szr4q3ccmdzyrdv-glibc-2.27/lib/libc.so.6 (0x00007f0a7209d000)
	/nix/store/g2yk54hifqlsjiha3szr4q3ccmdzyrdv-glibc-2.27/lib64/ld-linux-x86-64.so.2 (0x00007f0accb62000)
	libnvrtc.so.10.0 => /nix/store/irgsnzls1sxiz55csna3k80rbvvjik5a-cudatoolkit-10.0.130-unsplit/lib/libnvrtc.so.10.0 (0x00007f0a70a7f000)
	libopenblas.so.0 => /nix/store/5wr8s95jcs1mvqkhlgj9dwwkijxr2vy2-openblas-0.3.3/lib/libopenblas.so.0 (0x00007f0a6ed22000)
	libcudart.so.10.0 => /nix/store/irgsnzls1sxiz55csna3k80rbvvjik5a-cudatoolkit-10.0.130-unsplit/lib/libcudart.so.10.0 (0x00007f0a6eaa5000)
	libcuda.so.1 => /run/opengl-driver/lib/libcuda.so.1 (0x00007f0a6d9a1000)
	libgomp.so.1 => /nix/store/x7bvplbn6ss80ngdcn753i9nvrlvym5r-gcc-7.3.0-lib/lib/libgomp.so.1 (0x00007f0a6d774000)
	libcusparse.so.10.0 => /nix/store/irgsnzls1sxiz55csna3k80rbvvjik5a-cudatoolkit-10.0.130-unsplit/lib/libcusparse.so.10.0 (0x00007f0a69d06000)
	libcurand.so.10.0 => /nix/store/irgsnzls1sxiz55csna3k80rbvvjik5a-cudatoolkit-10.0.130-unsplit/lib/libcurand.so.10.0 (0x00007f0a65b9d000)
	libnccl.so.1 => /nix/store/gcayaq662v8pfp4y05al7kv6gczb7fy7-python3.6-pytorch-1.0rc1-1.0rc1/lib/python3.6/site-packages/torch/lib/libnccl.so.1 (0x00007f0a64f9c000)
	libcufft.so.10.0 => /nix/store/irgsnzls1sxiz55csna3k80rbvvjik5a-cudatoolkit-10.0.130-unsplit/lib/libcufft.so.10.0 (0x00007f0a5eae6000)
	libcublas.so.10.0 => /nix/store/irgsnzls1sxiz55csna3k80rbvvjik5a-cudatoolkit-10.0.130-unsplit/lib/libcublas.so.10.0 (0x00007f0a5a54d000)
	libgfortran.so.4 => /nix/store/vlh8gaksfk6b6hjmqc9fx0l8qxplxd60-gfortran-7.3.0-lib/lib/libgfortran.so.4 (0x00007f0a5a179000)
	libnvidia-fatbinaryloader.so.410.73 => /run/opengl-driver/lib/libnvidia-fatbinaryloader.so.410.73 (0x00007f0a59f2b000)
	libquadmath.so.0 => /nix/store/vlh8gaksfk6b6hjmqc9fx0l8qxplxd60-gfortran-7.3.0-lib/lib/libquadmath.so.0 (0x00007f0a59ceb000)

I don’t see any references to /build in that output. The bug report shows that people got past this somehow, but I don’t see how to apply their fix in my case. I’d love some help on how to proceed.


#2

Try:

$ strings /nix/store/gcayaq662v8pfp4y05al7kv6gczb7fy7-python3.6-pytorch-1.0rc1-1.0rc1/lib/python3.6/site-packages/torch/lib/libtorch.so | grep "/build"

The strings command is part of binutils.
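The likely reason ldd shows nothing is that it only lists the libraries the loader actually resolved; an RPATH entry under /build that does not exist at runtime is silently skipped. To look at the RPATH itself, readelf -d (also from binutils) or patchelf --print-rpath prints it directly. A minimal sketch, assuming bash and patchelf are available; find_build_refs is a hypothetical helper name, not part of any tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a colon-separated RPATH on stdin and print
# every entry that is /build itself or lies under /build.
find_build_refs() {
  # One entry per line, keep only sandbox paths; "|| true" so that
  # "no matches" is not treated as a failure.
  tr ':' '\n' | grep -E '^/build(/|$)' || true
}

# Usage (library path is a placeholder):
#   patchelf --print-rpath path/to/libtorch.so | find_build_refs
# With binutils only:
#   readelf -d path/to/libtorch.so | grep -E 'RPATH|RUNPATH'
```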


#3

Thanks! That did the trick! I took the RPATH fix-up that was already applied to libcaffe2*.so, applied it to libtorch.so and libtorch.so.1 as well, and it compiled. Adding thd_distributed to the disabled tests got it through the test phase, and it seems to work! Thanks!!!

Below is my preFixup with my changes:

  preFixup = ''
    function join_by { local IFS="$1"; shift; echo "$*"; }
    # Drop the first two RPATH entries (the /build paths here) and prepend $ORIGIN.
    function strip2 {
      local -a RP
      # Scope IFS to the read so it does not leak into the rest of the phase.
      IFS=':' read -ra RP <<< "$(patchelf --print-rpath "$1")"
      RP_NEW=$(join_by : "''${RP[@]:2}")
      patchelf --set-rpath "\$ORIGIN:''${RP_NEW}" "$1"
    }

    # Same fix-up for libcaffe2*.so, libtorch.so and libtorch.so.1.
    for f in $(find ''${out} -name 'libcaffe2*.so' -o -name 'libtorch.so' -o -name 'libtorch.so.1')
    do
      strip2 "$f"
    done
  '';
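strip2 drops the first two RPATH entries by position, so it only works while the /build entries always come first. A sketch of a prefix-based alternative (not what was used above, and not tested against the actual pytorch build) removes every /build entry wherever it appears; clean_rpath is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical alternative to strip2: instead of dropping the first two
# entries by position, drop every entry under /build and prepend $ORIGIN.
clean_rpath() {
  local old="$1" entry
  local -a entries kept=()
  IFS=':' read -ra entries <<< "$old"
  for entry in "${entries[@]}"; do
    case "$entry" in
      /build|/build/*) ;;   # sandbox path: discard
      '$ORIGIN') ;;         # avoid a duplicate $ORIGIN
      '') ;;                # skip empty entries
      *) kept+=("$entry") ;;
    esac
  done
  # Join the kept entries with ':'; single quotes keep $ORIGIN literal
  # so the dynamic loader, not the shell, expands it.
  local IFS=':'
  if ((${#kept[@]})); then
    printf '%s\n' '$ORIGIN'":${kept[*]}"
  else
    printf '%s\n' '$ORIGIN'
  fi
}
```

Inside preFixup, every ${ in this function would have to be written ''${ because Nix indented strings interpolate ${. The call itself would look like: patchelf --set-rpath "$(clean_rpath "$(patchelf --print-rpath "$f")")" "$f"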


#4

BTW, I did this by copying pytorch.nix and importing it. Is there an easier way to change just the preFixup phase? It seems like there should be a way to do that with a string append. Also, should I make this a separate topic and lose the context, or keep the question here?


#5

You can use the overrideAttrs function to do that:

somepackage.overrideAttrs (old: {
  preFixup = (old.preFixup or "") + ''
    # your amendments
  '';
})