Tensorflow-gpu, Keras in a nix-shell

ogkloo · August 22, 2020, 4:33am

I’ve been trying to set up TensorFlow with GPU support for a while now, and I can’t get it to work no matter which packages I put in. I can get very close, but I can’t get past the same error, which is always some form of a missing libcudart.so.10.1.

My nix-shell currently looks like this:

with import <nixpkgs> {};
mkShell {
  name = "tensorflow-cuda-shell";

  buildInputs = with python3.pkgs; [
    pip
    numpy
    setuptools
    cudatoolkit_10_1
    cudnn_cudatoolkit_10_1
  ];

  shellHook = ''
    export LD_LIBRARY_PATH=${pkgs.stdenv.cc.cc.lib}/lib:${pkgs.cudatoolkit_10_1}/lib:${pkgs.cudatoolkit_10_1}/lib64::${pkgs.cudnn_cudatoolkit_10_1}/lib:${pkgs.cudatoolkit_10_1.lib}:/run/opengl-driver/lib:/run/opengl-driver-32/lib:/lib:$LD_LIBRARY_PATH
    alias pip="PIP_PREFIX='$(pwd)/_build/pip_packages' TMPDIR='$HOME' \pip"
    export CUDA_PATH="${pkgs.cudatoolkit_10_1}"
    export PYTHONPATH="$(pwd)/_build/pip_packages/lib/python3.7/site-packages:$PYTHONPATH"
    export PATH="$(pwd)/_build/pip_packages/bin:$PATH"
    unset SOURCE_DATE_EPOCH
  '';
}

I’ve installed tensorflow-gpu v2.2.0 and keras through pip. In a Python REPL I can import keras or tensorflow just fine, and they even work, but they only use the CPU, not the GPU. Calling tf.test.gpu_device_name() produces the following output:

2020-08-21 21:26:36.810306: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-21 21:26:36.827011: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3792885000 Hz
2020-08-21 21:26:36.827542: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fb2c8000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-21 21:26:36.827556: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-21 21:26:36.829110: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-08-21 21:26:37.073646: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-21 21:26:37.074058: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4109970 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-21 21:26:37.074077: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080 SUPER, Compute Capability 7.5
2020-08-21 21:26:37.074207: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-21 21:26:37.074563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:0a:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 48 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 462.00GiB/s
2020-08-21 21:26:37.074641: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /nix/store/danv012gh0aakh8xnk2b35vahklz72mk-gcc-9.2.0-lib/lib:/nix/store/mvapq9iwhf6i2n30d8wz5cq7g8k2p1kq-cudatoolkit-10.1.243/lib:/nix/store/mvapq9iwhf6i2n30d8wz5cq7g8k2p1kq-cudatoolkit-10.1.243/lib64::/nix/store/rb1inkp9wdjavk30xc5g7h186hx1v586-cudatoolkit-10.1-cudnn-7.6.3/lib:/nix/store/qaqydzv4jllmijqcnyh6amdwj3ycv7yh-cudatoolkit-10.1.243-lib:/run/opengl-driver/lib:/run/opengl-driver-32/lib:/lib:
2020-08-21 21:26:37.075956: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-08-21 21:26:37.077163: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-08-21 21:26:37.077328: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-08-21 21:26:37.078486: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-08-21 21:26:37.079088: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-08-21 21:26:37.081492: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-21 21:26:37.081506: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-08-21 21:26:37.081526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-21 21:26:37.081534: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-08-21 21:26:37.081541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N

So the problem seems to be: Could not load dynamic library 'libcudart.so.10.1'. Searches for that turned up CUDA version problems, but nvcc --version reports 10.1, so I’m not sure where to go from here.

brogos · August 23, 2020, 12:17am

Hi, it worked for me after I added ${pkgs.cudatoolkit_10_1.lib}/lib to $LD_LIBRARY_PATH.
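
The likely reason this helps: the LD_LIBRARY_PATH printed in the log contains the bare store path ending in cudatoolkit-10.1.243-lib, with no /lib suffix, so the loader presumably never looks in the directory that actually holds libcudart.so.10.1. A rough way to check which entries contain a given library is sketched below. The helper name scan_library_path is made up for illustration, and it only looks for an exact file name in each directory, which is a simplification of what the dynamic loader really does.

```python
import os
from typing import List, Optional, Tuple


def scan_library_path(lib_name: str,
                      search_path: Optional[str] = None) -> List[Tuple[str, bool]]:
    """Return (directory, contains_lib) for every entry of a colon-separated path.

    If search_path is None, the current LD_LIBRARY_PATH is used.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    results = []
    for entry in search_path.split(":"):
        if not entry:
            # Skip empty entries such as the one produced by "::".
            continue
        candidate = os.path.join(entry, lib_name)
        results.append((entry, os.path.isfile(candidate)))
    return results


if __name__ == "__main__":
    # Run inside the nix-shell to see which entries hold the CUDA runtime.
    for directory, found in scan_library_path("libcudart.so.10.1"):
        print("found  " if found else "missing", directory)
```

If every entry is reported as missing, the directory that holds the library is not on the path at all, which would match the dlerror in the log.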

ogkloo · August 23, 2020, 12:20am

Thank you! That worked perfectly. Do you mind if I add a note using this to the wiki page?

Edit: Well, okay, it solved some issues, but now I just get failed call to cuInit. Huh. This seems to happen a lot on Ubuntu, too, though.
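
One way to narrow down the cuInit failure outside TensorFlow is to call the CUDA driver API directly through ctypes and look at the status code it returns. This is only a diagnostic sketch: the helper name probe_cuinit is made up, it assumes the driver library is reachable as libcuda.so.1 (the name the log above shows being opened), and it says nothing about the cause beyond the driver's own error string.

```python
import ctypes
from typing import Optional, Tuple


def probe_cuinit(lib_name: str = "libcuda.so.1") -> Tuple[Optional[int], str]:
    """Load the CUDA driver library and call cuInit(0).

    Returns (status, message). status is None if the library could not be
    loaded, 0 on success, and the raw CUresult code otherwise.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError as exc:
        return None, str(exc)

    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    status = libcuda.cuInit(0)
    if status == 0:
        return 0, "CUDA_SUCCESS"

    # Ask the driver for a human-readable description, if the symbol exists.
    get_error_string = getattr(libcuda, "cuGetErrorString", None)
    if get_error_string is not None:
        get_error_string.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
        get_error_string.restype = ctypes.c_int
        message = ctypes.c_char_p()
        if get_error_string(status, ctypes.byref(message)) == 0 and message.value:
            return status, message.value.decode()
    return status, "unknown error"


if __name__ == "__main__":
    print(probe_cuinit())
```

A None status means the driver library itself was not found on the loader path; a non-zero status means the library loaded but the driver refused to initialize.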

brogos · August 23, 2020, 12:24am

No problem, @ogkloo.