I’ve been trying to set up TensorFlow with GPU support for a while now, and I can’t get it to work no matter which packages I put in. I can get very close, but I can’t get past the same error, which is always some form of a missing libcudart.so.10.1.
My nix-shell currently looks like this:
with import <nixpkgs> {};
mkShell {
  name = "tensorflow-cuda-shell";
  buildInputs = with python3.pkgs; [
    pip
    numpy
    setuptools
    cudatoolkit_10_1
    cudnn_cudatoolkit_10_1
  ];
  shellHook = ''
    export LD_LIBRARY_PATH=${pkgs.stdenv.cc.cc.lib}/lib:${pkgs.cudatoolkit_10_1}/lib:${pkgs.cudatoolkit_10_1}/lib64::${pkgs.cudnn_cudatoolkit_10_1}/lib:${pkgs.cudatoolkit_10_1.lib}:/run/opengl-driver/lib:/run/opengl-driver-32/lib:/lib:$LD_LIBRARY_PATH
    alias pip="PIP_PREFIX='$(pwd)/_build/pip_packages' TMPDIR='$HOME' \pip"
    export CUDA_PATH="${pkgs.cudatoolkit_10_1}"
    export PYTHONPATH="$(pwd)/_build/pip_packages/lib/python3.7/site-packages:$PYTHONPATH"
    export PATH="$(pwd)/_build/pip_packages/bin:$PATH"
    unset SOURCE_DATE_EPOCH
  '';
}
I’ve installed tensorflow-gpu v2.2.0 and keras through pip. In a Python REPL I can import keras or tensorflow just fine, and they even work, but they only use the CPU, not the GPU. I get an error when I call tf.test.gpu_device_name(). The full output is:
2020-08-21 21:26:36.810306: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-21 21:26:36.827011: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3792885000 Hz
2020-08-21 21:26:36.827542: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fb2c8000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-21 21:26:36.827556: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-21 21:26:36.829110: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-08-21 21:26:37.073646: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-21 21:26:37.074058: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4109970 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-21 21:26:37.074077: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 SUPER, Compute Capability 7.5
2020-08-21 21:26:37.074207: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-21 21:26:37.074563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:0a:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 48 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 462.00GiB/s
2020-08-21 21:26:37.074641: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /nix/store/danv012gh0aakh8xnk2b35vahklz72mk-gcc-9.2.0-lib/lib:/nix/store/mvapq9iwhf6i2n30d8wz5cq7g8k2p1kq-cudatoolkit-10.1.243/lib:/nix/store/mvapq9iwhf6i2n30d8wz5cq7g8k2p1kq-cudatoolkit-10.1.243/lib64::/nix/store/rb1inkp9wdjavk30xc5g7h186hx1v586-cudatoolkit-10.1-cudnn-7.6.3/lib:/nix/store/qaqydzv4jllmijqcnyh6amdwj3ycv7yh-cudatoolkit-10.1.243-lib:/run/opengl-driver/lib:/run/opengl-driver-32/lib:/lib:
2020-08-21 21:26:37.075956: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-08-21 21:26:37.077163: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-08-21 21:26:37.077328: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-08-21 21:26:37.078486: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-08-21 21:26:37.079088: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-08-21 21:26:37.081492: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-21 21:26:37.081506: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-08-21 21:26:37.081526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-21 21:26:37.081534: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-08-21 21:26:37.081541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
So the problem seems to be: Could not load dynamic library 'libcudart.so.10.1'. Searching for that turned up CUDA version problems, but when I check the version with nvcc --version I get 10.1. So I’m not really sure where to go from here.
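One way to narrow this down might be to check, from inside the same nix-shell, which LD_LIBRARY_PATH entries actually contain the file the loader is complaining about. Below is a minimal sketch of such a check, not a definitive diagnosis. The helper name find_library is hypothetical, and it only looks for a file with the exact name in each listed directory (it does not follow the dynamic loader's other search rules). It may also be worth noting that in the log above, one of the cudatoolkit entries ends in -lib with no /lib suffix, unlike the others; whether that is related is an assumption to verify.

```python
import os


def find_library(name, search_path=None):
    """Return the entries of a colon-separated search path that contain `name`.

    `find_library` is a hypothetical helper for illustration only. If
    `search_path` is None, the current LD_LIBRARY_PATH is used.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for entry in search_path.split(":"):
        if not entry:
            # Skip empty entries produced by "::" or a trailing ":".
            continue
        if os.path.exists(os.path.join(entry, name)):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # The library name is taken from the error message above.
    wanted = "libcudart.so.10.1"
    found = find_library(wanted)
    if found:
        for directory in found:
            print(f"{wanted} found in {directory}")
    else:
        print(f"{wanted} not found in any LD_LIBRARY_PATH entry")
```

If this reports that no entry contains the file, the next step would be to look inside each cudatoolkit store path by hand (for example with ls) to see which subdirectory the library actually lives in.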