CUDA Tensorflow, my setup is really hacky, would appreciate help unhackying it

JohnAtl · April 22, 2024, 6:16pm

I am in the process of switching to NixOS from EndeavourOS.
I’m impressed that I was able to get NixOS running given my GPU setup (Intel Arc A380 with monitors, Nvidia RTX A4500 no monitors).

When attempting to use Tensorflow, I was having errors about missing libs, and wound up having to define LD_LIBRARY_PATH like this:

export LD_LIBRARY_PATH="/nix/store/a30cchbinr8g8ppv8wxkjgp48zdp4040-nvidia-x11-545.29.02-6.8.7/lib;/nix/store/959yp9jjqm3b6nyblq8k4cvzzz6jbwl7-cudatoolkit-11.8.0-lib/lib;/nix/store/0m0kl19qmwvzq36ann76fbds6ccrjayc-cudnn-8.9.7.29/lib;/nix/store/myw67gkgayf3s2mniij7zwd79lxy8v0k-gcc-12.3.0-lib/lib"

to get Tensorflow to work. I’m (98%) sure this isn’t the proper way to do this.
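
One practical weakness of hard-coding store paths like these is that they go stale whenever one of the packages is rebuilt or garbage-collected. Here is a minimal sketch, using only the Python standard library, that lists each entry of LD_LIBRARY_PATH and says whether the directory still exists. The helper name check_library_path is made up for illustration. It splits on both ":" and ";" because glibc's loader accepts either separator.

```python
import os
import re


def check_library_path(value=None):
    """Return (directory, exists) pairs for each entry in an LD_LIBRARY_PATH string.

    If value is None, the current LD_LIBRARY_PATH environment variable is used.
    """
    if value is None:
        value = os.environ.get("LD_LIBRARY_PATH", "")
    # glibc's dynamic loader accepts both ':' and ';' as separators.
    entries = [e for e in re.split(r"[:;]", value) if e]
    return [(entry, os.path.isdir(entry)) for entry in entries]


if __name__ == "__main__":
    for directory, exists in check_library_path():
        print(("ok      " if exists else "MISSING ") + directory)
```

This only reports on the directories. It does not say whether the libraries inside them are the ones TensorFlow wants.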

Maybe I’m holding it wrong?

I would appreciate any suggestions!

configuration.nix and neofetch

SergeK · April 22, 2024, 8:33pm

nix-shell --arg config '{ cudaSupport = true; allowUnfree = true; }' -p 'python3.withPackages (ps: [ ps.tensorflow ])'
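
To check that the TensorFlow inside that shell was built with CUDA and can see the GPU, something like the following sketch could be run with the shell's python. The function name gpu_report is hypothetical. The TensorFlow calls used are tf.test.is_built_with_cuda and tf.config.list_physical_devices. An import failure is returned as a value rather than raised, so the same script is usable inside a broken venv.

```python
def gpu_report():
    """Return a dict describing the importable TensorFlow, or the import error."""
    try:
        import tensorflow as tf  # imported lazily so a failure can be reported
    except ImportError as exc:
        # Covers both "No module named 'tensorflow'" and missing shared libraries.
        return {"import_error": str(exc)}

    return {
        "version": tf.__version__,
        "location": tf.__file__,  # shows whether it comes from the Nix store or a venv
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(gpu_report())
```

An empty "gpus" list with "built_with_cuda" set to True would point at the driver side rather than at the Python package.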

JohnAtl · April 23, 2024, 11:26am

Thanks!
That’s kind of magical.
I’ll have to figure out what it’s doing.

eljamm · April 23, 2024, 12:15pm

Hope this helps too: Tensorflow - NixOS Wiki

JohnAtl · April 23, 2024, 7:39pm

Hm.
Whether I install as Nix-native packages, or using @SergeK's shell invocation, I still need to have LD_LIBRARY_PATH defined for it to work.
I didn’t realize that when I tried Serge’s shell yesterday.

Edit: it looks like the shell definition in the wiki still requires setting LD_LIBRARY_PATH to work.

Edit 2: According to @NobbZ, this is not the way to do it.

SergeK · April 23, 2024, 8:42pm

The nix-shell shouldn’t require any LD_LIBRARY_PATH. Could you elaborate what errors you get without it?

JohnAtl · April 23, 2024, 9:25pm

I can run the shell and check the tensorflow version:
python -c "import tensorflow as tf; print(tf.__version__)"
and it reports 2.13.0.
I create a venv and pip install tensorflow-datasets.
When I run the above to check the TF version, I get this error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'tensorflow'

So I pip install 'tensorflow[and-cuda]==2.15.1' and when I run the above to check the TF version, I get this error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/john/.venv/lib/python3.11/site-packages/tensorflow/__init__.py", line 45, in <module>
    from tensorflow.python import tf2 as _tf2
  File "/home/john/.venv/lib/python3.11/site-packages/tensorflow/python/tf2.py", line 21, in <module>
    from tensorflow.python.platform import _pywrap_tf2
ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory

If I set LD_LIBRARY_PATH, the TF version reports as 2.15.1.
I can then run a script I have that trains a model on the MNIST dataset.

If I then unset LD_LIBRARY_PATH, and check the TF version, I receive the error above about libstdc++.so.6 being missing.
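
A quick way to reproduce this without importing TensorFlow at all is to ask the dynamic loader directly. This is a minimal sketch using ctypes from the standard library. The helper name can_load is made up. libstdc++.so.6 is the library named in the traceback, and libcuda.so.1 is the NVIDIA driver library.

```python
import ctypes


def can_load(names):
    """Map each shared-library name to True if the dynamic loader can open it."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the loader cannot find or open the library.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in can_load(["libstdc++.so.6", "libcuda.so.1"]).items():
        print(f"{name}: {'found' if ok else 'not found'}")
```

Running it once with LD_LIBRARY_PATH set and once without should show the same difference as the TensorFlow import does.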

Edit: Updated configuration.nix where I commented out some duplicate declarations.

SergeK · April 24, 2024, 8:44pm

Aj, prebuilt packages from PyPI and the impure “venvs” are a separate issue. PyPI distributes software built for FHS platforms, and it does require host configuration (nix-ld, nixglhost, etc. in the case of NixOS, or “apt-get install”-ing “system dependencies” on mainstream distributions).

When creating a venv, also make sure to pass the flag telling it to “use system dependencies” (may not work in older nixpkgs versions, cf. Fix venv creation in Python environments by cwp · Pull Request #297628 · NixOS/nixpkgs · GitHub)
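
The flag meant here is presumably --system-site-packages (an assumption, since it is not named above). Here is a minimal sketch of the same thing through the standard library's venv module, where the equivalent parameter is system_site_packages. The function name is made up, and whether the resulting venv actually sees the nix-shell's tensorflow depends on the nixpkgs version, so treat this as an illustration of the Python side only.

```python
import pathlib
import tempfile
import venv


def create_venv_with_system_packages(env_dir):
    """Create a venv at env_dir that can also see the parent interpreter's packages."""
    # system_site_packages=True is the API equivalent of --system-site-packages.
    # with_pip=False keeps the sketch fast and independent of ensurepip.
    venv.create(env_dir, system_site_packages=True, with_pip=False)
    # Return the generated config so the setting can be inspected.
    return (pathlib.Path(env_dir) / "pyvenv.cfg").read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(create_venv_with_system_packages(pathlib.Path(tmp) / "venv"))
```

The printed pyvenv.cfg should contain a line setting include-system-site-packages to true. Without it, the venv hides the parent environment's packages, which matches the "No module named 'tensorflow'" error earlier in the thread.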

ruro · April 25, 2024, 2:31pm

The PR you linked got reverted almost immediately, so currently it doesn’t work in almost any nixpkgs version. Although I am hopeful that it’ll get fixed eventually since the main idea of the fix was fine, the problem was mostly in the implementation details.

JohnAtl · April 25, 2024, 2:45pm

Thanks for trying to help, everyone!
I fixed this by going back to EndeavourOS.
I was spending too much time working on the tools, rather than working on the work.
Thanks again!