PyTorch: Nix not working, NixOS works

I can get PyTorch working on NixOS, but the CUDA drivers aren’t visible with Nix running on Debian 11.

Is this to be expected? Or should I try something else, e.g. use the pytorch package (and compile…) instead of my current pytorch-bin package?

NixOS (working):

{ pkgs, config, ... }:
{
  imports = [
    <nixpkgs/nixos/modules/virtualisation/google-compute-image.nix>
  ];

  environment.systemPackages = with pkgs; [
    tmux emacs-nox htop git packer nixos-generators wget mosh
    nvtop
    (pkgs.python3.withPackages (ps: with ps; [
      numpy pytorch-bin
    ]))
  ];

  nixpkgs.config = {
    allowUnfree = true;
    cudaSupport = true;
  };

  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.opengl.enable = true;
}

Nix (not working):

{
  description = "A simple Python developer shell";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
  };

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      gpu_driver_libs = "/usr/lib/x86_64-linux-gnu";
      pkgs = import nixpkgs {
        inherit system;
        config = {
          allowUnfree = true;
          cudaSupport = true;
        };
        overlays = [];
      };
    in {
      devShells.${system}.default = pkgs.mkShell {
        
        buildInputs = with pkgs; [
          # Include the Python interpreter
          (python3.withPackages (ps: [ ps.numpy ps.pytorch-bin ps.torchvision-bin ps.opencv4 ps.plotext ]))
          nvtopPackages.full
          cudatoolkit
          linuxPackages.nvidia_x11
        ];

        ...
      };
    };
}

Did you get any updates regarding this problem?

IIRC, I ended up concluding that Nix just doesn’t have the hooks for the NVIDIA drivers.

Hi,
I have pretty much the same question. I am on Ubuntu 24.04 LTS with CUDA 12.6 installed in the conventional way. If I install the standard PyTorch with pip in a Python virtual environment, it recognizes the CUDA driver that is present on the Ubuntu system.

Now I am trying to create a development shell (or flake) for my Python environment, simply doing what others report works on NixOS, say as a flake:

{
  description = "?";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";
  };

  outputs = { self, nixpkgs, ... }@inputs:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        config = {
          allowUnfree = true;
        };
      };
    in
    {
      devShells.${system}.default =
        pkgs.mkShell {
          nativeBuildInputs = with pkgs.buildPackages; [
            python3
            cudaPackages_12.cudatoolkit
            python3Packages.pytorch-bin
          ];

          shellHook = ''
            export CUDA_PATH=${pkgs.cudatoolkit}
            echo "nix develop active."
          '';
        };
    };
}

But running python, importing torch, and calling torch.cuda.is_available() returns False.

For some reason, the torch that is installed either does not contain CUDA support, or it is not able to detect the CUDA drivers in their standard Ubuntu location.

Any ideas?

I am not ready to switch to NixOS entirely and was therefore trying to get the development shells running on a fairly standard Ubuntu.

You don’t need cudatoolkit in nativeBuildInputs; cudatoolkit does not contain the driver (libcuda). You need to run python using nix-gl-host (https://github.com/numtide/nix-gl-host, “Run OpenGL/CUDA programs built with Nix, on all Linux distributions”) or an equivalent.
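For example, a rough sketch of how your flake could pull it in. This assumes the nix-gl-host repository ships a flake that exposes its wrapper as the default package, and that the wrapper binary is called nixglhost; check its README for the exact attribute and invocation:

{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";
    nix-gl-host.url = "github:numtide/nix-gl-host";  # assumption: the repo provides a flake
  };

  outputs = { self, nixpkgs, nix-gl-host, ... }:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        config.allowUnfree = true;
      };
    in {
      devShells.${system}.default = pkgs.mkShell {
        nativeBuildInputs = [
          (pkgs.python3.withPackages (ps: [ ps.pytorch-bin ]))
          # assumption: the flake exposes the wrapper as its default package
          nix-gl-host.packages.${system}.default
        ];
        shellHook = ''
          # inside the shell, run python through the wrapper, e.g. `nixglhost python`
          echo "nix develop active."
        '';
      };
    };
}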

Thanks - I am rather new to nix. Does the following make sense? It seems that when I run the python or the torch provided by the nix package manager, they can see only those parts of the system (paths, libraries) that I have specified as inputs. On a full NixOS system this makes perfect sense: it forces everyone to be honest about all dependencies, which in turn makes it possible to verify the hashes and provide integrity.

But as an isolated package manager that merely runs on a vanilla Ubuntu host, nix simply does not control that host system, so I will never be able to prove completeness or guarantee security anyway. In my case I need to show torch some parts of Ubuntu that I have not declared and that nix has not even provided: the CUDA drivers.

So the Python code that you linked to fudges this and injects the missing library paths without nix noticing that someone is “cheating”.

If this makes sense, I will try to understand the easiest way of injecting a specific library path into a nix shell - perhaps I can even do without that Python code?
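Concretely, something like this rough sketch is what I have in mind. It assumes a hypothetical directory that holds only the host’s NVIDIA driver libraries, and that appending it to LD_LIBRARY_PATH is enough for torch to find libcuda:

pkgs.mkShell {
  nativeBuildInputs = with pkgs.buildPackages; [
    python3
    python3Packages.pytorch-bin
  ];

  shellHook = ''
    # hypothetical directory containing only the host's NVIDIA driver libraries
    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/path/to/nvidia-driver-libs"
    echo "nix develop active."
  '';
}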

This might work.

Hi, thanks for your help. I am trying to figure out what’s going on here. I am now using this flake:

{
  description = "?";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";
  };

  outputs = { self, nixpkgs, ... }@inputs:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        config = {
          allowUnfree = true;
        };
      };
    in
    {
      devShells.${system}.default =
        pkgs.mkShell {
          nativeBuildInputs = with pkgs.buildPackages; [
            python3
            python3Packages.pytorch-bin
          ];

          shellHook = ''
            echo "nix develop active."
          '';
        };
    };
}

I also studied the Ubuntu *.deb package of the original NVIDIA drivers. Assuming that PyTorch only needs the drivers (but not the CUDA toolkit or the NVIDIA cuDNN library), there are a number of shared libraries, all in /usr/lib/x86_64-linux-gnu/, that Python ought to be able to find. Now, calling in my nix develop shell

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu/ python

gives a segfault, probably because it dynamically loads an unrelated shared library that conflicts with the nix-provided python. So I need to be more specific and only offer the Ubuntu versions of the NVIDIA libraries. So

mkdir lib
cd lib
ln -s /usr/lib/x86_64-linux-gnu/*nvidia* .
ln -s /usr/lib/x86_64-linux-gnu/libnv* .
ln -s /usr/lib/x86_64-linux-gnu/libcuda* .
cd ..
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./lib/ python

does start python, lets me import torch, and confirms the existence of CUDA!! It seems I am in business this way.

The above soft links are a terrible fudge, though. Now I wonder: what is the most elegant way of showing nix a precise set of system libraries whose global paths I can specify?

Perhaps someone is interested in the solution in more detail. I am running the nix package manager on an Ubuntu 24.04 LTS system that has the CUDA drivers (560.35.03) installed, and I want to make these local CUDA drivers visible to software installed by nix in order to build development shells. This is done in three steps as follows.

expose-cuda.nix is a derivation that packages symbolic links to the global paths of the relevant dynamic libraries into the nix store. The list depends on the details of the CUDA drivers that are already installed:

{ stdenv }:

stdenv.mkDerivation rec {

  name    = "expose-cuda-${version}";
  version = "1.0";
  src     = ./.;

  installPhase = ''
    mkdir -p $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libcuda.so $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libcuda.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libcudadebugger.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.2 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvcuvid.so $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-api.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1.1.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.560.35.03 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-encode.so $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.560.35.03 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.560.35.03 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.560.35.03 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-ml.so $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.4 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.560.35.03 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11.so.560.35.03 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.560.35.03 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.560.35.03 $out/lib
    ln -s /usr/lib/x86_64-linux-gnu/libnvoptix.so.1 $out/lib
  '';
}

Second, in cuda-python.nix I wrap Python3 so that these libraries are presented to it at the end of LD_LIBRARY_PATH. Apparently this is where Python3 expects them and where torch can later probe for them:

{ stdenv, pkgs, expose-cuda }:

stdenv.mkDerivation rec {

  name    = "cuda-python-${version}";
  version = "1.0";
  src     = ./.;

  nativeBuildInputs = with pkgs; [
    makeWrapper
  ];

  buildInputs = with pkgs; [
    python3
    expose-cuda
  ];

  buildPhase = ''
  '';

  installPhase = ''
    mkdir -p $out/bin
    mkdir -p $out/lib
    cp -p ${pkgs.python3}/bin/python $out/bin
  '';

  postFixup = ''
    wrapProgram $out/bin/python --suffix LD_LIBRARY_PATH ':' ${expose-cuda}/lib
  '';
}

Finally, the dev shell that is based on this wrapped Python3; here is an actual example, comfyui-env.nix:

{ pkgs, cuda-python }:

pkgs.mkShell rec {
  nativeBuildInputs = with pkgs.buildPackages; [
    cuda-python
    python3Packages.torchsde
    python3Packages.torch
    python3Packages.torchvision
    python3Packages.torchaudio
    python3Packages.einops
    python3Packages.transformers
    python3Packages.tokenizers
    python3Packages.sentencepiece
    python3Packages.safetensors
    python3Packages.aiohttp
    python3Packages.pyyaml
    python3Packages.pillow
    python3Packages.scipy
    python3Packages.tqdm
    python3Packages.psutil
    python3Packages.soundfile
  ];

  shellHook = ''
    echo -e "PyTorch/CUDA environment active.\n"
  '';
}

A working default.nix then reads:

let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;
      cudaSupport = true;
    };
  };
  expose-cuda = pkgs.callPackage ./expose-cuda.nix {};
  cuda-python = pkgs.callPackage ./cuda-python.nix { inherit expose-cuda; };
  comfyui-env = import ./comfyui-env.nix;
in
  comfyui-env { inherit pkgs cuda-python; }

Note that with these config flags, PyTorch is provided with CUDA support, and in my case it is built from source. So start the build about 2 hours before you actually need it…
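If the two-hour source build is a problem, a possible shortcut - untested on my side - is to use the prebuilt wheels that nixpkgs ships as pytorch-bin / torchvision-bin (the same packages used earlier in this thread) instead of the source-built torch. Note that packages such as torchsde pull in the regular torch as a dependency, so this may not help for the full ComfyUI list above, but it does for a plain torch/torchvision shell. A sketch:

{ pkgs, cuda-python }:

pkgs.mkShell {
  # sketch only: prebuilt CUDA wheels instead of the source-built torch
  nativeBuildInputs = with pkgs.buildPackages; [
    cuda-python
    python3Packages.pytorch-bin       # prebuilt wheel, no two-hour compile
    python3Packages.torchvision-bin
  ];

  shellHook = ''
    echo -e "PyTorch/CUDA (prebuilt wheels) environment active.\n"
  '';
}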

As for the nix stuff, I am rather new to the game, so if you have suggestions as to how I can improve it, please explain.


Other than maybe using caching instead of building this every time (I’m not aware how, as I’m also new to nix), can’t you just do
ln -s /usr/lib/x86_64-linux-gnu/* $out/lib
to cut down the number of linking lines?

No, unfortunately not. That directory holds a huge number of libraries, and if the wrong ones appear in LD_LIBRARY_PATH, they may take precedence over their nix counterparts and cause all sorts of crashes.
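What does work, if you want fewer lines, is keeping an explicit allowlist and looping over it. A sketch of expose-cuda.nix written that way (same idea as above, shown here with only a subset of the library names; extend the list to match the driver that is actually installed):

{ stdenv }:

stdenv.mkDerivation rec {

  name    = "expose-cuda-${version}";
  version = "1.0";
  src     = ./.;

  installPhase = ''
    mkdir -p $out/lib
    # explicit allowlist; add the remaining driver libraries and version-suffixed names as needed
    for lib in libcuda.so libcuda.so.1 libnvidia-ml.so libnvidia-ml.so.1 \
               libnvidia-nvvm.so libnvidia-nvvm.so.4 libnvidia-ptxjitcompiler.so.1; do
      ln -s /usr/lib/x86_64-linux-gnu/$lib $out/lib/
    done
  '';
}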

What makes things difficult is that python3 does not resolve these libraries through the Linux dynamic linker up front; it is the Python interpreter (virtual machine) that, only later, has some modules (torch) search for further dynamic link libraries. If it weren’t python, I could add the libraries to NIX_LDFLAGS, but that seems to affect only the first linker call, which in this case is python itself.

Fair enough, thanks for the explanation!