PyTorch with CUDA support

eingird · August 26, 2024, 10:37am

Hi everyone,

I’m new to NixOS and recently set up a PC to develop ML models using PyTorch with CUDA support. I’ve been experimenting with a few nix-shell configurations, but I keep running into issues with long build times, especially with compiling CUDA and other dependencies.

Here are the configurations I’ve tried:

Configuration 1:

let
  # Pin to a specific nixpkgs commit for reproducibility.
  pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/24bb1b20a9a57175965c0a9fb9533e00e370c88b.tar.gz") {config.allowUnfree = true;};
in pkgs.mkShell {
  buildInputs = [
    pkgs.python311
    pkgs.python311Packages.torch-bin
    pkgs.python311Packages.unidecode
    pkgs.python311Packages.inflect
    pkgs.python311Packages.librosa
    pkgs.python311Packages.pip
  ];

  shellHook = ''
    echo "You are now using a NIX environment"
    export CUDA_PATH=${pkgs.cudatoolkit}
  '';
}

Configuration 2:

nix-shell --arg config '{ allowUnfree = true; }' -p 'python311.withPackages (ps: with ps; [torchWithCuda])' -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/26c6b9f5394b2e64649ff18394d61ce5594874e2.tar.gz

Configuration 3:


{ pkgs ? import <nixpkgs> {
  config = {
    allowUnfree = true;
    cudaSupport = true;
  };
} }:
  pkgs.mkShell {
    # nativeBuildInputs is usually what you want -- tools you need to run
    nativeBuildInputs = with pkgs.buildPackages; [
      python311
      cudaPackages_12.cudatoolkit
      python311Packages.pytorch-bin
      python311Packages.pip
    ];

    shellHook = ''
      echo "You are now using a NIX environment"
      export CUDA_PATH=${pkgs.cudatoolkit}
    '';
}

Issue: In all cases, I end up with long build times because of the compilation of CUDA, PyTorch, and other dependencies. The builds can take over an hour, which is quite frustrating.

Question: Is there a way to set up a development environment for PyTorch with CUDA on NixOS without having to compile everything from scratch? Are there any pre-built binaries or optimized configurations that I can use to speed up this process? (I have tried torch-bin, as suggested here, but it still compiled everything in the same way.)

I appreciate any advice or tips you can provide!

Thanks in advance!

eingird · August 26, 2024, 10:57am

Some additional info:

nix --version
nix (Nix) 2.18.5

system.stateVersion: 24.05

nvidia-smi output:

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1345      G   …mzcsvfvcv-xorg-server-21.1.13/bin/X          162MiB |
|    0   N/A  N/A      1716      G   …irefox-128.0.3/bin/.firefox-wrapped          136MiB |
|    0   N/A  N/A      2068      G   …erProcess --variations-seed-version           86MiB |
|    0   N/A  N/A     16208      G   …/profiles/per-user/nazara/bin/kitty           16MiB |
+-----------------------------------------------------------------------------------------+

Nvidia driver:

  services.xserver.videoDrivers = ["nvidia"];

  hardware.nvidia = {
    modesetting.enable = true;
    powerManagement.enable = false;
    powerManagement.finegrained = false;
    open = false;
    nvidiaSettings = true;
    package = config.boot.kernelPackages.nvidiaPackages.stable;
  };

eingird · August 27, 2024, 6:43pm

I found a solution for this issue. Here’s how I addressed it:

Cachix Setup:

I used Cachix so that CUDA is downloaded from a binary cache instead of being compiled locally.
I also used the pytorch-bin package (torch-bin in current nixpkgs) in my nix-shell file.

Nix-Shell Configuration:

I did not set cudaSupport = true; in the nix-shell file. With this option, CUDA-enabled packages get built from source, which is unnecessary in this case because torch-bin is a prebuilt binary.
The same issue occurred with the torchWithCuda package (or similar).
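
For anyone wondering why that one option matters so much, here is a rough sketch (not part of my setup). It evaluates the same pinned nixpkgs twice, once with cudaSupport = true, and prints the derivation path of torch for each. The two paths should differ, and the CUDA-enabled one is generally not available on cache.nixos.org because it depends on unfree CUDA packages, so Nix falls back to building it locally. The file name compare.nix and the attribute names plainTorch and cudaTorch are made up for illustration.

```nix
# compare.nix (hypothetical file name). One way to evaluate it:
#   nix-instantiate --eval --strict compare.nix
let
  # Same pinned commit as in the shell.nix from this thread.
  src = fetchTarball "https://github.com/NixOS/nixpkgs/archive/24bb1b20a9a57175965c0a9fb9533e00e370c88b.tar.gz";

  # nixpkgs without global CUDA support.
  plain = import src { config.allowUnfree = true; };

  # nixpkgs with CUDA support switched on for every package that honours it.
  withCuda = import src {
    config = {
      allowUnfree = true;
      cudaSupport = true;
    };
  };
in {
  # Different .drv paths mean different builds, so the binary cache
  # entry for the first does not help with the second.
  plainTorch = plain.python311Packages.torch.drvPath;
  cudaTorch = withCuda.python311Packages.torch.drvPath;
}
```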

Cachix Installation:

When installing, Cachix generates a new cachix.nix file and a cachix directory in /etc/nixos. Make sure to move these to your dotfiles.
After installing Cachix, you’ll need to import the generated file into your configuration.nix and run nixos-rebuild.
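
As a minimal sketch of that import step, assuming the generated file is still at its default location /etc/nixos/cachix.nix (adjust the path if you moved it into your dotfiles) and that it was created with `cachix use cuda-maintainers` run as root:

```nix
# /etc/nixos/configuration.nix (fragment)
{ config, pkgs, ... }:
{
  imports = [
    ./hardware-configuration.nix
    # Generated by `cachix use cuda-maintainers`; it in turn pulls in
    # the per-cache files from the ./cachix/ directory.
    ./cachix.nix
  ];
}
```

The cache only takes effect after the next nixos-rebuild, so run that before entering the nix-shell.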

After all these steps, I was able to run nix-shell shell.nix and had CUDA available in PyTorch.

Hope this helps someone. I might also post my shell.nix here later on.

Useful links:
https://nixos.wiki/wiki/CUDA

https://app.cachix.org/cache/cuda-maintainers

eingird · September 2, 2024, 8:11am

Here is the shell.nix that I used:


let
  # Pin to a specific nixpkgs commit for reproducibility.
  pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/24bb1b20a9a57175965c0a9fb9533e00e370c88b.tar.gz") {config.allowUnfree = true; };
in pkgs.mkShell {
  nativeBuildInputs = [
    pkgs.python311
    pkgs.python311Packages.torch-bin
    pkgs.python311Packages.torchaudio-bin
    pkgs.python311Packages.torch-audiomentations
    pkgs.python311Packages.librosa
    pkgs.python311Packages.jiwer
    pkgs.python311Packages.datasets
    pkgs.python311Packages.transformers
    pkgs.python311Packages.evaluate
    pkgs.python311Packages.accelerate
    pkgs.python311Packages.pip
    
  ];

  shellHook = ''
    echo "You are now using a NIX environment"
    export CUDA_PATH=${pkgs.cudatoolkit}
    echo $CUDA_PATH
  '';
}
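
A quick way to check that the shell really sees the GPU is to let the shellHook ask PyTorch directly. This is only a sketch, not the file I actually use: the file name check-cuda.nix is made up, and it contains just torch-bin so that it stays small.

```nix
# check-cuda.nix (hypothetical file name); enter with: nix-shell check-cuda.nix
let
  # Same pinned commit as above.
  pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/24bb1b20a9a57175965c0a9fb9533e00e370c88b.tar.gz") { config.allowUnfree = true; };
in pkgs.mkShell {
  packages = [
    # Python interpreter with the prebuilt PyTorch wheel available.
    (pkgs.python311.withPackages (ps: [ ps.torch-bin ]))
  ];

  shellHook = ''
    # Reports whether PyTorch can reach the GPU through the driver.
    python -c 'import torch; print("CUDA available:", torch.cuda.is_available())'
  '';
}
```

If this reports False, the usual suspects are the driver setup in configuration.nix rather than the shell itself.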

WaterIris · November 21, 2024, 3:48pm

Hi, my silly method to solve it with flakes is:

{
  description = "Pytorch with cuda";

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs?ref=nixos-24.05";
  };
  outputs = { self, nixpkgs }:
  
  let 
   pkgs = import nixpkgs { system = "x86_64-linux"; config.allowUnfree = true; };
  in
  { 
    devShells."x86_64-linux".default = pkgs.mkShell {
      LD_LIBRARY_PATH = pkgs.lib.makeLibraryPath [
        pkgs.stdenv.cc.cc
        "/run/opengl-driver"
      ];
        
      venvDir = ".venv";
      packages = with pkgs; [
        python312    
        python312Packages.venvShellHook
        python312Packages.pip
        python312Packages.numpy
        python312Packages.pyqt6
      ];
        
    };
  };
}

OpenGL is enabled via configuration.nix, and the remaining required packages can be installed via pip. I recommend using direnv too. If somebody can point out negatives of this approach, please let me know.

SergeK · November 21, 2024, 4:39pm

I’ll just link this here: CUDA Cache for Nix Community.

The mentioned hacks like LD_LIBRARY_PATH (nix-ld is the safer way to implement that), FHS namespaces, and temporary “patchelfed” shims such as torch-bin are all viable ways to make PyPI software work. Be advised, however, that this is fundamentally different from and less reliable than actually using the “real” nixpkgs, which supports overlays and overrides and attempts to make packages complete and correct by construction.

It’s also maybe worth mentioning that the page was written a long time ago and might (well, it does) contain outdated information.

Fesesnisco · December 16, 2024, 11:07pm

Hi, I really like your setup because I prefer managing packages via a venv, but I can’t set the device to GPU inside pytorch. Do you configure something other than opengl in configuration.nix? Is there something I could be missing? I’m using this exact flake.

WaterIris · December 17, 2024, 6:33pm

Ah right, here’s the part of my config responsible for it. To be honest, I don’t remember the source of it.

{ config, lib, pkgs, ... }:
{

  # Enable OpenGL
  hardware.opengl = {
    enable = true;
  };

  # Load nvidia driver for Xorg and Wayland
  # services.xserver.videoDrivers = ["nvidia"];

  hardware.nvidia = {

    # Modesetting is required.
    modesetting.enable = false;

    # Nvidia power management. Experimental, and can cause sleep/suspend to fail.
    # Enable this if you have graphical corruption issues or application crashes after waking
    # up from sleep. This fixes it by saving the entire VRAM memory to /tmp/ instead 
    # of just the bare essentials.
    powerManagement.enable = false;

    # Fine-grained power management. Turns off GPU when not in use.
    # Experimental and only works on modern Nvidia GPUs (Turing or newer).
    powerManagement.finegrained = false;

    # Use the NVidia open source kernel module (not to be confused with the
    # independent third-party "nouveau" open source driver).
    # Support is limited to the Turing and later architectures. Full list of 
    # supported GPUs is at: 
    # https://github.com/NVIDIA/open-gpu-kernel-modules#compatible-gpus 
    # Only available from driver 515.43.04+
    # Currently alpha-quality/buggy, so false is currently the recommended setting.
    open = false;

    # Enable the Nvidia settings menu,
    # accessible via `nvidia-settings`.
    nvidiaSettings = true;

    # Optionally, you may need to select the appropriate driver version for your specific GPU.
    package = config.boot.kernelPackages.nvidiaPackages.stable;
  };
}

That should be everything. You may need to tweak a few options depending on your GPU, but that worked for me.
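
One note for anyone reading this on a newer release: starting with NixOS 24.11 the hardware.opengl.enable option was renamed to hardware.graphics.enable. A minimal sketch of the equivalent fragment, assuming the rest of the config stays as posted above:

```nix
# configuration.nix fragment for NixOS 24.11 and later
{ config, lib, pkgs, ... }:
{
  # Replaces hardware.opengl.enable; this is what provides the driver
  # libraries under /run/opengl-driver that the flake's LD_LIBRARY_PATH
  # points at.
  hardware.graphics.enable = true;
}
```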

Fesesnisco · December 17, 2024, 10:51pm

Yes, it was that. Thanks a lot, now it’s working perfectly. To be honest I should’ve read the docs for the GPU issue, so thank you very much for your help.