CUDA setup on NixOS

I’m using ambrop72’s fix on 19.03pre154487.0a7e258012b (Koi).

hardware.nvidia.modesetting.enable = true;                                    
hardware.nvidia.optimus_prime = {                                             
  enable = true;                                                              
  intelBusId = "PCI:0:2:0";                                                   
  nvidiaBusId = "PCI:9:0:0";                                                  
};

with

nixpkgs.config.allowUnfree = true;  

(...)

services.xserver.videoDrivers = [ "nvidia" ];

nvidia-smi seems to work:

$ nvidia-smi
Wed Oct 10 09:49:31 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 740M     Off  | 00000000:09:00.0 N/A |                  N/A |
| N/A   47C    P5    N/A /  N/A |     80MiB /  2004MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

What I’m trying to do is set up CUDA and nvcc, but I’m running into some issues because of missing documentation, for example:

  • It seems that cudatoolkit isn’t listed as a package in the NixOS package search? It does work fine when added to environment.systemPackages, though.
  • The wiki mentions that an additional rule needs to be created:

Vulkan, CUDA and OpenCL work, though CUDA needs an additional device creation rule https://github.com/NixOS/nixpkgs/blob/05e375d7103ac51e2da917965c37246c99f1ae4f/nixos/modules/hardware/video/nvidia.nix#L72

Should I add this to my configuration.nix? nvidia.nix already seems to do that.

  # CUDA
  systemd.services.nvidia-control-devices = {
    wantedBy = [
      "multi-user.target"
    ];
    serviceConfig.ExecStart = "${pkgs.linuxPackages.nvidia_x11}/bin/nvidia-smi";
  };
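For the first point above (cudatoolkit working from environment.systemPackages), here is a minimal configuration.nix sketch that installs it system-wide. It only combines options already mentioned in this post; allowUnfree is needed because cudatoolkit has an unfree license.

```nix
{ config, pkgs, ... }:

{
  # cudatoolkit is unfree, so nixos-rebuild refuses to evaluate it without this
  nixpkgs.config.allowUnfree = true;

  services.xserver.videoDrivers = [ "nvidia" ];

  # Puts nvcc and the CUDA libraries into the system profile
  environment.systemPackages = with pkgs; [ cudatoolkit ];
}
```

Note that nixpkgs.config here only applies to nixos-rebuild, as the error message below points out; nix-build and friends read their own config.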

However, while trying to build the examples:

$ cd cuda-samples
$ nix-build default.nix -A examplecuda

error: Package ‘cudatoolkit-9.1.85.1’ in /nix/store/wly35vrz5p3vhbjpf2xsr4zgqqyhsjvm-nixos-19.03pre154487.0a7e258012b/nixos/pkgs/development/compilers/cudatoolkit/default.nix:145 has an unfree license (‘unfree’), refusing to evaluate.

a) For `nixos-rebuild` you can set
  { nixpkgs.config.allowUnfree = true; }
in configuration.nix to override this.

b) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
  { allowUnfree = true; }
to ~/.config/nixpkgs/config.nix.

Edit: nvidia-control-devices seems to fail to start.

$ sudo systemctl status nvidia-control-devices.service
● nvidia-control-devices.service
   Loaded: loaded (/nix/store/7wc1p30n6ysi44jz9dcsi06f9hzwprzw-unit-nvidia-control-devices.service/nvidia-control-devices.service; enabled; vendor>
   Active: failed (Result: exit-code) since Wed 2018-10-10 10:11:59 -03; 1s ago
  Process: 2362 ExecStart=/nix/store/wkbhrvlnngd1r3mgc7075qcl3ql3splb-nvidia-x11-390.87-4.14.74/bin/nvidia-smi (code=exited, status=203/EXEC)
 Main PID: 2362 (code=exited, status=203/EXEC)
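Status 203/EXEC means systemd could not execute the ExecStart path at all. One plausible cause, assuming the driver derivation installs nvidia-smi into a separate `bin` output, is that `${pkgs.linuxPackages.nvidia_x11}` points at the default `out` output: the store path in the log above has no `-bin` suffix. A sketch of the unit under that assumption, also taking the driver from config.boot.kernelPackages so it matches the running kernel:

```nix
{ config, ... }:

let
  # Driver built against the kernel actually in use
  # (boot.kernelPackages, which defaults to pkgs.linuxPackages)
  nvidia_x11 = config.boot.kernelPackages.nvidia_x11;
in
{
  systemd.services.nvidia-control-devices = {
    wantedBy = [ "multi-user.target" ];
    # Assumption: nvidia-smi lives in the `bin` output, not in `out`
    serviceConfig.ExecStart = "${nvidia_x11.bin}/bin/nvidia-smi";
  };
}
```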

Never mind, I’m really stupid. It seems nix-env, nix-shell and nix-build don’t use the nixpkgs config from /etc/nixos/configuration.nix.

I just had to create ~/.config/nixpkgs/config.nix for my user and add:


{
    allowUnfree = true;
}

That was all it took to run Graham’s examples.
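As an alternative to the per-user ~/.config/nixpkgs/config.nix, the same setting can be passed when importing nixpkgs, so it only applies to a single project. A minimal sketch of a default.nix (the file and the bare cudatoolkit result are just for illustration):

```nix
# default.nix: unfree packages are allowed for this expression only
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:

pkgs.cudatoolkit
```

Running `nix-build default.nix` then evaluates without the unfree error. The config only takes effect through the default argument, so a caller that passes its own `pkgs` bypasses it.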

Quoting Marcos Benevides (2018-10-13 22:41:27)

Nevermind, I’m really stupid. It seems nix-env, nix-shell and nix-build don’t use the nixpkgs config from /etc/nixos/configuration.nix.

Yes. I guess this is intentional, so that users can decide on their own whether they want unfree packages in their user profile.

I’m in a similar place: I’m trying to get a CUDA setup working under NixOS. However, I did check to make sure that I already have allowUnfree = true; for my user.

I’ve got NVIDIA driver version 390.87 working, and nvidia-smi reports that my GTX 1080 Ti is running xorg-server on GPU 0 (the only one I have).

I’ve installed cudatoolkit and cudnn as system packages, but both PyTorch and TensorFlow report that no GPU is available. Furthermore, the examples from grahamc/nixos-cuda-example fail with:

Using single CPU thread for multiple GPUs
CUDA error at MonteCarloMultiGPU.cpp:300 code=30(cudaErrorUnknown) "cudaGetDeviceCount(&GPU_N)"

Is there any guide to getting CUDA working under NixOS?


I had to do a few more things to get it running on an Optimus laptop:

# to keep the GPU activated
optirun sleep 10000000 &

sudo modprobe nvidia-uvm
sudo chown root:video /dev/nvidia*

I’m not saying it’s the proper way, just that it seemed necessary here.


EDIT: Tested this recently on 19.03pre167858.f2a1a4e93be (Koi) and it works (just be warned that fetching everything takes a lot of time the first time).

Hey @ludflu, I’ve used this setup for CUDA inside a nix-shell (stolen from this PR) on my system, so there’s no need to install it globally.

In a shell.nix:

{ pkgs ? import <nixpkgs> {} }:

pkgs.stdenv.mkDerivation {
  name = "cuda-env-shell";
  buildInputs = with pkgs; [
    autoconf
    binutils
    cudatoolkit
    curl
    freeglut
    git
    gitRepo
    gnumake
    gnupg
    gperf
    libGLU_combined
    linuxPackages.nvidia_x11
    m4
    ncurses5
    procps
    stdenv.cc
    unzip
    utillinux
    xorg.libX11
    xorg.libXext
    xorg.libXi
    xorg.libXmu
    xorg.libXrandr
    xorg.libXv
    zlib
  ];
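  shellHook = ''
    # Toolkit location (the CUDA samples' Makefiles read CUDA_PATH)
    export CUDA_PATH=${pkgs.cudatoolkit}
    # Extra linker flags; the driver package provides libcuda.so in its lib/
    export EXTRA_LDFLAGS="-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib"
    export EXTRA_CCFLAGS="-I/usr/include"
  '';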
}

Extra:

My current shell setup also includes the following .envrc for direnv.


Thanks! I’ll give that a try tonight. I can’t wait to get it working.

I think NixOS could benefit greatly from having this better documented. I myself am going to be doing a lot of development with CUDA soon.

Hmmmm, so I tried your nix-shell code above, and added pytorchWithCuda. I got a familiar error that I think you’ve run into before:

shrinking RPATHs of ELF executables and libraries in /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1                                                                                
shrinking /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so                                                      
shrinking /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1/lib/python3.6/site-packages/torch/lib/libshm.so                                                                           
shrinking /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1/lib/python3.6/site-packages/torch/lib/libnccl.so                                                                          
shrinking /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so                                                                    
shrinking /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1/lib/python3.6/site-packages/torch/lib/libcaffe2.so                                                                        
shrinking /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1/lib/python3.6/site-packages/torch/lib/torch_shm_manager                                                                   
shrinking /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1/lib/python3.6/site-packages/torch/lib/libnccl.so.1                                                                        
shrinking /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1/lib/python3.6/site-packages/torch/_nvrtc.cpython-36m-x86_64-linux-gnu.so                                                  
shrinking /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1/lib/python3.6/site-packages/torch/_dl.cpython-36m-x86_64-linux-gnu.so                                                     
strip is /nix/store/b0yc8vswfzcanhdm6dgmfmdcgjmxvxa0-binutils-2.30/bin/strip
stripping (with command strip and flags -S) in /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1/lib                                                                                  
patching script interpreter paths in /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1                                                                                                
checking for references to /build in /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1...                                                                                             
RPATH of binary /nix/store/5fq22jqnzycs5ikfjvr7hdybw1lxv9jm-python3.6-pytorch-0.4.1/lib/python3.6/site-packages/torch/lib/libcaffe2_gpu.so contains a forbidden reference to /build                     
find: 'standard output': Broken pipe
find: write error
builder for '/nix/store/f45l98iq9mq8f5bk7wmxdd348ncry4j3-python3.6-pytorch-0.4.1.drv' failed with exit code 1                                                                                           
cannot build derivation '/nix/store/05vps3jd5dp8nv07yidz5lfwyqwf9i5g-python3-3.6.8-env.drv': 1 dependencies couldn't be built                                                                           
error: build of '/nix/store/05vps3jd5dp8nv07yidz5lfwyqwf9i5g-python3-3.6.8-env.drv' failed

I’d really like to contribute by debugging, but I’m not sure where to start! I see there’s a commit here that checks for references to temporary build paths, and I can understand why those should be avoided. However, I’m not sure what to do about it.


I’m thinking the same, but I was waiting until I could run this shell again and see whether it really works. It takes a lot of time to get the deps on my slow af internet XD. I’ve edited the post above to confirm it works on 19.03pre167858.f2a1a4e93be (Koi).

@ludflu I’ll try running the same shell with pytorchWithCuda added and see if I get the same result.

I’d open the binary in a viewer and search for that string. You’ll see what the reference looks like, e.g. which file the path points to. For example, depending on how you call the C compiler, __FILE__ literals (and thus assert() calls) will generate such references.


This weekend I solved this problem by switching my whole system to unstable. I did it because I noticed that your fix had already been merged into nixpkgs master (switching just the nvidia and pytorch modules to unstable didn’t seem to work). It did the trick! Thanks so much!

The only snag was a number of Python failures about “ZIP does not support timestamps before 1980”. The fix for that is to unset SOURCE_DATE_EPOCH in the shellHook.
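A minimal shell.nix sketch of that fix, trimmed down to Python with PyTorch. It assumes the channel still provides `pytorchWithCuda` in the Python package set (the attribute name used in this thread; other nixpkgs versions may name it differently) and allows unfree packages at import time.

```nix
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:

pkgs.stdenv.mkDerivation {
  name = "pytorch-cuda-shell";
  buildInputs = [
    # Python interpreter with the CUDA-enabled PyTorch build
    (pkgs.python3.withPackages (ps: [ ps.pytorchWithCuda ]))
  ];
  shellHook = ''
    # nix-shell inherits SOURCE_DATE_EPOCH from the stdenv setup; when it
    # holds a pre-1980 timestamp, Python's zip handling fails as above
    unset SOURCE_DATE_EPOCH
  '';
}
```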


I added a section to the wiki, which also mentions this thread:

Nvidia/CUDA

@ludflu Glad you made it work! :+1:
