Compiling CUDA kernels with nvcc on NixOS: no driver found, cudaDriverGetVersion returns 0, CUDA runtime error 35 (driver version insufficient)

nixlearner · December 5, 2020, 1:56am

Background:

Hi, I’m trying to develop some CUDA kernels on NixOS. I already had some working kernels on Windows, but when I tried them on NixOS they no longer worked. After some debugging I found out that it’s due to “Error 35: Driver Version Insufficient for Runtime Version”. However, the driver version is high enough, as checked with nvidia-smi.

Minimal reproducible example - get_versions.cu

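#include <cuda_runtime.h>  // declares cudaRuntimeGetVersion and cudaDriverGetVersion
#include <iostream>

int main()
{
        // Initialized so that a failed call does not leave garbage in the output.
        int run_version = 0, driver_version = 0;
        // Each call returns a cudaError_t; 0 means cudaSuccess.
        std::cout << "Return Code Runtime Version: ";
        std::cout << cudaRuntimeGetVersion(&run_version);
        std::cout << "\nReturn Code Driver Version: ";
        std::cout << cudaDriverGetVersion(&driver_version);
        std::cout << "\nRuntime Version: ";
        std::cout << run_version;
        std::cout << "\nDriver Version: ";
        std::cout << driver_version << "\n";
        return 0;
}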

I compiled it with:

nvcc -o versions_bin get_versions.cu -I /nix/store/lxyjz3j1qbrf9hw0nnsdang3gk2a8wpp-cudatoolkit-11.0.3/include/ -ldir /nix/store/lxyjz3j1qbrf9hw0nnsdang3gk2a8wpp-cudatoolkit-11.0.3/nvvm/libdevice/ -L /nix/store/lxyjz3j1qbrf9hw0nnsdang3gk2a8wpp-cudatoolkit-11.0.3/lib/  -L /nix/store/sdjf499xfbykmr3lsyd7084krjxr3mfx-cudatoolkit-11.0.3-lib/lib/  --dont-use-profile

Output

$ ./versions_bin 
Return Code Runtime Version: 35
Return Code Driver Version: 0
Runtime Version: 0
Driver Version: 0

The driver is definitely there as nvidia-smi returns the following.

[user@nixos:]$ nvidia-smi 
Sat Dec  5 02:50:20 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  Off  | 00000000:2A:00.0  On |                  N/A |
|  0%   51C    P0    34W / 180W |    936MiB /  8116MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1159      G   ...-xorg-server-1.20.8/bin/X      536MiB |
|    0   N/A  N/A      1666      G   ...nt-system/sw/bin/kwin_x11      235MiB |
|    0   N/A  N/A      1704      G   ...ce-5.18.5/bin/plasmashell      119MiB |
|    0   N/A  N/A      3669      G   ...AAAAAAAA== --shared-files       38MiB |
+-----------------------------------------------------------------------------+

So it seems that, for some reason, the CUDA program cannot see the driver.

knedlsepp · December 5, 2020, 4:05am

On NixOS you’ll need to make sure that the resulting binary has an RPATH entry pointing to /run/opengl-driver/lib, which is where libcuda.so is located.
In nixpkgs this is done using the addOpenGLRunpath hook. You could try running strace on the binary to see whether libcuda is found.
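
To check this from a shell, something like the following rough sketch might help. has_libcuda is just a helper name made up for this example, and versions_bin is the binary from the question; the commented commands assume patchelf and strace are installed.

```shell
# Hypothetical helper: succeeds if the given directory contains a libcuda.so* file.
has_libcuda() {
    for f in "$1"/libcuda.so*; do
        # If nothing matches, the pattern stays literal and -e is false.
        [ -e "$f" ] && return 0
    done
    return 1
}

if has_libcuda /run/opengl-driver/lib; then
    echo "libcuda found in /run/opengl-driver/lib"
else
    echo "libcuda not found in /run/opengl-driver/lib"
fi

# To see where the binary itself looks for the library:
#   patchelf --print-rpath versions_bin
#   strace -f -e trace=open,openat ./versions_bin 2>&1 | grep libcuda
```

If strace shows the loader trying several directories for libcuda.so.1 without ever opening /run/opengl-driver/lib, the RPATH entry is most likely what is missing.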

nixlearner · December 6, 2020, 7:58am

Thank you @knedlsepp, hilarious username btw.

Solution:

patchelf --set-rpath '/run/opengl-driver/lib:'$(patchelf --print-rpath versions_bin) versions_bin
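
A small caveat, in case it helps someone: if the binary has no RPATH at all, the command above produces a value with a trailing ':' (an empty path entry). Below is a minimal sketch that avoids this. prepend_rpath is a hypothetical helper name of my own, not part of patchelf.

```shell
# Hypothetical helper: prepend a directory to an RPATH string without
# leaving a trailing ':' when the existing RPATH is empty.
prepend_rpath() {
    new_dir="$1"
    old_rpath="$2"
    if [ -n "$old_rpath" ]; then
        printf '%s:%s\n' "$new_dir" "$old_rpath"
    else
        printf '%s\n' "$new_dir"
    fi
}

# Usage (needs patchelf; versions_bin is the binary from the question):
#   patchelf --set-rpath \
#     "$(prepend_rpath /run/opengl-driver/lib "$(patchelf --print-rpath versions_bin)")" \
#     versions_bin
```

Afterwards, patchelf --print-rpath versions_bin should list /run/opengl-driver/lib as the first entry.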

nixlearner · January 20, 2021, 7:55pm

Just in case anyone needs it, this is the working command that I now build automatically in my configuration.nix.

Usage:

cudaCompile source.cu

configuration.nix

let
  # Compile "$1" with nvcc, then prepend the NixOS driver directory to the RPATH of the result.
  buildScript = ''
    nvcc -o "$1".bin "$1" \
      -I ${pkgs.cudaPackages.cudatoolkit_11}/include \
      -ldir ${pkgs.cudaPackages.cudatoolkit_11}/nvvm/libdevice/ \
      -L ${pkgs.cudaPackages.cudatoolkit_11}/lib \
      -L ${pkgs.cudaPackages.cudatoolkit_11.lib}/lib \
      --dont-use-profile -G --std=c++11 -rdc=true \
      -gencode=arch=compute_60,code=sm_60 -lcudadevrt
    patchelf --set-rpath "/run/opengl-driver/lib:$(patchelf --print-rpath "$1".bin)" "$1".bin
  '';
  cudaCompile = pkgs.writeScriptBin "cudaCompile" ''
    #!${pkgs.stdenv.shell}
    ${buildScript}
  '';
in
{
  environment.systemPackages = with pkgs; [ cudaCompile ..... ];
}