CUDA in NixOS on GCP for a Tesla K80

Has anyone already succeeded in setting up CUDA for a Tesla K80 on GCP?

I initially used the default driver, but it didn’t recognize the card, so I downgraded to the 470.x legacy driver as indicated on the Nvidia site [1].

After downgrading the driver (a pretty easy task), the GPU started showing up in nvidia-smi, but when I tried running Blender no GPU was found.

I know that the nixpkgs build of Blender doesn’t support CUDA out of the box, so I am already using blender-bin from edolstra’s nix-warez.

The CUDA samples (deviceQuery and deviceQueryDrv) also see the GPU but not CUDA: deviceQueryDrv passes while deviceQuery fails (output below). The setup I use to detect the GPU in Blender was already tested in Google Colab and on another machine running Ubuntu, and it works fine there.

Later I tried downgrading cudatoolkit and discovered that some things are missing libraries from cudatoolkit.lib. I have already tried cudatoolkit, nvidiaPackages_11_4.cudatoolkit and nvidiaPackages_10.cudatoolkit.

That’s my VPS config [2] and these are my terraform dotfiles [3].

Extra context (cuda-samples using the latest cudatoolkit):

[nix-shell:~/WORKSPACE/cuda-samples/Samples/1_Utilities/deviceQuery]$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

[nix-shell:~/WORKSPACE/cuda-samples/Samples/1_Utilities/deviceQueryDrv]$ ./deviceQueryDrv 
./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version 
Detected 1 CUDA Capable device(s)

Device 0: "Tesla K80"
  CUDA Driver Version:                           11.4
  CUDA Capability Major/Minor version number:    3.7
  Total amount of global memory:                 11441 MBytes (11997020160 bytes)
  (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
  GPU Max Clock rate:                            824 MHz (0.82 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 1572864 bytes
  Max Texture Dimension Sizes                    1D=(65536) 2D=(65536, 65536) 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):    (2147483647, 65535, 65535)
  Texture alignment:                             512 bytes
  Maximum memory pitch:                          2147483647 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 4
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Result = PASS

[1] https://www.nvidia.com.br/Download/driverResults.aspx/188594/br
[2] https://github.com/lucasew/nixcfg/blob/89620fe0a6b50b1185b7b36b6fbdebb5ad758f67/nodes/vps/default.nix
[3] https://github.com/lucasew/nixcfg/blob/89620fe0a6b50b1185b7b36b6fbdebb5ad758f67/infra/gcp.tf

(import nixpkgs { config.cudaSupport = true; config.allowUnfree = true; }).blender works with CUDA just fine.

~/WORKSPACE/cuda-samples/Samples/1_Utilities/deviceQueryDrv

How are these built? Do they know to look in /run/opengl-driver/lib for the userspace driver?

[2] https://github.com/lucasew/nixcfg/blob/89620fe0a6b50b1185b7b36b6fbdebb5ad758f67/nodes/vps/default.nix

I also couldn’t find where hardware.opengl.enable is being set.

nix-shell -p gnumake
make

it’s more convenient to just use blender-bin

If you run patchelf --print-rpath ./deviceQueryDrv, you’ll likely observe that /run/opengl-driver/lib isn’t in the list. Chances are, the absolute path to the driver isn’t there either. The /run/... path wouldn’t even exist unless you have hardware.opengl.enable = true set.
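A rough sketch of that RPATH check, in case it helps. It assumes patchelf is on PATH; the helper names rpath_has_dir and check_binary are made up for illustration and only do string matching on whatever patchelf --print-rpath prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the colon-separated RPATH string ($1)
# contains the directory ($2) as an exact entry.
rpath_has_dir() {
  local rpath="$1" dir="$2"
  case ":${rpath}:" in
    *":${dir}:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: report whether a binary's RPATH mentions the
# NixOS driver directory. Returns 2 if patchelf is missing or fails.
check_binary() {
  local bin="$1" rpath
  command -v patchelf >/dev/null || { echo "patchelf not found" >&2; return 2; }
  rpath="$(patchelf --print-rpath "$bin")" || return 2
  if rpath_has_dir "$rpath" /run/opengl-driver/lib; then
    echo "$bin: RPATH includes /run/opengl-driver/lib"
  else
    echo "$bin: RPATH lacks /run/opengl-driver/lib"
  fi
}

# Example (not run automatically): check_binary ./deviceQueryDrv
```

Matching on ":dir:" rather than a plain substring avoids false positives such as /run/opengl-driver/lib32.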

You can first test if that’s really the problem by running LD_LIBRARY_PATH="$path_to_nvidia/lib" ./deviceQueryDrv, where path_to_nvidia would be the path to your driver, the one you export here.
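Roughly, the LD_LIBRARY_PATH test could be wrapped like this. It is only a sketch: run_with_driver_libs is a made-up name, and it assumes the driver's lib directory contains libcuda.so.1, which is the name the NVIDIA driver normally ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with an extra library directory
# prepended to LD_LIBRARY_PATH, after checking that libcuda.so.1
# (assumed file name) is actually present in that directory.
run_with_driver_libs() {
  local libdir="$1"
  shift
  if [ ! -e "$libdir/libcuda.so.1" ]; then
    echo "no libcuda.so.1 in $libdir" >&2
    return 1
  fi
  # Keep any existing LD_LIBRARY_PATH entries after the driver directory.
  LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}

# Example (not run automatically):
#   run_with_driver_libs "$path_to_nvidia/lib" ./deviceQueryDrv
```

If the sample passes this way but fails without the variable, the binary simply isn't finding the userspace driver on its own.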

it’s more convenient to just use blender-bin

Barring exceptions, it’s easier to use Nix-built software on NixOS :) Maybe the derivation is worth a try.

It seems that setting hardware.opengl.enable = true weirdly solves the problem.

It’s weird because CUDA is a different thing from OpenGL, but for some reason it works.

Thank you.

I think I can now scale up to a beefier GPU if I want hehe.

If someone wants to play with it, my Terraform setup has a “turbo mode” that scales up to a beefier machine, but while you are setting things up you can stick with a free e2-micro. This technique makes iterating hella cheap.

it’s weird because CUDA is another thing compared to OpenGL but for some reason, it works

Yes, it’s just legacy naming…
What that option does is it exposes the hardware-dependent user-space drivers in /run/opengl-driver/lib, including libcuda.so.
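A quick way to see this on a running system might look like the sketch below. list_libcuda is a made-up helper name; the default directory is the one mentioned above, and the check only looks for files named libcuda.so*.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print any libcuda files in a driver directory.
# Defaults to the path populated when hardware.opengl.enable = true.
# Returns 0 if at least one file was found, 1 otherwise.
list_libcuda() {
  local dir="${1:-/run/opengl-driver/lib}" f found=1
  for f in "$dir"/libcuda.so*; do
    [ -e "$f" ] || continue   # skip the unexpanded glob when nothing matches
    echo "$f"
    found=0
  done
  return "$found"
}

if list_libcuda; then
  echo "libcuda is exposed"
else
  echo "no libcuda found; is hardware.opengl.enable = true set?"
fi
```

On a machine without the option set, the directory usually doesn't exist at all, so the second message is printed.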

The naming issues will be resolved by the WIP PR “nixos/opengl: move to hardware.drivers” by jonringer (Pull Request #158079, NixOS/nixpkgs) or one of the parallel PRs.

The changes are waiting on “[RFC 0121] Migrate OpenGL References to API-Agnostic Terms” by jonringer (Pull Request #121, NixOS/rfcs) to be ratified.
