Has anyone already succeeded in setting up CUDA for a Tesla K80 on GCP?
I initially used the default driver, but it didn’t recognize the card, so I downgraded to the 470.x legacy branch as recommended on the NVIDIA site [1]. After downgrading the driver (a pretty easy task), the GPU started showing up in nvidia-smi, but when I tried running Blender no GPU was found.
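For reference, the downgrade amounts to something like this in the NixOS config (a minimal sketch assuming the standard `hardware.nvidia` options; attribute names taken from current nixpkgs):

```nix
{ config, ... }:
{
  services.xserver.videoDrivers = [ "nvidia" ];
  # Pin the legacy 470.x branch, which still supports the Kepler-era K80
  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.legacy_470;
}
```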
I know that the nixpkgs build of Blender doesn’t support CUDA out of the box, so I am already using blender-bin from edolstra’s nix-warez.
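For completeness, nixpkgs does expose a `cudaSupport` flag on Blender, so an override like the one below might be an alternative to blender-bin (untested sketch; flag name taken from nixpkgs):

```nix
# Hypothetical override: rebuild the nixpkgs Blender with CUDA enabled
blender.override { cudaSupport = true; }
```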
The CUDA samples (deviceQuery and deviceQueryDrv) also recognize the GPU, but not CUDA. The procedure I use to detect the GPU in Blender was already tested on Google Colab and on another machine running Ubuntu, and it works fine there.
Later I tried downgrading cudatoolkit and discovered that some things fail because they are missing libraries from cudatoolkit.lib. I’ve already tried cudatoolkit, nvidiaPackages_11_4.cudatoolkit, and nvidiaPackages_10.cudatoolkit.
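One thing I’m unsure about: on NixOS the driver’s libcuda.so is provided under /run/opengl-driver/lib rather than by cudatoolkit, so anything linked against the runtime needs that directory on LD_LIBRARY_PATH. A rough shell.nix sketch of what I mean (assumed attribute names):

```nix
# shell.nix sketch: build the CUDA samples against cudatoolkit, but load
# libcuda.so from the driver, which NixOS exposes in /run/opengl-driver/lib
with import <nixpkgs> {};
mkShell {
  buildInputs = [ cudatoolkit gnumake ];
  shellHook = ''
    export CUDA_PATH=${cudatoolkit}
    export LD_LIBRARY_PATH=/run/opengl-driver/lib:${cudatoolkit}/lib
  '';
}
```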
That’s my VPS config [2], and these are my Terraform dotfiles [3].
Extra context:
cuda-samples built with the latest cudatoolkit:
```
[nix-shell:~/WORKSPACE/cuda-samples/Samples/1_Utilities/deviceQuery]$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
```
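As far as I understand, error 35 (cudaErrorInsufficientDriver) means the statically linked runtime is newer than what the installed driver supports, and the 470 branch tops out at CUDA 11.4, which would explain why the latest cudatoolkit fails here. A pin like this might sidestep it (the attribute name is an assumption; it varies across nixpkgs revisions):

```nix
# Assumed attribute name: pin the runtime to 11.4 to match the 470 driver's
# maximum supported CUDA version
with import <nixpkgs> {};
mkShell { buildInputs = [ cudaPackages_11_4.cudatoolkit ]; }
```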
```
[nix-shell:~/WORKSPACE/cuda-samples/Samples/1_Utilities/deviceQueryDrv]$ ./deviceQueryDrv
./deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)

Device 0: "Tesla K80"
  CUDA Driver Version:                           11.4
  CUDA Capability Major/Minor version number:    3.7
  Total amount of global memory:                 11441 MBytes (11997020160 bytes)
  (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
  GPU Max Clock rate:                            824 MHz (0.82 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 1572864 bytes
  Max Texture Dimension Sizes                    1D=(65536) 2D=(65536, 65536) 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Texture alignment:                             512 bytes
  Maximum memory pitch:                          2147483647 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 4
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Result = PASS
```
[1] NVIDIA Data Center Driver for Linux x64, version 470.129.06
[2] https://github.com/lucasew/nixcfg/blob/89620fe0a6b50b1185b7b36b6fbdebb5ad758f67/nodes/vps/default.nix
[3] https://github.com/lucasew/nixcfg/blob/89620fe0a6b50b1185b7b36b6fbdebb5ad758f67/infra/gcp.tf