For the life of me I can’t figure out how to get CUDA stuff working within a nix-shell. Any help would be greatly appreciated! I have an Ubuntu 20.04 machine with a GTX 1060 GPU:
❯ nvidia-smi
Sat Aug 8 23:15:19 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
However, I just can’t get nvidia-smi working from within a nix-shell. Following the directions in the docs here (Nvidia - NixOS Wiki), I set up the following nvidiabug.nix:
I notice that in that example there are pinned CUDA versions (cudatoolkit_10_1). Does this relate somehow to the installed driver version you have? Why not just use cudatoolkit?
Just for learning purposes, how could I get the nvidia-smi example working? I’m not even sure how to find a package that includes the nvidia-smi binary.
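On the driver-version question: the "CUDA Version" field in the nvidia-smi banner is the newest CUDA release the installed driver supports, so a toolkit at or below that version should generally be usable. Here is a minimal sketch of that check, assuming the banner format shown above. The helper names (`parse_smi_header`, `toolkit_supported`) are made up for illustration, and the 11.0 toolkit version is only a placeholder to show a failing case.

```python
import re


def parse_smi_header(text):
    """Extract (driver_version, max_cuda_version) from nvidia-smi banner text."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    if not driver or not cuda:
        raise ValueError("no nvidia-smi banner found in text")
    return driver.group(1), cuda.group(1)


def version_tuple(version):
    """Turn '10.1.243' into (10, 1, 243) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def toolkit_supported(toolkit_version, max_cuda_version):
    """True if the toolkit's major.minor is not newer than what the driver reports."""
    return version_tuple(toolkit_version)[:2] <= version_tuple(max_cuda_version)[:2]


# Banner line copied from the nvidia-smi output in this thread.
banner = "| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |"
driver, max_cuda = parse_smi_header(banner)

# 10.1.243 is the cudatoolkit_10_1 version from nixpkgs; 11.0 is a placeholder.
for toolkit in ("10.1.243", "11.0"):
    print(toolkit, "supported by driver", driver, "->", toolkit_supported(toolkit, max_cuda))
```

By this check both cudatoolkit_10_1 and cudatoolkit_10_2 are acceptable on a 440.100 driver, so the pinned toolkit version by itself is unlikely to be what breaks things.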
Ha, that’s funny. I was just playing around with difftaichi/taichi elements for a project I’m working on. I think 10.1 is supported fine (that’s what I’m using). nvidia-smi should be installed by the nvidia-drivers; is there a reason you need to run it from within nix-shell?
For what it’s worth, I have nvidia-smi both inside and outside of nix-shell, so long as I don’t evaluate the shell.nix with the --pure flag. If you want it in a pure environment, you can also forward the external path (thus breaking purity).
The nvidia-smi command is provided by the system drivers, which aren’t typically packaged into a shell derivation (and to be honest, I’m not sure how that would work).
I’m on nixpkgs unstable; it should be in pkgs. Do you have unfree and cuda enabled in your nixpkgs config?
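To see which nvidia-smi (if any) a given shell would pick up, here is a small sketch you can run inside and outside nix-shell. It assumes nothing beyond the Python standard library. The `locate` helper is a made-up name, and the "nix store" versus "host" labels are just a prefix check on the resolved path.

```python
import os
import shutil


def locate(cmd, path=None):
    """Return (resolved_path, origin) for cmd, or (None, None) if it is not on PATH.

    path is a PATH-style string; None means use the current environment's PATH.
    """
    found = shutil.which(cmd, path=path)
    if found is None:
        return None, None
    real = os.path.realpath(found)  # follow symlinks to the actual file
    origin = "nix store" if real.startswith("/nix/store/") else "host"
    return real, origin


print(locate("nvidia-smi"))
```

In a non-pure shell the host directories stay on PATH, so this should report the host binary. Under --pure it would be expected to print `(None, None)` unless the path is forwarded.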
❯ nix search cudatoolkit_10_1
warning: using cached results; pass '-u' to update the cache
* nixpkgs.cudaPackages.cudatoolkit_10_1 (cudatoolkit-10.1.243)
A compiler for NVIDIA GPUs, math libraries, and tools
* nixpkgs.cudatoolkit_10_1 (cudatoolkit-10.1.243)
A compiler for NVIDIA GPUs, math libraries, and tools
* nixpkgs.cudnn_cudatoolkit_10_1 (cudatoolkit-10.1-cudnn-7.6.3)
NVIDIA CUDA Deep Neural Network library (cuDNN)
❯ nix repl
Welcome to Nix version 2.3.7. Type :? for help.
nix-repl> :l <nixpkgs>
Added 12152 variables.
nix-repl> pkgs.cudatoolkit_10_1
«derivation /nix/store/233nfzcl7fc57j8bgmhhdiizkvypja3m-cudatoolkit-10.1.243.drv»
Thanks @brogos! I just tried running with your shell.nix, but I’m still getting the same error. I also tried cudatoolkit_10_2 instead of 10.1 since nvidia-smi says my system is on 10.2, but still no luck:
[Taichi] Starting on arch=cuda
[E 08/10/20 12:57:31.386] [cuda_driver.h:operator()@71] CUDA Error CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW while calling init (cuInit)
***********************************
* Taichi Compiler Stack Traceback *
***********************************
/home/skainswo/dev/difftaichi/.venv-nix/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::Logger::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
/home/skainswo/dev/difftaichi/.venv-nix/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::CUDADriverFunction<int>::operator()(int)
/home/skainswo/dev/difftaichi/.venv-nix/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::CUDAContext::CUDAContext()
/home/skainswo/dev/difftaichi/.venv-nix/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::CUDAContext::get_instance()
/home/skainswo/dev/difftaichi/.venv-nix/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::RuntimeCUDA::detected()
/home/skainswo/dev/difftaichi/.venv-nix/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::Program::Program(taichi::lang::Arch)
/home/skainswo/dev/difftaichi/.venv-nix/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so(+0x6e25c9) [0x7fab6da745c9]
/home/skainswo/dev/difftaichi/.venv-nix/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so(+0x5b16b0) [0x7fab6d9436b0]
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: PyCFunction_Call
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: _PyObject_MakeTpCall
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0(+0x7bf4e) [0x7fab716cbf4e]
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: PyVectorcall_Call
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0(+0x167ce6) [0x7fab717b7ce6]
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0(+0x16572d) [0x7fab717b572d]
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: _PyObject_MakeTpCall
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: PyEval_EvalCodeEx
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: PyEval_EvalCode
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0(+0x21bdc8) [0x7fab7186bdc8]
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0(+0x21bd63) [0x7fab7186bd63]
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: PyRun_StringFlags
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: PyRun_SimpleStringFlags
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: Py_RunMain
/nix/store/cfnpsvmkdldbd07gsf9ar2gl49qbmhim-python3-3.8.5/lib/libpython3.8.so.1.0: Py_BytesMain
/nix/store/aqq6367snc1zh3fs1pc4j4zm5h80vkkz-glibc-2.31/lib/libc.so.6: __libc_start_main
python3(_start+0x2a) [0x40107a]
Internal Error occurred, check this page for possible solutions:
https://taichi.readthedocs.io/en/stable/install.html#troubleshooting
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/skainswo/dev/difftaichi/.venv-nix/lib/python3.8/site-packages/taichi/lang/__init__.py", line 209, in init
ti.get_runtime().create_program()
File "/home/skainswo/dev/difftaichi/.venv-nix/lib/python3.8/site-packages/taichi/lang/impl.py", line 200, in create_program
self.prog = taichi_lang_core.Program()
RuntimeError: [cuda_driver.h:operator()@71] CUDA Error CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW while calling init (cuInit)
I’m actually a little bit confused about why it’s not working for me when the exact same shell.nix works on @brogos’s system. Isn’t Nix supposed to guarantee reproducibility and all that good stuff?
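One plausible reading, offered as a guess rather than a diagnosis: Nix pins the packages the shell builds, but on a non-NixOS host the NVIDIA kernel module comes from Ubuntu (440.100 per the nvidia-smi output earlier), and the user-space libcuda.so that a process loads has to match it. As far as I understand, the "forward compatibility" error from cuInit tends to show up when a different libcuda.so is found first. The sketch below lists the libcuda.so candidates on LD_LIBRARY_PATH in search order. `libcuda_candidates` is a made-up helper, and it deliberately ignores RPATH/RUNPATH and the ld.so cache, which the dynamic loader also consults.

```python
import glob
import os


def libcuda_candidates(ld_library_path):
    """Return libcuda.so* files found along an LD_LIBRARY_PATH string, in search order.

    Symlinks are resolved so the driver version embedded in the real file name
    (for example libcuda.so.440.100) becomes visible; duplicates are dropped.
    """
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        for lib in sorted(glob.glob(os.path.join(directory, "libcuda.so*"))):
            real = os.path.realpath(lib)
            if real not in hits:
                hits.append(real)
    return hits


for lib in libcuda_candidates(os.environ.get("LD_LIBRARY_PATH", "")):
    print(lib)
```

If the first entry printed inside the nix-shell lives under /nix/store and carries a version other than 440.100, that mismatch would be the first thing to investigate.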