Hi folks,
I’m having a fine old time trying to get a package to compile against CUDA.
This would be an easy bucket for somebody, but I’ve already seen the thread “Compiling Cuda Kernels with nvcc on nixos no driver found. cudaGetDriverVersion returns 0. cudaRuntime Error 35 driver version insufficient”, and I’ve used addOpenGLRunpath to add the rpath to everything in the “out” output in both postBuild and postFixup (I first run autoPatchelf, then that). Alas, I still get the error when the package’s tests are run.
So although I’m pretty sure the error is lying to me (I’m on NVIDIA driver version 525.116.04 and it is marked as usable with CUDA 11.6), I suppose I have to be sure.
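To separate “driver library not found” from “driver too old”, a small probe along these lines might help. It is only a sketch: it assumes the driver library sits at /run/opengl-driver/lib/libcuda.so.1 (the directory addOpenGLRunpath points at), and the function name `probe_libcuda` is made up for illustration. It also loads the library from Python rather than from the test binary, so it says nothing about the binary’s own rpath.

```python
import ctypes


def probe_libcuda(path="/run/opengl-driver/lib/libcuda.so.1"):
    """Try to load the driver library and ask it which CUDA version it supports.

    The default path is an assumption about where NixOS exposes the driver.
    """
    try:
        lib = ctypes.CDLL(path)
    except OSError as exc:
        # The dynamic loader could not find or open the file at all.
        return {"loaded": False, "version": None, "error": str(exc)}

    try:
        get_version = lib.cuDriverGetVersion
    except AttributeError as exc:
        # Something loaded, but it does not export the driver API symbol.
        return {"loaded": True, "version": None, "error": str(exc)}

    version = ctypes.c_int(0)
    status = get_version(ctypes.byref(version))
    if status != 0:  # 0 is CUDA_SUCCESS
        return {"loaded": True, "version": None,
                "error": f"cuDriverGetVersion returned {status}"}

    # The driver encodes the version as 1000 * major + 10 * minor.
    major = version.value // 1000
    minor = (version.value % 1000) // 10
    return {"loaded": True, "version": (major, minor), "error": None}
```

Calling `print(probe_libcuda())` on the host should then show either a load error or the highest CUDA version the installed driver claims to support.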
The project I’m messing with is onnxruntime, a massive pile from MS which has a 1.13.1 release in nixpkgs, but I need 1.14.1, and I need it built with “tensorrt” support, which the nixpkgs one is not. I’ve managed to get it compiling seemingly fine, but it hits the error when one of the tests that uses CUDA is executed, despite all my elfpatching.
The tests are kicked off via a command something like /nix/store/fqfi0m3fw3szj3n99r5n359579808bh6-cmake-3.25.3/bin/ctest --force-new-ctest-process. What I’d like to do is strace the offending test process to see if it actually does find libcuda.so.1 (the error, maddeningly, is apparently the same whether the driver library is not found or mismatching). But I’m not sure a) how to inject the strace into the ctest invocation, or b) whether the strace will work, given that ctest appears to want to create new processes for each test.
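For concreteness, this is roughly the shape of what I have in mind, as an untested sketch. It relies on strace’s `-f` flag to follow the child processes ctest spawns, and it assumes strace is on PATH in the build environment and that the loader opens libraries via `openat`. The helper names, the log file name and the `-R` test filter value are all placeholders of mine.

```python
def strace_ctest_command(ctest="ctest", log="ctest-strace.log", test_regex=None):
    """Build an argv that runs ctest under strace.

    -f follows forked/exec'd children, -e trace=openat limits output to
    file opens, and -o writes the trace to a file instead of stderr.
    """
    cmd = ["strace", "-f", "-e", "trace=openat", "-o", log,
           ctest, "--force-new-ctest-process", "--output-on-failure"]
    if test_regex:
        # -R restricts ctest to tests whose names match the regex.
        cmd += ["-R", test_regex]
    return cmd


def libcuda_lines(log_text):
    """Return the trace lines that mention the driver library."""
    return [line for line in log_text.splitlines() if "libcuda.so" in line]
```

The list from `strace_ctest_command(...)` could be handed to `subprocess.run`, and `libcuda_lines(open("ctest-strace.log").read())` would then show each attempt to open libcuda.so.1 along with its result (a file descriptor or ENOENT).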
So my question is: does someone with cmake-fu and nix-fu have any suggestions about how to put an strace in here so I can see what’s happening?
The nix derivation I’m hacking on is common/obs-backgroundremoval/stripped-onnxruntime.nix in mcdonc/.nixconfig on GitHub (at commit 4fe7b64175e7d071721997e1975623fcb3a4883f), and it contains the meat of one of the errors in a comment at the top.
Thanks for any thoughts!
- C