People are on PyTorch/TensorFlow/etc. on CUDA. It’s great that there are alternatives, but it’s useless to tell people who are heavily invested in an existing framework that there is an alternative. If we can’t support the libraries that they use + CUDA, then they will just go somewhere else.
People will only switch if the alternative to CUDA supports their existing frameworks, works out of the box, and is in the same ballpark performance-wise.
ROCm/HIP on AMD GPUs/accelerators probably comes closest, but anyone who has used those in the real world will tell you that it is a miserable experience. You’ll encounter lots and lots of bugs, bug reports apparently go to /dev/null, and there are a lot of performance regressions compared to NVIDIA. My previous employer purchased some Radeon VIIs, but we ran into so many issues that even the ardent AMD fans in the group were swearing by our NVIDIA GPUs.
I think we should consider setting up a sort-of SIG that collects money (or finds sponsors) to rent a build server and provide an additional binary cache for scientific computing. I am willing to put time in this, but I know that @tbenst and @zimbatm have done prior work on a scientific computing overlay, so I am hoping that they’ll also weigh in on this topic.
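For users, consuming such an additional cache would just be a couple of lines in `nix.conf`. A minimal sketch, assuming a hypothetical cache URL and signing key (both are placeholders, not a real cache):

```ini
# /etc/nix/nix.conf — hypothetical extra substituter for a
# scientific-computing binary cache; URL and public key are placeholders.
substituters = https://cache.nixos.org https://sci-comp.example.org
trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= sci-comp.example.org-1:<placeholder-key>
```

Nix falls back through substituters in order, so builds that aren’t in the extra cache would still resolve from cache.nixos.org as usual.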