I have a Ryzen 5700U, and I would like to know if its GPU can be used to do compute-on-GPU stuff, and if so how to configure NixOS to let that happen.
I don’t have a specific use case in mind; just trying to get a sense of what my capabilities are or if this is a ridiculous thing to attempt. I’m not really familiar with all of the layers of the 2024 compute-on-GPU stack, so I don’t really know what to set up or what to test, but ideally someone could recommend to me an application that:
has a benchmark mode or another easy way to test if compute-on-GPU is working,
is small enough in itself and in its dependencies that the number of ways it could be broken isn’t overwhelming,
is likely to work with an APU (‘officially supported’ or not), and
is packaged for NixOS.
Currently, rocminfo runs and tells me things about my CPU and GPU agents. clinfo partially runs and prints some things about my GPU but hangs halfway through. Attempting nix-shell -p python311Packages.torchWithRocm results in a build error complaining about hipblaslt, and python311Packages.torch doesn’t seem to have been compiled with the right stuff. hashcat has a benchmark mode but any use, including hashcat -I, hangs at startup. I don’t know if any of these things are bugs, configuration errors, or signs that I shouldn’t be trying any of this on my hardware.
I’ve added nixpkgs.config.rocmSupport = true; to my configuration file, and imported <nixos-hardware/common/gpu/amd>, which looks like it sets up hardware.opengl.extraPackages to contain ROCm stuff. I’m using the amdgpu driver. I’ve rebooted a few times.
Anyone have a suggestion for what test program I should aim for, or other things to attempt in my configuration to make it work?
Just had a weird experience reproducing my original results: clinfo didn’t hang, but then hashcat -I made my display blink on and off a few times and then crashed my GNOME session. Upon logging back in, clinfo is back to hanging, and hashcat -I still hangs.
Hey rhendric! If you want to try something for ROCm, maybe try running ollama? But since AMD APUs are not officially supported by ROCm, you will have to override the environment variable for the GPU architecture so that ROCm thinks you are running something else that is supported.
Thanks for the pointer; it’s not obvious to me how to use ollama in a benchmark-y way, but I’ve given it a go. Stock ollama from Nixpkgs seems not to want to use the GPU, based on the log spew from ollama start; seems I don’t have enough reserved VRAM (my BIOS doesn’t let me change this) and ollama doesn’t recognize the ~32 GiB of GTT available (according to amdgpu_top).
I tried working my way through this issue, and I added the OLLAMA_VRAM_OVERRIDE patch included there, but then ollama run seems to time out after downloading a model and before dropping me into a prompt. I haven’t yet tried the force-host-allocation-APU library mentioned there, which might be relevant if the cause of the timeout is that the GTT isn’t being used.
Thanks… what am I looking at, though? I don’t see anything there that looks like a program I can run outside of whatever singularity is? Can I just nix-shell -p zluda cudaPackages.saxpy and run… something?
You can try adding both the pocl and rocmPackages.clr.icd OpenCL backends to your hardware.graphics.extraPackages. The former should allow you to use your CPU, and the latter is for the integrated or discrete GPU. I am using a Ryzen 7600 desktop processor and both show up in clinfo.
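In case it helps, here is roughly what that would look like in configuration.nix. This is a minimal sketch, not a tested configuration: it assumes a NixOS release that has hardware.graphics (on older releases the same list lives under hardware.opengl.extraPackages) and that pocl and rocmPackages.clr.icd are available under those attribute names in your Nixpkgs.

```nix
# configuration.nix fragment (a sketch under the assumptions above)
{ pkgs, ... }:
{
  hardware.graphics = {
    enable = true;
    extraPackages = [
      pkgs.pocl                  # OpenCL backend that runs on the CPU
      pkgs.rocmPackages.clr.icd  # ROCm OpenCL ICD for the AMD GPU
    ];
  };
}
```

After a rebuild, clinfo should list one platform per backend if both ICDs were picked up.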
Thanks for the pointer; it’s not obvious to me how to use ollama in a benchmark-y way, but I’ve given it a go. Stock ollama from Nixpkgs seems not to want to use the GPU, based on the log spew from ollama start; seems I don’t have enough reserved VRAM (my BIOS doesn’t let me change this) and ollama doesn’t recognize the ~32 GiB of GTT available (according to amdgpu_top).
Yes, if your BIOS doesn’t let you allocate VRAM, that might be unfortunate. That being said, ollama runs with 2 GB of VRAM on my older Nvidia laptop, so I’d usually say it should work with a small model. You could try phi3, for instance, which is fairly small, only slightly above 2 GB: ollama run phi3
Thanks for the pointer; it’s not obvious to me how to use ollama in a benchmark-y way
ollama would give you information about generated tokens per second when running a specific model. You could compare those numbers. It’s not purpose built for benchmarking of course.
When it comes to environment flags, the most important one will be HSA_OVERRIDE_GFX_VERSION which you might have to set to HSA_OVERRIDE_GFX_VERSION=9.0.0 because your iGPU has a Vega architecture.
(For context: the number represents the LLVM build target for ROCm, and you need to set it to something that is fairly similar to your actual GPU and is officially supported by ROCm.)
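If you run ollama as a NixOS service rather than from a shell, the override can probably be set declaratively. The following is only a sketch: it assumes the services.ollama module in your Nixpkgs has the acceleration and rocmOverrideGfx options (these names have changed between releases, so check the option search or man configuration.nix), and 9.0.0 is the guess from above for a Vega iGPU, not a verified value for the 5700U.

```nix
# configuration.nix fragment (a sketch; option names are assumptions, see above)
{ ... }:
{
  services.ollama = {
    enable = true;
    acceleration = "rocm";      # ask for the ROCm build of ollama
    rocmOverrideGfx = "9.0.0";  # intended to set HSA_OVERRIDE_GFX_VERSION for the service
  };
}
```

If your module lacks rocmOverrideGfx, setting HSA_OVERRIDE_GFX_VERSION through the service's environment should have the same effect.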