Hi everyone!
tl;dr: as of now, running ollama-vulkan is the best option on Strix Halo.
I wanted to open this topic to discuss how to run models with Ollama on the new Strix Halo CPU/GPU, since we don’t have a topic for it here yet. I would like to hear about your configurations, discuss the current status quo, and share what I’ve tried so far.
I’m personally interested in running small models as fast as possible, and spoiler: I’m doing a terrible job at it.
My laptop setup:
Kernel - 6.18.5
Distro - NixOS 26.05 (Yarara) [unstable actually]
DE - KDE
CPU - AMD Ryzen AI 9 HX 370 w/ Radeon 890M (24)
Memory - 7.4 GB / 131.0 GB
Power - 100W
My ollama (0.13.5) config:
```nix
services.ollama = {
  enable = true;
  package = pkgs.ollama-rocm; # or pkgs.ollama-vulkan
  loadModels = [
    "ministral-3:14b"
    "ministral-3:8b"
  ];
  rocmOverrideGfx = "11.5.1";
  environmentVariables = {
    # Hoped this would help with offloading layers to the GPU; it didn't
    HSA_ENABLE_SDMA = "0";
    OLLAMA_DEBUG = "1";
  };
};
```
I noticed that models like ministral-3:8b were running slowly, around 11.72 tokens/s, and thus the rabbit hole began. An online calculator suggests it should reach ~30 tokens/s, but unfortunately I haven’t been able to find proper benchmarks for this hardware.
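For a rough sanity check on numbers like that: token generation on an iGPU is usually memory-bound, so the decode speed is roughly capped at memory bandwidth divided by model size. A minimal sketch, where the ~120 GB/s LPDDR5X bandwidth is my own assumption (not a measured figure) and the 5.6 GiB of weights comes from the ollama logs:

```python
# Rough decode-speed ceiling: generating one token streams all model
# weights through memory once, so tokens/s <= bandwidth / model size.
def max_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# Assumed numbers: ~120 GB/s LPDDR5X bandwidth (my guess for this chip),
# 5.6 GiB of weights as reported in the ollama logs.
print(f"{max_tokens_per_s(120.0, 5.6):.1f} tokens/s")  # a ceiling, not a prediction
```

Real throughput lands well below that ceiling, so ~30 tokens/s from the calculator already looks optimistic under these assumptions.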
I started with ollama-rocm, and I remembered a random Reddit comment: “just use vulkan”. So I switched to ollama-vulkan and, believe it or not, it ended up being the fastest. I kept testing different ROCm options, but they all ended up slower. In theory ROCm should be faster, since Vulkan is a more generic solution, but in practice ROCm on Strix Halo is still very green.
These are my benchmark notes:
| Backend | Configuration / Overrides | Eval Rate |
|---|---|---|
| ollama-vulkan | Default | 14.11 tokens/s |
| ollama-rocm | rocmOverrideGfx = "11.0.2" | 12.73 tokens/s |
| ollama-rocm | rocmOverrideGfx = "11.5.0" | 12.65 tokens/s |
| ollama-rocm | rocmOverrideGfx = "11.5.1" (SDMA disabled) | 12.60 tokens/s |
| ollama-rocm | rocmOverrideGfx = "11.5.0" (SDMA disabled) | 12.38 tokens/s |
| ollama-rocm | rocmOverrideGfx = "11.0.0" (SDMA disabled) | 12.17 tokens/s |
| ollama-rocm | rocmOverrideGfx = "11.0.3" | 11.88 tokens/s |
| ollama-rocm | rocmOverrideGfx = "11.0.0" | 11.72 tokens/s |
I’ve been running this command, which outputs a timing report at the end:

```shell
ollama run ministral-3:8b \
  "In 2 paragraphs explain what journalctl is and what it does, no examples" \
  --verbose
```
And what I noticed by looking at the ollama logs:

```shell
sudo journalctl -u ollama -f
```

is that ollama-rocm somehow fails to “offload layers to the GPU”:
```
msg="insufficient VRAM to load any model layers"
msg="new layout created" layers=[]
msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:12 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
msg="model weights" device=CPU size="5.6 GiB"
msg="kv cache" device=CPU size="544.0 MiB"
msg="compute graph" device=CPU size="765.1 MiB"
msg="total memory" size="6.9 GiB"
msg="loaded runners" count=1
msg="offloading 0 repeating layers to GPU"
msg="offloading output layer to CPU"
msg="offloaded 0/35 layers to GPU"
```
After some investigation, it looks like the AMD Radeon 890M is not supported by Ollama yet, although some people suggest increasing the VRAM assigned to the iGPU in the BIOS (I was going to test this next and update the post, but I couldn’t find the option in my BIOS).
I’ve also seen other tools, not available in nixpkgs yet, which allegedly support Strix Halo: lemonade-server, amd/gaia, and FastFlowLM.
Some related resources:
- [ENHANCE] Add Ubuntu Support for AMD Ryzen AI 9 HX 370 w/ Radeon 890M (gfx1150) · Issue #9999 · ollama/ollama · GitHub
- offloaded 0/35 layers to GPU on gfx1103 · Issue #12303 · ollama/ollama · GitHub
- Hardware support - Ollama
- AMD-specific Ollama Alternative? - #8 by Keyvan - Framework Desktop - Framework Community
- Status of AMD NPU Support - Linux - Framework Community
- Quickstart Guide: Ollama With GPU Support (No ROCM Needed) - Linux - Framework Community
Hopefully, with the outcomes of this thread, we can update the wiki.
To close this long post, I would like to hear about the community’s experiences.
Thanks!

