Hi all!
I have been attempting to use my new Strix Halo machine (128 GB unified memory) to run AI models. So far I have only been able to get models running with the Vulkan backend, and I was wondering if anyone has figured out a way to run with ROCm or vLLM (mainly ROCm). I keep seeing reports online of problems with older kernels that were supposedly fixed in kernel 6.18.4, but I am currently on 6.19.3 and no such luck.
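For context, my understanding is that ROCm on NixOS has to be enabled system-wide before any per-service config matters. A rough sketch of what I believe that looks like — treat the option names (`nixpkgs.config.rocmSupport`, `hardware.graphics.*`) as assumptions, since they are from recent nixpkgs and may differ on your channel:

```nix
# Hypothetical system-level ROCm enablement -- verify option names
# against the NixOS options search for your release.
{ pkgs, ... }:
{
  # Build ROCm-aware variants of packages where nixpkgs supports it.
  nixpkgs.config.rocmSupport = true;

  # Expose the GPU runtime and the ROCm OpenCL ICD system-wide.
  hardware.graphics.enable = true;
  hardware.graphics.extraPackages = [ pkgs.rocmPackages.clr.icd ];

  # Some tools hardcode /opt/rocm; symlink the HIP runtime there.
  systemd.tmpfiles.rules = [
    "L+ /opt/rocm/hip - - - - ${pkgs.rocmPackages.clr}"
  ];
}
```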
As an example of how I got a GGUF model working with Vulkan, I have the following (I leave it here so maybe others can suggest quick tweaks to what I already have to get ROCm working):
```nix
{ config, lib, pkgs, ... }:

with lib;

let
  cfg = config.services.llm-services.gpt-oss;
in
{
  options.services.llm-services.gpt-oss = {
    enable = mkEnableOption "QwQ-32B Reasoning Service (Port 8013)";

    modelPath = mkOption {
      type = types.str;
      default = "/var/lib/llama-cpp-models/qwq_32b_q4km.gguf";
      description = "Path to the QwQ-32B GGUF model.";
    };
  };

  config = mkIf cfg.enable {
    systemd.services.llama-cpp-reasoning = {
      description = "LLaMA C++ server (Reasoning - QwQ-32B)";
      after = [ "network.target" ];
      wantedBy = [ "multi-user.target" ];

      environment = {
        XDG_CACHE_HOME = "/var/cache/llama-cpp-reasoning";
        RADV_PERFTEST = "aco";
        AMD_VULKAN_ICD = "RADV";
        # Inject Lemonade runtime libs (as per working config)
        LD_LIBRARY_PATH = lib.makeLibraryPath [
          pkgs.rocmPackages.clr
          pkgs.vulkan-loader
          pkgs.libdrm
        ];
      };

      serviceConfig = {
        User = "salhashemi2";
        Group = "users";
        CacheDirectory = "llama-cpp-reasoning";
        RuntimeDirectory = "llama-cpp-reasoning";
        DeviceAllow = [ "/dev/dri/renderD128" "/dev/dri/card0" "/dev/kfd" ];
        PrivateDevices = false;
        ExecStart = "${pkgs.llama-cpp.override { vulkanSupport = true; }}/bin/llama-server --model ${cfg.modelPath} --port 8013 --host 0.0.0.0 --n-gpu-layers 65 --cache-type-k q8_0 --cache-type-v q8_0 --ctx-size 32768 --jinja --threads 16 --device Vulkan0 --flash-attn 1 --no-mmap --parallel 1";
        ExecStartPre = "${pkgs.coreutils}/bin/sleep 2";
        Restart = "on-failure";
        RestartSec = "5s";
      };
    };
  };
}
```
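For the ROCm attempt itself, my understanding is that the nixpkgs `llama-cpp` derivation also exposes a `rocmSupport` override flag, so in principle the swap inside the module above would look something like this. This is an untested sketch: the `HSA_OVERRIDE_GFX_VERSION` value for Strix Halo's gfx1151 ISA is my assumption (some ROCm releases need the hint, some detect the iGPU natively), and I have not verified it builds on my channel:

```nix
# Untested ROCm variant of the service above -- assumes llama-cpp.override
# accepts rocmSupport, and that gfx1151 may need an HSA version hint.
environment = {
  XDG_CACHE_HOME = "/var/cache/llama-cpp-reasoning";
  # Assumption: Strix Halo iGPU is gfx1151; drop if ROCm detects it natively.
  HSA_OVERRIDE_GFX_VERSION = "11.5.1";
};

serviceConfig.ExecStart =
  "${pkgs.llama-cpp.override { rocmSupport = true; vulkanSupport = false; }}/bin/llama-server"
  + " --model ${cfg.modelPath} --port 8013 --host 0.0.0.0 --n-gpu-layers 65"
  + " --ctx-size 32768 --no-mmap";
```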
Any help/discussion on the topic is appreciated.
Exciting times we live in!