Using a non-Vulkan inference engine on AMD Strix Halo

Hi all :slight_smile:

I have been attempting to use my new Strix Halo (128 GB unified memory) machine to run AI models. So far I have only been able to get models running with the Vulkan backend. I was wondering if anyone has figured out a way to run with ROCm or vLLM (mainly ROCm). I keep seeing online that older kernels had issues that were supposedly fixed in kernel 6.18.4, but I am currently on 6.19.3 and no such luck :slight_smile:
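In case it helps with diagnosis, here is a sketch of the system-level ROCm pieces I have tried so far. This is not known to be correct for Strix Halo; in particular, whether the HSA gfx-version override is needed (or whether 11.5.1 is even the right value for gfx1151) is exactly the kind of thing I'm unsure about:

{ pkgs, ... }:
{
  # Expose the ROCm OpenCL/HIP runtime to the graphics stack.
  hardware.graphics = {
    enable = true;
    extraPackages = [ pkgs.rocmPackages.clr.icd ];
  };

  # Handy for checking whether the HSA runtime sees the GPU at all.
  environment.systemPackages = [ pkgs.rocmPackages.rocminfo ];

  # HSA_OVERRIDE_GFX_VERSION is a ROCm runtime knob sometimes used for
  # not-yet-officially-supported GPUs; the value here is a guess based
  # on Strix Halo reportedly being gfx1151.
  environment.variables.HSA_OVERRIDE_GFX_VERSION = "11.5.1";
}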

As an example of how I got a GGUF model working with Vulkan, here is my current config (I leave it here so others can suggest quick tweaks to what I already have to get ROCm working):

{ config, lib, pkgs, ... }:

with lib;

let
  cfg = config.services.llm-services.gpt-oss;
in
{
  options.services.llm-services.gpt-oss = {
    enable = mkEnableOption "QwQ-32B Reasoning Service (Port 8013)";
    modelPath = mkOption {
      type = types.str;
      default = "/var/lib/llama-cpp-models/qwq_32b_q4km.gguf";
      description = "Path to the QwQ-32B GGUF model.";
    };
  };

  config = mkIf cfg.enable {
    systemd.services.llama-cpp-reasoning = {
      description = "LLaMA C++ server (Reasoning - QwQ-32B)";
      after = [ "network.target" ];
      wantedBy = [ "multi-user.target" ];
      environment = {
        XDG_CACHE_HOME = "/var/cache/llama-cpp-reasoning";
        RADV_PERFTEST = "aco";
        AMD_VULKAN_ICD = "RADV";
        # Inject Lemonade Runtime Libs (as per working config)
        LD_LIBRARY_PATH = lib.makeLibraryPath [
          pkgs.rocmPackages.clr
          pkgs.vulkan-loader
          pkgs.libdrm
        ];
      };
      serviceConfig = {
        User = "salhashemi2";
        Group = "users";
        CacheDirectory = "llama-cpp-reasoning";
        RuntimeDirectory = "llama-cpp-reasoning";
        DeviceAllow = [ "/dev/dri/renderD128" "/dev/dri/card0" "/dev/kfd" ];
        PrivateDevices = false;
        ExecStart = "${pkgs.llama-cpp.override { vulkanSupport = true; }}/bin/llama-server --model ${cfg.modelPath} --port 8013 --host 0.0.0.0 --n-gpu-layers 65 --cache-type-k q8_0 --cache-type-v q8_0 --ctx-size 32768 --jinja --threads 16 --device Vulkan0 --flash-attn 1 --no-mmap --parallel 1";
        ExecStartPre = "${pkgs.coreutils}/bin/sleep 2";
        Restart = "on-failure";
        RestartSec = "5s";
      };
    };
  };
}

Any help/discussion on the topic is appreciated :slight_smile: Exciting times we live in!


Following because I’m trying to do the same.

Use pkgs.pkgsRocm.llama-cpp; it should work without any other flags. Don't set extra LD_LIBRARY_PATH values.
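If I read that suggestion right, the change to the module above would be small: swap the Vulkan override for the ROCm build and drop the Vulkan-specific environment. A sketch (the `pkgsRocm` attribute path and the idea that no `--device` flag is needed are assumptions on my part; `llama-server --list-devices` should show the actual backend device names):

systemd.services.llama-cpp-reasoning = {
  environment = {
    XDG_CACHE_HOME = "/var/cache/llama-cpp-reasoning";
    # No RADV/Vulkan variables and no manual LD_LIBRARY_PATH here.
  };
  serviceConfig.ExecStart =
    "${pkgs.pkgsRocm.llama-cpp}/bin/llama-server"
    + " --model ${cfg.modelPath} --port 8013 --host 0.0.0.0"
    + " --n-gpu-layers 65 --cache-type-k q8_0 --cache-type-v q8_0"
    + " --ctx-size 32768 --jinja --threads 16"
    + " --flash-attn 1 --no-mmap --parallel 1";
};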

If it’s not working, in what way is it failing and what nixpkgs revision are you on?