Config to make llama.cpp offload to GPU (amdgpu/rocm)

Could someone please share their configuration to get llama.cpp to offload layers to the GPU (amdgpu/ROCm)?

This should be enough in your NixOS configuration module:

{
  services.ollama = {
    enable = true;
    acceleration = "rocm";
  };
}

or as a package override:

{ pkgs ? import <nixpkgs> {} }:
pkgs.ollama.override {
  acceleration = "rocm";
}

Note that it might take a while to build Ollama with acceleration.

This is for Ollama, which is already working fine with ROCm; I'm looking to make llama.cpp work with ROCm instead :smiley:

That would be the following override:

{ pkgs ? import <nixpkgs> {} }:
pkgs.llama-cpp.override {
  rocmSupport = true;
}
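
If you want the ROCm build anywhere pkgs.llama-cpp is referenced (including the NixOS module), a nixpkgs overlay in your configuration is another option. This is just a sketch of that approach, assuming you want the override applied system-wide:

{
  nixpkgs.overlays = [
    (final: prev: {
      # build llama.cpp with ROCm support wherever pkgs.llama-cpp is used
      llama-cpp = prev.llama-cpp.override { rocmSupport = true; };
    })
  ];
}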

So I tried this with the stable branch, but it did not work; what did work was getting the packages/dependencies from the unstable-small branch instead :smiley:. Is this the same experience for you?

let

unstableSmall = import <nixosUnstableSmall> { config = { allowUnfree = true; }; };

in

    services.llama-cpp = {
      enable = true;
      package = unstableSmall.llama-cpp.override { rocmSupport = true; };
      model = "/var/lib/llama-cpp/models/qwen2.5-coder-32b-instruct-q4_0.gguf";
      host = "";
      port = "";
      extraFlags = [ "-ngl" "64" ];
      openFirewall = true;
    };
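
For context, <nixosUnstableSmall> is just the name of the channel I added locally for the nixos-unstable-small branch. If you don't want to add a channel, something like this should also work (untested sketch that pulls the branch straight from GitHub):

let
  # sketch: import the nixos-unstable-small branch of nixpkgs without a channel
  unstableSmall = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/nixos-unstable-small.tar.gz") {
    config = { allowUnfree = true; };
  };
in
  unstableSmall.llama-cpp.override { rocmSupport = true; }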

I didn’t try to build the package myself. I found the llama-cpp package on search.nixos.org, went into the package sources, and noticed there is a package argument rocmSupport, hence the override.
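
If you want to double-check which arguments a package can be overridden with, without digging through the source, something like this should do it (a small sketch; the file name is just an example, run with nix-instantiate --eval --strict check-args.nix):

let
  pkgs = import <nixpkgs> { };
in
  # list the argument names accepted by llama-cpp's override function,
  # which should include rocmSupport
  builtins.attrNames (pkgs.lib.functionArgs pkgs.llama-cpp.override)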