Hi folks – I have not been able to figure out the Ollama setup using nvidia-offload. When I run nvidia-offload ollama serve, it reports that no compatible GPUs were detected:
nvidia-offload ollama serve
2025/05/07 13:20:15 routes.go:1231: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ryota/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-05-07T13:20:15.852+01:00 level=INFO source=images.go:458 msg="total blobs: 0"
time=2025-05-07T13:20:15.852+01:00 level=INFO source=images.go:465 msg="total unused blobs removed: 0"
time=2025-05-07T13:20:15.853+01:00 level=INFO source=routes.go:1298 msg="Listening on 127.0.0.1:11434 (version 0.6.5)"
time=2025-05-07T13:20:15.854+01:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-05-07T13:20:15.863+01:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-05-07T13:20:15.863+01:00 level=INFO source=amd_linux.go:296 msg="unsupported Radeon iGPU detected skipping" id=0 total="512.0 MiB"
time=2025-05-07T13:20:15.863+01:00 level=INFO source=amd_linux.go:402 msg="no compatible amdgpu devices detected"
time=2025-05-07T13:20:15.863+01:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
time=2025-05-07T13:20:15.863+01:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="30.6 GiB" available="11.1 GiB"
This is on an Asus ROG Zephyrus G14 (2024 model), and I have confirmed that nvidia-offload itself works with other applications such as Steam. It may be something I'm missing in the Nvidia setup somewhere, or in the Ollama config… The following is my NixOS config related to hardware.nvidia:
hardware.nvidia = {
  modesetting.enable = true;
  powerManagement.enable = false;
  powerManagement.finegrained = false;
  open = true;
  nvidiaSettings = true;
  prime = {
    offload = {
      enable = true;
      enableOffloadCmd = true;
    };
  };
};
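For context, with enableOffloadCmd = true the NixOS module generates the nvidia-offload wrapper; as far as I can tell it is roughly equivalent to this sketch (simplified, based on the nixpkgs nvidia module – the env vars match what the wrapper exports):

```nix
# Rough sketch of the generated nvidia-offload wrapper: it exports the
# PRIME render-offload variables and then executes the given command.
pkgs.writeShellScriptBin "nvidia-offload" ''
  export __NV_PRIME_RENDER_OFFLOAD=1
  export __NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G0
  export __GLX_VENDOR_LIBRARY_NAME=nvidia
  export __VK_LAYER_NV_optimus=NVIDIA_only
  exec "$@"
''
```

So the wrapper only sets environment variables for the process it launches; nothing about it changes which libraries a program can find.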
I do not believe this is a compatibility issue, based on ollama/docs/gpu.md at 392de84031e71cbd97ffe19b89ccf6cfeed9c7b3 · ollama/ollama · GitHub. Is there something I'm missing here?
truh
May 7, 2025, 12:50pm
unsupported Radeon iGPU detected skipping
Looks to me like ollama doesn’t work on your integrated GPU.
Yes, the Radeon iGPU not working is expected. I'm trying to get the RTX 4070 to work, but it's not detected at all with nvidia-offload.
Can you post your Ollama configuration?
Thanks for asking, I think I found the culprit!
I only had pkgs.ollama installed, as the systemd setup won't have nvidia-offload in place by default. Looking at the Ollama page on the NixOS Wiki, though, I could see there was an acceleration setting, which I hadn't specified anywhere.
After running OLLAMA_DEBUG=1 nvidia-offload ollama serve, I could see the following in the debug log with pkgs.ollama:
time=2025-05-07T15:06:08.115+01:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-05-07T15:06:08.115+01:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so*
time=2025-05-07T15:06:08.115+01:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/nix/store/nrcs8aijwjwq450chf1qlm9xxcp8n0iw-ollama-0.6.5/lib/ollama/libcuda.so* /home/ryota/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2025-05-07T15:06:08.116+01:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[]
time=2025-05-07T15:06:08.116+01:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcudart.so*
time=2025-05-07T15:06:08.116+01:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/nix/store/nrcs8aijwjwq450chf1qlm9xxcp8n0iw-ollama-0.6.5/lib/ollama/libcudart.so* /home/ryota/libcudart.so* /nix/store/nrcs8aijwjwq450chf1qlm9xxcp8n0iw-ollama-0.6.5/lib/ollama/cuda_v*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2025-05-07T15:06:08.116+01:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[]
It turns out that I had to use pkgs.ollama-cuda instead.
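Assuming Ollama is installed via environment.systemPackages (I'm not showing my full config here, so treat the exact install location as an assumption), the fix amounts to swapping in the CUDA-enabled build:

```nix
# pkgs.ollama is the CPU-only build; pkgs.ollama-cuda additionally bundles
# the CUDA runtime libraries, which is why the nix store paths for
# cuda_cudart and libcublas show up in the library search globs below.
environment.systemPackages = [ pkgs.ollama-cuda ];
```

With that package, the debug log shows the extra CUDA search paths: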
time=2025-05-07T15:45:09.832+01:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-05-07T15:45:09.832+01:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so*
time=2025-05-07T15:45:09.832+01:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/nix/store/zb8k5pfi9fp2rbq7cjaqrzxk26asrkcf-ollama-0.6.5/lib/ollama/libcuda.so* /run/opengl-driver/lib/libcuda.so* /nix/store/7f9dcx21lc19yy7h29rlqgr0fcdzvr9m-cuda_cudart-12.8.90-lib/lib/libcuda.so* /nix/store/wks4anspd6ckr3fcd7wk2h1a28r9kdam-libcublas-12.8.4.1-lib/lib/libcuda.so* /nix/store/bl6y7mnhhd22vz0indq236jnc8nifvym-cuda_cccl-12.8.90/lib/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2025-05-07T15:45:09.832+01:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[/nix/store/1hrq89r5fgw7h63479arcf9058agkwj4-nvidia-x11-570.133.07-6.14.3/lib/libcuda.so.570.133.07]
initializing /nix/store/1hrq89r5fgw7h63479arcf9058agkwj4-nvidia-x11-570.133.07-6.14.3/lib/libcuda.so.570.133.07
dlsym: cuInit - 0x7f8e6fd0fe70
dlsym: cuDriverGetVersion - 0x7f8e6fd0fe90
dlsym: cuDeviceGetCount - 0x7f8e6fd0fed0
dlsym: cuDeviceGet - 0x7f8e6fd0feb0
dlsym: cuDeviceGetAttribute - 0x7f8e6fd0ffb0
dlsym: cuDeviceGetUuid - 0x7f8e6fd0ff10
dlsym: cuDeviceGetName - 0x7f8e6fd0fef0
dlsym: cuCtxCreate_v3 - 0x7f8e6fd10190
dlsym: cuMemGetInfo_v2 - 0x7f8e6fd10910
dlsym: cuCtxDestroy - 0x7f8e6fd6eab0
calling cuInit
calling cuDriverGetVersion
raw version 0x2f30
CUDA driver version: 12.8
calling cuDeviceGetCount
device count 1
time=2025-05-07T15:45:09.960+01:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=/nix/store/1hrq89r5fgw7h63479arcf9058agkwj4-nvidia-x11-570.133.07-6.14.3/lib/libcuda.so.570.133.07
[GPU-e36e16a1-bf4c-800a-4ed9-e8b1221a6cc6] CUDA totalMem 7816 mb
[GPU-e36e16a1-bf4c-800a-4ed9-e8b1221a6cc6] CUDA freeMem 7660 mb
[GPU-e36e16a1-bf4c-800a-4ed9-e8b1221a6cc6] Compute Capability 8.9
And with that, it successfully uses the GPU:
time=2025-05-07T15:45:10.171+01:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-e36e16a1-bf4c-800a-4ed9-e8b1221a6cc6 library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4070 Laptop GPU" total="7.6 GiB" available="7.5 GiB"
I will be looking into how I can manage the systemd service with nvidia-offload, but the main question is now resolved!
Just to close this off, I added the following Home Manager config in order to get the nvidia-offload behaviour with systemd. Not sure if this is the best approach, but it was the easiest for me.
services.ollama = {
  enable = true;
  acceleration = "cuda";
};

# Update the service environment to effectively use nvidia-offload.
systemd.user.services.ollama = {
  # NOTE: Due to the laptop sleep handling, I could see a random error of:
  #
  #   cuda driver library init failure: 999.
  #
  # In case of this error, I just need to run the following command:
  #
  #   sudo rmmod nvidia_uvm; sudo modprobe nvidia_uvm
  #
  # However, this assumes there is no other GPU usage, or that any GPU
  # users can safely be killed before running the command.
  environment = {
    __NV_PRIME_RENDER_OFFLOAD = "1";
    __NV_PRIME_RENDER_OFFLOAD_PROVIDER = "NVIDIA-G0";
    __GLX_VENDOR_LIBRARY_NAME = "nvidia";
    __VK_LAYER_NV_optimus = "NVIDIA_only";
  };
};
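For anyone who prefers the system-level NixOS module over Home Manager, I believe (untested on my side) the equivalent would be a sketch like this, using the same env vars via systemd.services.<name>.environment:

```nix
# System-level analogue of the Home Manager user-unit override above.
services.ollama = {
  enable = true;
  acceleration = "cuda";
};

systemd.services.ollama.environment = {
  __NV_PRIME_RENDER_OFFLOAD = "1";
  __NV_PRIME_RENDER_OFFLOAD_PROVIDER = "NVIDIA-G0";
  __GLX_VENDOR_LIBRARY_NAME = "nvidia";
  __VK_LAYER_NV_optimus = "NVIDIA_only";
};
```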