Hello,
I’d like to run some models locally. I basically added this to my config:
```nix
services.ollama = {
  enable = true;
  acceleration = "cuda";
};
```
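For context, the NVIDIA side of a setup like this usually involves the options below (option names are from the NixOS manual; the values here are illustrative, not copied from my actual config, which is linked further down):

```nix
# Sketch of the NVIDIA options the CUDA backend depends on.
services.xserver.videoDrivers = [ "nvidia" ];
hardware.graphics.enable = true;  # formerly hardware.opengl.enable
hardware.nvidia = {
  modesetting.enable = true;
  open = false;  # proprietary kernel module
  package = config.boot.kernelPackages.nvidiaPackages.stable;
};
```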
Unfortunately, ollama seems unable to use the GPU:
```
● ollama.service - Server for local large language models
Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: ignored)
Active: active (running) since Sun 2025-03-02 17:08:54 CET; 35min ago
Invocation: d04373ecc61c4f779f12187424d2f81b
Main PID: 2259 (.ollama-wrapped)
IP: 0B in, 0B out
IO: 53.3M read, 0B written
Tasks: 16 (limit: 76896)
Memory: 37M (peak: 39.2M)
CPU: 139ms
CGroup: /system.slice/ollama.service
└─2259 /nix/store/rw5dmn8jflvh8sh3jjv2rr5f11ga2sb0-ollama-0.5.12/bin/ollama serve

Mar 02 17:08:54 system76-laptop systemd[1]: Starting Server for local large language models...
Mar 02 17:08:54 system76-laptop systemd[1]: Started Server for local large language models.
Mar 02 17:08:54 system76-laptop ollama[2259]: 2025/03/02 17:08:54 routes.go:1205: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/var/lib/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.089+01:00 level=INFO source=images.go:432 msg="total blobs: 5"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.089+01:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.089+01:00 level=INFO source=routes.go:1256 msg="Listening on 127.0.0.1:11434 (version 0.5.12)"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.090+01:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.112+01:00 level=INFO source=gpu.go:612 msg="Unable to load cudart library /nix/store/lgmvgx3r1pbpd40crz2nnliakfxh19f8-nvidia-x11-570.124.04-6.12.17/lib/libcuda.so.570.124.04: cuda driver library init failure: 3"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.122+01:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.123+01:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="62.6 GiB" available="61.3 GiB"
```
The source of the issue seems to be this line:

```
Unable to load cudart library /nix/store/lgmvgx3r1pbpd40crz2nnliakfxh19f8-nvidia-x11-570.124.04-6.12.17/lib/libcuda.so.570.124.04: cuda driver library init failure: 3
```
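To separate ollama from the driver, one could reproduce the failing call directly. Here is a minimal probe (my own sketch, assuming Python with `ctypes` is available; if I read `cuda.h` correctly, error 3 is `CUDA_ERROR_NOT_INITIALIZED`):

```python
# Minimal probe of the call that fails in the ollama log: loading the CUDA
# driver library and calling cuInit().  The library name resolution and the
# reading of error 3 as CUDA_ERROR_NOT_INITIALIZED are my assumptions,
# not something taken from the log.
import ctypes
import ctypes.util


def cuda_init_status() -> str:
    """Return a short description of what cuInit(0) does on this machine."""
    path = ctypes.util.find_library("cuda")  # resolves libcuda.so.* if present
    if path is None:
        return "libcuda not found"
    try:
        libcuda = ctypes.CDLL(path)
    except OSError as exc:
        return f"load failed: {exc}"
    rc = libcuda.cuInit(0)  # 0 = no flags; returns a CUresult error code
    return "cuInit ok" if rc == 0 else f"cuInit failed with error {rc}"


if __name__ == "__main__":
    print(cuda_init_status())
```

If this probe fails with the same error 3 both as root and as an unprivileged user, that would point at the driver or kernel module rather than at the ollama service's environment.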
But I'm not sure what exactly is wrong. The GPU itself seems fine, since nvidia-smi works:
```
❯ nvidia-smi
Sun Mar  2 17:48:34 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.04             Driver Version: 570.124.04     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   47C    P8              3W /  115W |      61MiB /   8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A       3550      G   ...me-shell-47.4/bin/gnome-shell              42MiB |
+-----------------------------------------------------------------------------------------+
```
My full nvidia config doesn’t have anything special: nix-config/nixos/nvidia.nix at 95f957ca2961a4cb484ced6616dd640615b9f9a1 · little-dude/nix-config · GitHub
How could I debug this further? I'm not familiar with CUDA or GPUs in general; I mostly follow the documentation blindly when it comes to this.
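One direction I could check myself (an assumption on my part, not something the log points to): whether systemd sandboxing hides the `/dev/nvidia*` device nodes from the service, e.g.:

```
❯ systemctl show ollama.service -p PrivateDevices -p DeviceAllow
❯ ls -l /dev/nvidia*
```

If `PrivateDevices=yes` is set, or the device nodes don't exist until something initializes the GPU, that could explain why `cuInit` fails inside the service while nvidia-smi works in my shell.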