Hello,
I’d like to run some models locally. I basically added this to my config:
```nix
services.ollama = {
  enable = true;
  acceleration = "cuda";
};
```
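For context, the NVIDIA side of a setup like this usually involves the options below (option names are from the NixOS manual; the values here are illustrative, not copied from my actual config, which is linked further down):

```nix
# Sketch of the NVIDIA options the CUDA backend depends on.
services.xserver.videoDrivers = [ "nvidia" ];
hardware.graphics.enable = true;  # formerly hardware.opengl.enable
hardware.nvidia = {
  modesetting.enable = true;
  open = false;  # proprietary kernel module
  package = config.boot.kernelPackages.nvidiaPackages.stable;
};
```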
Unfortunately, ollama seems unable to use the GPU:
```
● ollama.service - Server for local large language models
Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: ignored)
Active: active (running) since Sun 2025-03-02 17:08:54 CET; 35min ago
Invocation: d04373ecc61c4f779f12187424d2f81b
Main PID: 2259 (.ollama-wrapped)
IP: 0B in, 0B out
IO: 53.3M read, 0B written
Tasks: 16 (limit: 76896)
Memory: 37M (peak: 39.2M)
CPU: 139ms
CGroup: /system.slice/ollama.service
└─2259 /nix/store/rw5dmn8jflvh8sh3jjv2rr5f11ga2sb0-ollama-0.5.12/bin/ollama serve

Mar 02 17:08:54 system76-laptop systemd[1]: Starting Server for local large language models...
Mar 02 17:08:54 system76-laptop systemd[1]: Started Server for local large language models.
Mar 02 17:08:54 system76-laptop ollama[2259]: 2025/03/02 17:08:54 routes.go:1205: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/var/lib/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.089+01:00 level=INFO source=images.go:432 msg="total blobs: 5"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.089+01:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.089+01:00 level=INFO source=routes.go:1256 msg="Listening on 127.0.0.1:11434 (version 0.5.12)"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.090+01:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.112+01:00 level=INFO source=gpu.go:612 msg="Unable to load cudart library /nix/store/lgmvgx3r1pbpd40crz2nnliakfxh19f8-nvidia-x11-570.124.04-6.12.17/lib/libcuda.so.570.124.04: cuda driver library init failure: 3"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.122+01:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
Mar 02 17:08:54 system76-laptop ollama[2259]: time=2025-03-02T17:08:54.123+01:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="62.6 GiB" available="61.3 GiB"
```
The source of the issue seems to be this line:

```
Unable to load cudart library /nix/store/lgmvgx3r1pbpd40crz2nnliakfxh19f8-nvidia-x11-570.124.04-6.12.17/lib/libcuda.so.570.124.04: cuda driver library init failure: 3
```
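To separate ollama from the driver, one could reproduce the failing call directly. Here is a minimal probe (my own sketch, assuming Python with `ctypes` is available; if I read `cuda.h` correctly, error 3 is `CUDA_ERROR_NOT_INITIALIZED`):

```python
# Minimal probe of the call that fails in the ollama log: loading the CUDA
# driver library and calling cuInit().  The library name resolution and the
# reading of error 3 as CUDA_ERROR_NOT_INITIALIZED are my assumptions,
# not something taken from the log.
import ctypes
import ctypes.util


def cuda_init_status() -> str:
    """Return a short description of what cuInit(0) does on this machine."""
    path = ctypes.util.find_library("cuda")  # resolves libcuda.so.* if present
    if path is None:
        return "libcuda not found"
    try:
        libcuda = ctypes.CDLL(path)
    except OSError as exc:
        return f"load failed: {exc}"
    rc = libcuda.cuInit(0)  # 0 = no flags; returns a CUresult error code
    return "cuInit ok" if rc == 0 else f"cuInit failed with error {rc}"


if __name__ == "__main__":
    print(cuda_init_status())
```

If this probe fails with the same error 3 both as root and as an unprivileged user, that would point at the driver or kernel module rather than at the ollama service's environment.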
But I'm not sure what exactly is wrong. The GPU itself seems fine, since nvidia-smi works:
```
❯ nvidia-smi
Sun Mar  2 17:48:34 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.04             Driver Version: 570.124.04     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   47C    P8              3W /  115W |      61MiB /   8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A       3550      G   ...me-shell-47.4/bin/gnome-shell              42MiB |
+-----------------------------------------------------------------------------------------+
```
My full nvidia config doesn’t have anything special: nix-config/nixos/nvidia.nix at 95f957ca2961a4cb484ced6616dd640615b9f9a1 · little-dude/nix-config · GitHub
How could I debug this further? I'm not familiar with CUDA or GPUs in general; I mostly follow the documentation blindly when it comes to this.
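One direction I could check myself (an assumption on my part, not something the log points to): whether systemd sandboxing hides the `/dev/nvidia*` device nodes from the service, e.g.:

```
❯ systemctl show ollama.service -p PrivateDevices -p DeviceAllow
❯ ls -l /dev/nvidia*
```

If `PrivateDevices=yes` is set, or the device nodes don't exist until something initializes the GPU, that could explain why `cuInit` fails inside the service while nvidia-smi works in my shell.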