I’m on NixOS 25.05 (stable) on a System76 Oryx (oryp4) with an onboard NVIDIA GTX 1070, and I have set
services.ollama = {
  enable = true;
  loadModels = [ "qwen2.5-coder:14b" ];
  acceleration = "cuda";
};
in my configuration.nix.
Ollama starts up fine…
Aug 19 07:09:02 quadrireme ollama[569414]: time=2025-08-19T07:09:02.484-04:00 level=INFO source=runner.go:815 msg="starting go runner"
Aug 19 07:09:02 quadrireme ollama[569414]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
Aug 19 07:09:02 quadrireme ollama[569414]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 19 07:09:02 quadrireme ollama[569414]: ggml_cuda_init: found 1 CUDA devices:
Aug 19 07:09:02 quadrireme ollama[569414]: Device 0: NVIDIA GeForce GTX 1070 with Max-Q Design, compute capability 6.1, VMM: yes
Aug 19 07:09:02 quadrireme ollama[569414]: load_backend: loaded CUDA backend from /nix/store/b7r9lvm65c52h3pvjhgq514knqc4a969-ollama-0.11.4/lib/ollama/libggml-cuda.so
Aug 19 07:09:02 quadrireme ollama[569414]: load_backend: loaded CPU backend from /nix/store/b7r9lvm65c52h3pvjhgq514knqc4a969-ollama-0.11.4/lib/ollama/libggml-cpu-haswell.so
Aug 19 07:09:02 quadrireme ollama[569414]: time=2025-08-19T07:09:02.547-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CP>
Aug 19 07:09:02 quadrireme ollama[569414]: time=2025-08-19T07:09:02.547-04:00 level=INFO source=runner.go:874 msg="Server listening on 127.0.0.1:33523"
Aug 19 07:09:02 quadrireme ollama[569414]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce GTX 1070 with Max-Q Design) - 6642 MiB free
but as soon as it is queried, it segfaults:
Aug 19 07:09:03 quadrireme ollama[569414]: llama_context: CUDA0 compute buffer size = 1234.89 MiB
Aug 19 07:09:03 quadrireme ollama[569414]: llama_context: CUDA_Host compute buffer size = 47.01 MiB
Aug 19 07:09:03 quadrireme ollama[569414]: llama_context: graph nodes = 1042
Aug 19 07:09:03 quadrireme ollama[569414]: llama_context: graph splits = 18 (with bs=512), 3 (with bs=1)
Aug 19 07:09:03 quadrireme ollama[569414]: time=2025-08-19T07:09:03.478-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.00 seconds"
Aug 19 07:09:04 quadrireme ollama[569414]: ggml_cuda_compute_forward: RMS_NORM failed
Aug 19 07:09:04 quadrireme ollama[569414]: CUDA error: no kernel image is available for execution on the device
Aug 19 07:09:04 quadrireme ollama[569414]: current device: 0, in function ggml_cuda_compute_forward at /build/source/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2377
Aug 19 07:09:04 quadrireme ollama[569414]: err
Aug 19 07:09:04 quadrireme ollama[569414]: /build/source/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:77: CUDA error
Aug 19 07:09:04 quadrireme ollama[569414]: SIGSEGV: segmentation violation
Aug 19 07:09:04 quadrireme ollama[569414]: PC=0x7f8c574a4287 m=11 sigcode=1 addr=0x206003fd8
Aug 19 07:09:04 quadrireme ollama[569414]: signal arrived during cgo execution
…help! I’m not even really sure how to begin debugging this, but in case it helps, the full log is at https://place.org/~pj/ollama_trace.txt
Should I try an older version, or the one from unstable? Honestly, I’m new enough to NixOS that I don’t know how to do either.
Here’s hoping someone has more clues than I do.
–pj