Ollama SIGSEGV with missing CUDA kernel image

I’m on NixOS 25.05 (stable) on a System76 Oryx (oryp4) that has an onboard NVIDIA GTX 1070, and have set

  services.ollama = {
    enable = true;
    loadModels = [ "qwen2.5-coder:14b" ];
    acceleration = "cuda";
  };

in my configuration.nix.
Ollama starts up fine…

Aug 19 07:09:02 quadrireme ollama[569414]: time=2025-08-19T07:09:02.484-04:00 level=INFO source=runner.go:815 msg="starting go runner"
Aug 19 07:09:02 quadrireme ollama[569414]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 19 07:09:02 quadrireme ollama[569414]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 19 07:09:02 quadrireme ollama[569414]: ggml_cuda_init: found 1 CUDA devices:
Aug 19 07:09:02 quadrireme ollama[569414]:   Device 0: NVIDIA GeForce GTX 1070 with Max-Q Design, compute capability 6.1, VMM: yes
Aug 19 07:09:02 quadrireme ollama[569414]: load_backend: loaded CUDA backend from /nix/store/b7r9lvm65c52h3pvjhgq514knqc4a969-ollama-0.11.4/lib/ollama/libggml-cuda.so
Aug 19 07:09:02 quadrireme ollama[569414]: load_backend: loaded CPU backend from /nix/store/b7r9lvm65c52h3pvjhgq514knqc4a969-ollama-0.11.4/lib/ollama/libggml-cpu-haswell.so
Aug 19 07:09:02 quadrireme ollama[569414]: time=2025-08-19T07:09:02.547-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CP>
Aug 19 07:09:02 quadrireme ollama[569414]: time=2025-08-19T07:09:02.547-04:00 level=INFO source=runner.go:874 msg="Server listening on 127.0.0.1:33523"
Aug 19 07:09:02 quadrireme ollama[569414]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce GTX 1070 with Max-Q Design) - 6642 MiB free

but then, when queried, it segfaults:

Aug 19 07:09:03 quadrireme ollama[569414]: llama_context:      CUDA0 compute buffer size =  1234.89 MiB
Aug 19 07:09:03 quadrireme ollama[569414]: llama_context:  CUDA_Host compute buffer size =    47.01 MiB
Aug 19 07:09:03 quadrireme ollama[569414]: llama_context: graph nodes  = 1042
Aug 19 07:09:03 quadrireme ollama[569414]: llama_context: graph splits = 18 (with bs=512), 3 (with bs=1)
Aug 19 07:09:03 quadrireme ollama[569414]: time=2025-08-19T07:09:03.478-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.00 seconds"
Aug 19 07:09:04 quadrireme ollama[569414]: ggml_cuda_compute_forward: RMS_NORM failed
Aug 19 07:09:04 quadrireme ollama[569414]: CUDA error: no kernel image is available for execution on the device
Aug 19 07:09:04 quadrireme ollama[569414]:   current device: 0, in function ggml_cuda_compute_forward at /build/source/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2377
Aug 19 07:09:04 quadrireme ollama[569414]:   err
Aug 19 07:09:04 quadrireme ollama[569414]: /build/source/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:77: CUDA error
Aug 19 07:09:04 quadrireme ollama[569414]: SIGSEGV: segmentation violation
Aug 19 07:09:04 quadrireme ollama[569414]: PC=0x7f8c574a4287 m=11 sigcode=1 addr=0x206003fd8
Aug 19 07:09:04 quadrireme ollama[569414]: signal arrived during cgo execution

…help! I’m not even really sure how to begin debugging this, but in case it helps, the full log is at https://place.org/~pj/ollama_trace.txt

Should I try an older version, or one from unstable? I’m new enough to NixOS that I don’t know how to do either, honestly.

Here’s hoping someone has more clues than I do.

–pj

Can you check dmesg if it triggers any audit statements? If so, that would mean our sandboxing is likely too strict.

There’s nothing in dmesg, sorry.

The missing CUDA kernel image seems like the culprit.

…so what’s the solution? I thought there’d be a kernel for a GTX 1070 included, as it’s not even close to a new GPU. Where do I get or build one? Is there some config option I’m missing that would cause one to be built?

Is it segfaulting because there’s no kernel, or is there no kernel because it’s segfaulting?

Is there logging or something I can turn on during the build to see if maybe the kernel isn’t being built?

A more recent trace:

Sep 05 20:32:12 quadrireme ollama[1059412]: ggml_cuda_compute_forward: RMS_NORM failed
Sep 05 20:32:12 quadrireme ollama[1059412]: CUDA error: no kernel image is available for execution on the device
Sep 05 20:32:12 quadrireme ollama[1059412]:   current device: 0, in function ggml_cuda_compute_forward at /build/source/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2377
Sep 05 20:32:12 quadrireme ollama[1059412]:   err
Sep 05 20:32:12 quadrireme ollama[1059412]: /build/source/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:77: CUDA error
Sep 05 20:32:12 quadrireme ollama[1059412]: SIGSEGV: segmentation violation
Sep 05 20:32:12 quadrireme ollama[1059412]: PC=0x7f71bb8a4287 m=0 sigcode=1 addr=0x206003fe0
Sep 05 20:32:12 quadrireme ollama[1059412]: signal arrived during cgo execution

So IMO the ‘missing kernel’ message is a symptom, not a cause. The cause is the SIGSEGV somewhere in the Go code, I think.

Apparently my initial google-fu was lacking. This is a known bug: “ollama-cuda: works with only subset of GPU CUDA architectures supported by ollama, not documented anywhere” (NixOS/nixpkgs issue #421775 on GitHub). The workaround is to add

  nixpkgs.config.packageOverrides = pkgs: {
    ollama = pkgs.ollama.override {
      cudaArches = [ "61" ];
    };
  };

to your configuration.nix so it builds in support for your CUDA architecture.
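For anyone adapting this to a different card: the "61" is just the compute capability from the startup log with the dot removed (the GTX 1070 reports "compute capability 6.1"). If you would rather not override ollama globally, an untested sketch of the same idea scoped to the service is below. It assumes that the `services.ollama.package` option accepts an overridden derivation and that the module still applies `acceleration` on top of it; I have not confirmed either against the 25.05 module source.

```nix
{ pkgs, ... }:
{
  services.ollama = {
    enable = true;
    loadModels = [ "qwen2.5-coder:14b" ];
    acceleration = "cuda";

    # Assumption: the service uses this package instead of the default pkgs.ollama.
    # "61" = compute capability 6.1, as reported for the GTX 1070 in the log.
    package = pkgs.ollama.override {
      cudaArches = [ "61" ];
    };
  };
}
```

Either way, expect the next rebuild to compile ollama locally, since an override like this is unlikely to be in the binary cache.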