Ollama SIGSEGV with missing CUDA kernel image

pjz · August 19, 2025, 11:21am

I’m on NixOS 25.05 (stable) on a System76 Oryx (oryp4) that has an onboard NVIDIA GTX 1070, and I have set

  services.ollama = {
    enable = true;
    loadModels = [ "qwen2.5-coder:14b" ];
    acceleration = "cuda";
  };

in my configuration.nix.
Ollama starts up fine…

Aug 19 07:09:02 quadrireme ollama[569414]: time=2025-08-19T07:09:02.484-04:00 level=INFO source=runner.go:815 msg="starting go runner"
Aug 19 07:09:02 quadrireme ollama[569414]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug 19 07:09:02 quadrireme ollama[569414]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug 19 07:09:02 quadrireme ollama[569414]: ggml_cuda_init: found 1 CUDA devices:
Aug 19 07:09:02 quadrireme ollama[569414]:   Device 0: NVIDIA GeForce GTX 1070 with Max-Q Design, compute capability 6.1, VMM: yes
Aug 19 07:09:02 quadrireme ollama[569414]: load_backend: loaded CUDA backend from /nix/store/b7r9lvm65c52h3pvjhgq514knqc4a969-ollama-0.11.4/lib/ollama/libggml-cuda.so
Aug 19 07:09:02 quadrireme ollama[569414]: load_backend: loaded CPU backend from /nix/store/b7r9lvm65c52h3pvjhgq514knqc4a969-ollama-0.11.4/lib/ollama/libggml-cpu-haswell.so
Aug 19 07:09:02 quadrireme ollama[569414]: time=2025-08-19T07:09:02.547-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CP>
Aug 19 07:09:02 quadrireme ollama[569414]: time=2025-08-19T07:09:02.547-04:00 level=INFO source=runner.go:874 msg="Server listening on 127.0.0.1:33523"
Aug 19 07:09:02 quadrireme ollama[569414]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce GTX 1070 with Max-Q Design) - 6642 MiB free

but then, when queried, it segfaults:

Aug 19 07:09:03 quadrireme ollama[569414]: llama_context:      CUDA0 compute buffer size =  1234.89 MiB
Aug 19 07:09:03 quadrireme ollama[569414]: llama_context:  CUDA_Host compute buffer size =    47.01 MiB
Aug 19 07:09:03 quadrireme ollama[569414]: llama_context: graph nodes  = 1042
Aug 19 07:09:03 quadrireme ollama[569414]: llama_context: graph splits = 18 (with bs=512), 3 (with bs=1)
Aug 19 07:09:03 quadrireme ollama[569414]: time=2025-08-19T07:09:03.478-04:00 level=INFO source=server.go:637 msg="llama runner started in 1.00 seconds"
Aug 19 07:09:04 quadrireme ollama[569414]: ggml_cuda_compute_forward: RMS_NORM failed
Aug 19 07:09:04 quadrireme ollama[569414]: CUDA error: no kernel image is available for execution on the device
Aug 19 07:09:04 quadrireme ollama[569414]:   current device: 0, in function ggml_cuda_compute_forward at /build/source/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2377
Aug 19 07:09:04 quadrireme ollama[569414]:   err
Aug 19 07:09:04 quadrireme ollama[569414]: /build/source/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:77: CUDA error
Aug 19 07:09:04 quadrireme ollama[569414]: SIGSEGV: segmentation violation
Aug 19 07:09:04 quadrireme ollama[569414]: PC=0x7f8c574a4287 m=11 sigcode=1 addr=0x206003fd8
Aug 19 07:09:04 quadrireme ollama[569414]: signal arrived during cgo execution

…help! I’m not even really sure how to begin debugging this, but in case it helps, the full log is at https://place.org/~pj/ollama_trace.txt

Should I try using an older version, or one from unstable? I’m new enough to NixOS that I don’t know how to do either, honestly.

Here’s hoping someone has more clues than I do.

–pj

hexa · August 19, 2025, 11:36am

Can you check dmesg if it triggers any audit statements? If so, that would mean our sandboxing is likely too strict.

pjz · August 19, 2025, 1:20pm

There’s nothing in dmesg, sorry.

hexa · August 19, 2025, 6:46pm

The missing CUDA kernel image seems to be the culprit.

pjz · August 20, 2025, 9:31pm

…so what’s the solution? I thought there’d be a kernel for a GTX 1070 included, as it’s not even close to a new GPU. Where do I get or build one? Is there some config option I’m missing that would cause one to be built?

pjz · August 29, 2025, 9:22pm

is it segfaulting because there’s no kernel, or is there no kernel because it’s segfaulting?

pjz · August 31, 2025, 1:07am

Is there logging or something I can turn on during the build to see if maybe the kernel isn’t being built?

pjz · September 6, 2025, 12:35am

A more recent trace:

Sep 05 20:32:12 quadrireme ollama[1059412]: ggml_cuda_compute_forward: RMS_NORM failed
Sep 05 20:32:12 quadrireme ollama[1059412]: CUDA error: no kernel image is available for execution on the device
Sep 05 20:32:12 quadrireme ollama[1059412]:   current device: 0, in function ggml_cuda_compute_forward at /build/source/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2377
Sep 05 20:32:12 quadrireme ollama[1059412]:   err
Sep 05 20:32:12 quadrireme ollama[1059412]: /build/source/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:77: CUDA error
Sep 05 20:32:12 quadrireme ollama[1059412]: SIGSEGV: segmentation violation
Sep 05 20:32:12 quadrireme ollama[1059412]: PC=0x7f71bb8a4287 m=0 sigcode=1 addr=0x206003fe0
Sep 05 20:32:12 quadrireme ollama[1059412]: signal arrived during cgo execution

So IMO the ‘missing kernel’ message is a symptom, not a cause. The cause is the SIGSEGV somewhere in the Go code, I think.

pjz · September 6, 2025, 12:56am

Apparently my initial google-fu was lacking. This is a known bug: NixOS/nixpkgs issue #421775 on GitHub, “ollama-cuda: works with only subset of GPU CUDA architectures supported by ollama, not documented anywhere”. The workaround is to add

  nixpkgs.config.packageOverrides = pkgs: {
    ollama = pkgs.ollama.override {
      cudaArches = [ "61" ];
    };
  };

to your configuration.nix so that ollama is built with support for your GPU’s CUDA architecture.
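
For anyone landing here later, this is a rough sketch of how the override and the service settings from the first post might sit together in one configuration.nix. It is not a tested configuration. It assumes that the "61" in cudaArches corresponds to the "compute capability 6.1" that ollama prints for the GTX 1070 in the startup log, and that a different card would need its own value there. Since the override changes the package's build inputs, expect ollama to be compiled locally instead of fetched from the binary cache.

```nix
{ ... }:
{
  # Rebuild ollama with CUDA kernels for this GPU's architecture.
  # Assumption: "61" matches "compute capability 6.1" from the ollama log;
  # substitute the value reported for your own card.
  nixpkgs.config.packageOverrides = pkgs: {
    ollama = pkgs.ollama.override {
      cudaArches = [ "61" ];
    };
  };

  # Service settings unchanged from the original post.
  services.ollama = {
    enable = true;
    loadModels = [ "qwen2.5-coder:14b" ];
    acceleration = "cuda";
  };
}
```

After a rebuild and switch, the ggml_cuda_init lines in the journal should still list the device, and a query is the quickest way to check whether the "no kernel image is available" error is gone.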