Immich and CUDA-accelerated machine learning

Hi,

I’m trying to set up CUDA-acceleration for the machine learning service of Immich.

This is running on a computer with an Nvidia T500, which should be supported by Immich.

GPU is set up like this (which is more or less copy-pasted from the wiki):

{ config, lib, ... }:
{
  nixpkgs.config.allowUnfreePredicate = pkg:
    builtins.elem (lib.getName pkg) [ "nvidia-x11" "nvidia-persistenced" ];

  # Enable OpenGL
  hardware.graphics = {
    enable = true;
  };
  services.xserver.videoDrivers = [ "nvidia" ];

  hardware.nvidia = {
    nvidiaPersistenced = false;
    # Modesetting is required.
    modesetting.enable = true;

    # Nvidia power management. Experimental, and can cause sleep/suspend to fail.
    # Enable this if you have graphical corruption issues or application crashes after waking
    # up from sleep. This fixes it by saving the entire VRAM memory to /tmp/ instead 
    # of just the bare essentials.
    powerManagement.enable = false;

    # Fine-grained power management. Turns off GPU when not in use.
    # Experimental and only works on modern Nvidia GPUs (Turing or newer).
    powerManagement.finegrained = false;

    # Use the NVidia open source kernel module (not to be confused with the
    # independent third-party "nouveau" open source driver).
    # Support is limited to the Turing and later architectures. Full list of 
    # supported GPUs is at: 
    # https://github.com/NVIDIA/open-gpu-kernel-modules#compatible-gpus 
    # Only available from driver 515.43.04+
    # Currently alpha-quality/buggy, so false is currently the recommended setting.
    open = false;

    # Enable the Nvidia settings menu,
    # accessible via `nvidia-settings`.
    nvidiaSettings = false;

    # Optionally, you may need to select the appropriate driver version for your specific GPU.
    package = config.boot.kernelPackages.nvidiaPackages.stable;

    prime = {
      offload = {
        enable = true;
        enableOffloadCmd = true;
      };
      # Bus IDs as reported by `lspci` (note: lspci prints them in hex,
      # but these values must be decimal).
      intelBusId = "PCI:0:2:0";
      nvidiaBusId = "PCI:1:0:0";
    };
  };

  hardware.nvidia-container-toolkit.enable = true;
}
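
Before pointing Immich at the GPU, it may help to confirm the driver stack works on its own. A minimal sketch for getting monitoring tools onto the system (assuming a recent nixpkgs where nvtop lives under `nvtopPackages`; older channels expose it as `pkgs.nvtop`, and `nvidia-smi` ships with the driver itself):

```nix
{
  # Optional: tools to verify the driver before involving Immich.
  environment.systemPackages = [
    pkgs.nvtopPackages.full
  ];
}
```

If `nvidia-smi` lists the T500, the driver side is working and any remaining problem is on the Immich/onnxruntime side.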

Immich is configured like this:

services.immich = {
  enable = true;
  openFirewall = true;
  host = "0.0.0.0";
};

My problem is that the GPU doesn’t seem to be used. I test this by uploading a new image to Immich and watching nvtop. I expect the “smart search” machine-learning job to cause some load on the GPU, but I never see any.

I already tried setting services.immich.machine-learning.environment.DEVICE to cuda or nvidia, and adding users.users.immich.extraGroups = [ "video" "render" ];, but neither helped.

What am I missing?


Oh, it seems the package is not built with GPU support. Compare the dependencies in the package to the upstream pyproject.toml.
Can someone confirm that my interpretation is correct?

I’ve gotten immich-machine-learning to work with CUDA. Here are the things I had to do, in addition to all the work to get the NVIDIA drivers working. You probably want to verify the driver setup with a different app first; I used Plex hardware transcoding for that.

1. Enable cudaSupport for onnxruntime.

Ideally, I’d be able to do this via nixpkgs.config.cudaSupport, but mxnet-1.9.1 with cudaSupport is marked broken. onnxruntime is a transitive dependency (via at least insightface, and probably also huggingface-hub), so I used an overlay instead:

nixpkgs.overlays = [
  (final: prev: {
    onnxruntime = prev.onnxruntime.override {cudaSupport = true;};
  })
];

I read in CUDA - NixOS Wiki that adding the nix-community cache might prevent me from having to recompile things, but it didn’t (maybe because I’m not using the same nvidia driver version as others? IDK).
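
For reference, the nix-community cache is added roughly like this (the public key below is the one published on the nix-community Cachix page; double-check it there before trusting it):

```nix
{
  nix.settings = {
    substituters = [ "https://nix-community.cachix.org" ];
    trusted-public-keys = [
      "nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs="
    ];
  };
}
```

It only helps if the cached builds were made with the same cudaSupport and driver-related settings as yours, which may explain why it didn’t save me a rebuild.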

2. Point LD_LIBRARY_PATH to onnxruntime.

It seems like this might be a bug in how onnxruntime is packaged. I might file an issue later.

services.immich.machine-learning = {
  environment.LD_LIBRARY_PATH = "${pkgs.python312Packages.onnxruntime}/lib/python3.12/site-packages/onnxruntime/capi";
};
3. Patch immich-machine-learning to disable a broken test.

This is Build failure: immich-machine-learning-1.118.2 · Issue #352113 · NixOS/nixpkgs · GitHub. I’m not sure why it’s failing.

nixpkgs.overlays = [
  (final: prev: {
    # Work-around https://github.com/NixOS/nixpkgs/issues/352113
    immich-machine-learning = prev.immich-machine-learning.overrideAttrs (_: {patches = [./disable_cuda_test.diff];});
  })
];

disable_cuda_test.diff:

--- a/app/test_main.py	2025-02-11 21:09:09.022378668 -0800
+++ b/app/test_main.py	2025-02-11 21:09:18.327188276 -0800
@@ -241,8 +241,6 @@
         session = OrtSession("ViT-B-32__openai")
 
         assert session.sess_options.execution_mode == ort.ExecutionMode.ORT_SEQUENTIAL
-        assert session.sess_options.inter_op_num_threads == 1
-        assert session.sess_options.intra_op_num_threads == 2
         assert session.sess_options.enable_cpu_mem_arena is False
 
     def test_sets_default_sess_options_does_not_set_threads_if_non_cpu_and_default_threads(self) -> None:

Signs that everything is working:

  • This error message is no longer emitted on startup of immich-machine-learning:
    2022-10-28 19:54:16.5781916 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1622 onnxruntime::python::CreateInferencePybindStateModule] Init provider bridge failed.
    
  • Clicking “refresh faces” causes a process to show up on nvtop ~immediately for me.
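
For anyone who wants the whole thing in one place, here is a sketch of a single module combining the three steps above. I haven’t tested it as one unit, and it assumes the disable_cuda_test.diff from this post sits next to the module file:

```nix
{ config, pkgs, ... }:
{
  nixpkgs.overlays = [
    (final: prev: {
      # Step 1: build onnxruntime with CUDA support.
      onnxruntime = prev.onnxruntime.override { cudaSupport = true; };
      # Step 3: work around https://github.com/NixOS/nixpkgs/issues/352113
      immich-machine-learning = prev.immich-machine-learning.overrideAttrs (_: {
        patches = [ ./disable_cuda_test.diff ];
      });
    })
  ];

  # Step 2: let onnxruntime find its CUDA execution provider.
  services.immich.machine-learning.environment.LD_LIBRARY_PATH =
    "${pkgs.python312Packages.onnxruntime}/lib/python3.12/site-packages/onnxruntime/capi";
}
```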

You are a real hero! By the way, I had to set the environment this way on my machine to get it working (I am running immich-machine-learning on its own):

  systemd.services.immich-machine-learning = {
    description = "Immich Machine Learning Service";
    after = ["network.target"];
    wantedBy = ["multi-user.target"];

    environment.LD_LIBRARY_PATH = "${pkgs.python312Packages.onnxruntime}/lib:${pkgs.python312Packages.onnxruntime}/lib/python3.12/site-packages/onnxruntime/capi";

    serviceConfig = {
      ExecStart = "${pkgs.immich-machine-learning}/bin/machine-learning";
      User = "immich-ml";
    };
  };

The above almost worked for me. I had to add this to /etc/nixos/configuration.nix to get it working (note that lib needs to be in your module arguments):

  systemd.services.immich-machine-learning = {
    serviceConfig = {
      PrivateDevices = lib.mkForce false;
      DeviceAllow = [
        "/dev/nvidia0"
        "/dev/nvidiactl"
        "/dev/nvidia-uvm"
      ];
    };
  };

There’s a PR that’s been merged and will allow configuring this without the override:

When that’s available, if you’re using an NVIDIA card, setting services.immich.accelerationDevices to the above DeviceAllow list should enable GPU acceleration. Otherwise you’ll get an error like this:

[E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDA failure 100: no CUDA-capable device is detected ; GPU=0 ; hostname=foo ; file=/build/source/onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc ; line=65 ; expr=cudaGetDeviceCount(&num_devices);
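
Once that option is available, the systemd override above should reduce to something like this (hypothetical until the PR reaches your channel, so treat the option name and shape as unverified):

```nix
{
  services.immich.accelerationDevices = [
    "/dev/nvidia0"
    "/dev/nvidiactl"
    "/dev/nvidia-uvm"
  ];
}
```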

I updated my machine recently and had to adjust the patch. It now looks like this:

--- a/test_main.py      2025-02-11 21:09:09.022378668 -0800
+++ b/test_main.py      2025-02-11 21:09:18.327188276 -0800
@@ -285,8 +285,6 @@
         session = OrtSession("ViT-B-32__openai")

         assert session.sess_options.execution_mode == ort.ExecutionMode.ORT_SEQUENTIAL
-        assert session.sess_options.inter_op_num_threads == 1
-        assert session.sess_options.intra_op_num_threads == 2
         assert session.sess_options.enable_cpu_mem_arena is False

     def test_sets_default_sess_options_does_not_set_threads_if_non_cpu_and_default_threads(self) -> None: