Openai-whisper failing on larger models (NVIDIA / CUDA issues)

I’ve been battling with CUDA on NixOS for a few months now and almost have a working setup.

Courtesy of this flake:

{
    description = "Python 3.11 development environment";
    outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        config.allowUnfree = true;
      };
    in {
      devShells.${system}.default = (pkgs.buildFHSEnv {
        name = "nvidia-fuck-you";
        targetPkgs = pkgs: (with pkgs; [
          linuxPackages.nvidia_x11
          libGLU libGL
          xorg.libXi xorg.libXmu freeglut
          xorg.libXext xorg.libX11 xorg.libXv xorg.libXrandr zlib
          ncurses5 stdenv.cc binutils
          ffmpeg

          # I use zsh inside this env
          # you can remove this, the default is bash
          zsh

          # Micromamba does the real legwork
          micromamba
        ]);

        profile = ''
            export LD_LIBRARY_PATH="${pkgs.linuxPackages.nvidia_x11}/lib"
            export CUDA_PATH="${pkgs.cudatoolkit}"
            export EXTRA_LDFLAGS="-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib"
            export EXTRA_CCFLAGS="-I/usr/include"

            "/home/safri/.nix/cuda/env.sh"

            # Initialize micromamba shell
            eval "$(micromamba shell hook --shell zsh)"

            # Activate micromamba environment for PyTorch with CUDA
            micromamba activate pytorch-cuda
        '';

        # again, you can remove this if you like bash
        runScript = "zsh";
      }).env;
    };
}

and a micromamba environment:

micromamba env create \
    -n pytorch-cuda \
    anaconda::cudatoolkit \
    anaconda::cudnn \
    "anaconda::pytorch=*=*cuda*"

I now have my GPU actually working with openai-whisper. The only problem is that whisper now fails with any of the larger models (despite working just fine when I was on Arch Linux with the same computer); it simply exits with:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 254.00 MiB. GPU 


The GPU also doesn’t seem to be working when whisper isn’t running, even though I have set:

services.xserver.videoDrivers = [ "nvidia" ]; 

My NVIDIA config is:

services.xserver.videoDrivers = [ "nvidia" ];

hardware = {
  opengl.enable = true;

  nvidia = {
    open = false;
    powerManagement.enable = false;
    nvidiaSettings = true;
    modesetting.enable = false;
    package = config.boot.kernelPackages.nvidiaPackages.stable;
    prime = {
      offload.enable = false;
      nvidiaBusId = "PCI:1:0:0";
      intelBusId = "PCI:0:2:0";
    };
  };
};

Just in case it’s relevant:

  boot.initrd.availableKernelModules = [
    "xhci_pci"
    "thunderbolt"
    "nvme"
    "usbhid"
    "usb_storage"
    "sd_mod"
  ];
  boot.initrd.kernelModules = [ ];
  boot.kernelModules = [ "kvm-intel" ];
  boot.extraModulePackages = [ ];

is in my hardware-configuration.nix.

While in the venv, it’s clear that CUDA is working:

>>> import torch
>>> print(torch.cuda.is_available())
True

It’s using CUDA version 11.8.

My main system is running a Wayland compositor (Hyprland) on NixOS 24.05.

Can somebody please tell me why whisper is failing with any model above medium, despite my GPU being capable of running the larger models?

Any help would be appreciated.

You are not using CUDA 11.8; according to nvidia-smi you are running on CUDA 12.4, and it’s also presenting some bad data in its report, notably the 588 W / 35 W usage. You need to go over and redo your driver packages. Also note that certain CUDA packages will only work with certain driver packages, so you need to ensure uniformity between the system’s CUDA install and the version openai-whisper is using.
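
A quick way to compare the two versions in question (the maximum CUDA version the driver supports vs. the CUDA runtime PyTorch was built against) might look like this, assuming nvidia-smi is on the PATH and the pytorch-cuda env is active:

# banner shows the driver version and the highest CUDA version it supports
nvidia-smi | head -n 4

# the CUDA runtime PyTorch itself was built against
python -c "import torch; print(torch.version.cuda)"
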
I ran into a similar issue with this, and found that driver v535 worked better for AI tasks.

Here’s a snippet from my config that might help.

hardware.nvidia.modesetting.enable = true;
hardware.nvidia.powerManagement.enable = false;
hardware.nvidia.powerManagement.finegrained = false;
hardware.nvidia.open = false;
hardware.nvidia.nvidiaSettings = true;
hardware.nvidia.package = let
  rcu_patch = pkgs.fetchpatch {
    url = "https://github.com/gentoo/gentoo/raw/c64caf53/x11-drivers/nvidia-drivers/files/nvidia-drivers-470.223.02-gpl-pfn_valid.patch";
    hash = "sha256-eZiQQp2S/asE7MfGvfe6dA/kdCvek9SYa/FFGp24dVg=";
  };
in config.boot.kernelPackages.nvidiaPackages.mkDriver {
  #version = "545.29.06";
  #sha256_64bit = "sha256-grxVZ2rdQ0FsFG5wxiTI3GrxbMBMcjhoDFajDgBFsXs=";
  #sha256_aarch64 = "sha256-o6ZSjM4gHcotFe+nhFTePPlXm0+RFf64dSIDt+RmeeQ=";
  #openSha256 = "sha256-h4CxaU7EYvBYVbbdjiixBhKf096LyatU6/V6CeY9NKE=";
  #settingsSha256 = "sha256-YBaKpRQWSdXG8Usev8s3GYHCPqL8PpJeF6gpa2droWY=";
  #persistencedSha256 = "sha256-AiYrrOgMagIixu3Ss2rePdoL24CKORFvzgZY3jlNbwM=";

  #version = "535.154.05";
  #sha256_64bit = "sha256-fpUGXKprgt6SYRDxSCemGXLrEsIA6GOinp+0eGbqqJg=";
  #sha256_aarch64 = "sha256-G0/GiObf/BZMkzzET8HQjdIcvCSqB1uhsinro2HLK9k=";
  #openSha256 = "sha256-wvRdHguGLxS0mR06P5Qi++pDJBCF8pJ8hr4T8O6TJIo=";
  #settingsSha256 = "sha256-9wqoDEWY4I7weWW05F4igj1Gj9wjHsREFMztfEmqm10=";
  #persistencedSha256 = "sha256-d0Q3Lk80JqkS1B54Mahu2yY/WocOqFFbZVBh+ToGhaE=";

  version = "550.40.07";
  sha256_64bit = "sha256-KYk2xye37v7ZW7h+uNJM/u8fNf7KyGTZjiaU03dJpK0=";
  sha256_aarch64 = "sha256-AV7KgRXYaQGBFl7zuRcfnTGr8rS5n13nGUIe3mJTXb4=";
  openSha256 = "sha256-mRUTEWVsbjq+psVe+kAT6MjyZuLkG2yRDxCMvDJRL1I=";
  settingsSha256 = "sha256-c30AQa4g4a1EHmaEu1yc05oqY01y+IusbBuq+P6rMCs=";
  persistencedSha256 = "sha256-11tLSY8uUIl4X/roNnxf5yS2PQvHvoNjnd2CB67e870=";

  patches = [ rcu_patch ];
};

I am also using ollama with serve, not openai-whisper, but this might provide some context.

###################################################
###                                             ###
###            AI and LLM                       ###
###                                             ###
###################################################
#services.ollama.sandbox.enable = true;
services.ollama.acceleration = "cuda";
services.ollama.package = pkgs.ollama;
services.ollama.enable = true;
services.ollama.environmentVariables = { OLLAMA_LLM_LIBRARY = "cuda";};

environment.systemPackages = [

  pkgs.cudaPackages.cudatoolkit
  pkgs.cudaPackages.cuda_opencl
];

To be quite honest, I’m not even sure how CUDA is installed outside of the virtual environment.

In the venv:

>>> import torch

>>> print(torch.version.cuda)
11.8

Either way, NixOS is failing to build at “url”, returning:


       error: syntax error, unexpected invalid token

       at /nix/store/19sd93l3mdg93910ygpz41n6nqnqv1ha-source/modules/nvidia.nix:32:7:

           31| rcu_patch = pkgs.fetchpatch {
           32| url = “https://github.com/gentoo/gentoo/raw/c64caf53/x11-drivers/nvidia-drivers/files/nvidia-drivers-470.223.02-gpl-pfn_valid.patch”;
             |       ^
           33| hash = “sha256-eZiQQp2S/asE7MfGvfe6dA/kdCvek9SYa/FFGp24dVg=”;

How exactly do I specify the driver version I want?

You mentioned a VM; how exactly are you trying to run this? If it’s in a VM, that would probably change the approach to the problem.

Post your config and whatnot, if you’re willing.

Your PRIME config seems incomplete to me, as you need to activate one of the PRIME modes as well (offload, sync, reverseSync).

Also, you have to enable modesetting for the driver to work correctly.
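
A minimal sketch of what that would look like, reusing the bus IDs from the original post (the offload and sync modes are mutually exclusive, so pick one):

hardware.nvidia = {
  modesetting.enable = true;
  prime = {
    # pick exactly one mode:
    sync.enable = true;
    # offload.enable = true;
    # offload.enableOffloadCmd = true;  # adds the nvidia-offload wrapper script
    nvidiaBusId = "PCI:1:0:0";
    intelBusId = "PCI:0:2:0";
  };
};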

See Running Specific NVIDIA Driver Versions - NixOS Wiki

P.S: NixOS 24.05 reaches EOL today, so you might want to switch to 24.11 soon for better support and newer packages.

Sorry for the late response,

It’s running in a micromamba env, not a VM.

However, I think it might be easiest if I try to install pytorch and cuda via nixpkgs on the main system again. Every time I try, though, my laptop freezes and the rebuild stops (it usually gets stuck building magma and triton). I’ve tried limiting the cores to 15 and then to 8 (my i9 has 20), but I come back the next morning and the rebuild has magically stopped working.

All of my dotfiles can be found here

./configuration.nix
./modules/cuda.nix
./modules/graphics.nix

are probably the files most relevant to this issue.

I now have PRIME set up with:

sync.enable = true;

However, I tried modesetting both enabled and disabled, and the problem was still there either way.

I’m now on NixOS 24.11; this caused the package:

package = config.boot.kernelPackages.nvidiaPackages.stable;

to interfere with the drivers in the flake mentioned at the beginning of the post. However, I think I would just like to install pytorch and cuda in the main system configuration. The only problem is that every time I try to install pytorch-bin on my main setup, the computer freezes. I’ve tried limiting the cores nixos-rebuild is allowed to use, but it doesn’t really seem to make a difference; I come back the next morning and the rebuild has stopped. Using the cache didn’t seem to make any difference either. Do you have any idea how I should go about installing pytorch-bin without the build process crashing my computer?

I assume it’s the same with prime.offload?

The nix-community cache has recently begun including CUDA packages and it’s quite useful to have in general, so I suggest you enable it and see:

nixpkgs.config.cudaSupport = true;

nix.settings = {
  substituters = [
    "https://nix-community.cachix.org"
  ];
  trusted-public-keys = [
    # Compare to the key published at https://nix-community.org/cache
    "nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs="
  ];
};

PS: If you don’t need packages for a specific python version, it’s probably better to use python3Packages instead of python3xxPackages since the former will just point to the current python3 version that’s installed in your system.

environment.systemPackages = with pkgs; [
  python3Packages.pytorch-bin
  python3Packages.openai-whisper
];

This means you don’t need to change your config every time python’s version changes and you don’t have to install 2 python versions (one from the system and the other from python3xxPackages).

That worked great! Nix built in just a few minutes with this config:

nixpkgs.config.cudaSupport = true;

nix.settings = {
  substituters = [ "https://nix-community.cachix.org"];
  trusted-public-keys = [
    # Compare to the key published at https://nix-community.org/cache
    "nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs="
  ];
};

  environment.systemPackages = with pkgs; [
    python3Packages.pytorch-bin
    python3Packages.openai-whisper
  ];

However, I can’t seem to access the pytorch installation:

nixos% python -c "import torch; print(torch.cuda.is_available())"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'torch'

If you want a Python interpreter with these modules available, you need the following expression:

(python3.withPackages (ps: with ps; [
  torch-bin
  openai-whisper
]))

Note that pytorch-bin is an alias for torch-bin.

~/git/nixos/master $ rg pytorch-bin
pkgs/top-level/python-aliases.nix
562:  pytorch-bin = torch-bin; # added 2022-09-30
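
In a system config, that interpreter would take the place of the two separate packages from earlier; a sketch under that assumption:

environment.systemPackages = [
  (pkgs.python3.withPackages (ps: with ps; [
    torch-bin
    openai-whisper
  ]))
];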

Not really sure what happened, but whisper appears to be using my GPU, and now I’m having the same issue as before.

nixos% whisper "/home/safri/Media/VisualMedia/日本語のYOUTUBE/1年前に大流行したアパートが舞台のホラーゲーム『 例外配達 』 [cI0FRXX9W9w].mkv"
/nix/store/ifhcrz76lcbj8i64wya7qwlmimm7ig1z-python3.12-whisper-20240930/lib/python3.12/site-packages/whisper/__init__.py:150: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/bb5e439f2d8a46172b8b7d2fdb7609822b9a97b1/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(fp, map_location=device)
Traceback (most recent call last):
  File "/nix/store/ifhcrz76lcbj8i64wya7qwlmimm7ig1z-python3.12-whisper-20240930/bin/.whisper-wrapped", line 9, in <module>
    sys.exit(cli())
             ^^^^^
  File "/nix/store/ifhcrz76lcbj8i64wya7qwlmimm7ig1z-python3.12-whisper-20240930/lib/python3.12/site-packages/whisper/transcribe.py", line 577, in cli
    model = load_model(model_name, device=device, download_root=model_dir)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/ifhcrz76lcbj8i64wya7qwlmimm7ig1z-python3.12-whisper-20240930/lib/python3.12/site-packages/whisper/__init__.py", line 160, in load_model
    return model.to(device)
           ^^^^^^^^^^^^^^^^
  File "/nix/store/pdzq2lbk0qdlk6k20j5ygsb5xvq5680m-python3.12-torch-2.5.1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/pdzq2lbk0qdlk6k20j5ygsb5xvq5680m-python3.12-torch-2.5.1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/nix/store/pdzq2lbk0qdlk6k20j5ygsb5xvq5680m-python3.12-torch-2.5.1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/nix/store/pdzq2lbk0qdlk6k20j5ygsb5xvq5680m-python3.12-torch-2.5.1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/nix/store/pdzq2lbk0qdlk6k20j5ygsb5xvq5680m-python3.12-torch-2.5.1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/nix/store/pdzq2lbk0qdlk6k20j5ygsb5xvq5680m-python3.12-torch-2.5.1/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1326, in convert
    return t.to(
           ^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 5.66 GiB of which 19.00 MiB is free. Including non-PyTorch memory, this process has 5.62 GiB memory in use. Of the allocated memory 5.53 GiB is allocated by PyTorch, and 2.46 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Running the other models works fine, but why would I be able to run the larger models on other distros but not on NixOS?

“whisper” is set as an alias for:

whisper --language Japanese --verbose False --output_format srt --threads 10 --model large-v2 --device cuda:0

This is the exact same command (I added the :0 at the end of cuda, though) that I used on Arch Linux a while back.
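
For reference, the allocator hint from the traceback above could be tried like this (an untested sketch; input.mkv is a placeholder path):

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
  whisper --language Japanese --model large-v2 --device cuda:0 input.mkv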

I believe your main issue here is with the driver configuration. If you search for “no running processes found” in this forum, you’ll find many posts, each proposing a different solution. What I’d suggest is that you try them one by one and see if it makes a difference (don’t forget to reboot when you do). That said, you might want to start by verifying that your busID config is correct.
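
To double-check the bus IDs, something like this should do (note that lspci prints the PCI addresses in hexadecimal, while the NixOS options expect decimal):

lspci | grep -E "VGA|3D"
# e.g. 00:02.0 VGA compatible controller: Intel ...  ->  intelBusId  = "PCI:0:2:0"
#      01:00.0 3D controller: NVIDIA ...             ->  nvidiaBusId = "PCI:1:0:0"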

I don’t know why, but when I did the same thing it gave me an error:

error: triton-2.1.0 not supported for interpreter python3.12

What’s your default python3 version?

I set up nix-community.cachix.org according to CUDA Cache for Nix Community, and use

python311
python311Packages.torch-bin
python311Packages.openai-whisper

and CUDA still compiles for hours.

I forgot to mention this, but the cache only takes effect after you switch to your new configuration, so you’ll need to:

  • Enable the cache
  • Disable CUDA and the packages that require CUDA
  • nixos-rebuild test or nixos-rebuild switch
  • Enable CUDA
  • nixos-rebuild switch

The issue is that you’re using an old nixpkgs. In nixos-24.11, triton is at version 3.1.0 both for python 3.11 and 3.12. So you’ll either need to use the triton you have with python3.11 or upgrade your system.
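
For a channel-managed system, the upgrade would look roughly like this (assuming the channel is named nixos; flake users would bump their nixpkgs input instead):

sudo nix-channel --add https://channels.nixos.org/nixos-24.11 nixos
sudo nixos-rebuild switch --upgrade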


Thanks, it is working now. For me,

nix-channel --add https://channels.nixos.org/nixos-unstable nixos

solved both the cache and the Python version issues.


@safri I was mistaken, my apologies. You probably don’t have a driver issue, as I also have an NVIDIA GPU with 6 GB of VRAM, tried to use openai-whisper with the large-v3 model, and got the same error. So the issue is indeed with large models, as you originally described.

I don’t know why this works on Arch but doesn’t on NixOS, but I found an alternative tool whisper-ctranslate2 that’s compatible with the original and which I didn’t have any problems using even with bigger models, so you might want to give that a try, instead.

# VRAM usage ~2-3GB
time whisper-ctranslate2 --model large-v3 --language Japanese --device cuda --word_timestamps True --max_words_per_line 10 --task translate track.opus
________________________________________________________
Executed in  171.52 secs    fish           external
   usr time  184.94 secs    4.13 millis  184.94 secs
   sys time    8.05 secs    4.13 millis    8.05 secs

Comparing similar-sized models, it also seems faster:

# VRAM usage ~2GB
time whisper --model small --language Japanese --device cuda --word_timestamps True --max_words_per_line 10 --task translate track.opus
________________________________________________________
Executed in  104.92 secs    fish           external
   usr time  108.07 secs    1.19 millis  108.07 secs
   sys time    4.58 secs    0.32 millis    4.58 secs

# VRAM usage ~600MB
time whisper-ctranslate2 --model small --language Japanese --device cuda --word_timestamps True --max_words_per_line 10 --task translate track.opus
________________________________________________________
Executed in   34.40 secs    fish           external
   usr time   41.45 secs    1.17 millis   41.45 secs
   sys time    2.76 secs    0.47 millis    2.76 secs

I’d also like to note that there are options only available for this tool, like --batched which speeds up the processing considerably. Using it with large-v3 I’ve seen speeds close to small, but at the cost of slightly more VRAM usage:

# VRAM usage ~3-4GB
time whisper-ctranslate2 --batched True --model large-v3 --language Japanese --device cuda --word_timestamps True --max_words_per_line 10 --task translate track.opus
________________________________________________________
Executed in   41.67 secs    fish           external
   usr time   51.50 secs    6.49 millis   51.49 secs
   sys time    6.58 secs    4.43 millis    6.58 secs

You can find more details about this and other options on its homepage.
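
On NixOS it can presumably be installed like any other package; the attribute name below is assumed from the later posts, not verified here:

environment.systemPackages = [
  pkgs.whisper-ctranslate2  # assumed attribute name
];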

That’s exciting. The only problem now is that the cache doesn’t seem to help anymore: trying to build whisper-ctranslate2 results in a frozen computer within a few minutes.

Is there any way to limit the system resources nixos-rebuild uses so that it doesn’t freeze the machine?

I have 16 GB of RAM and a 13th-gen i9, so I suspect this is a RAM issue.

Do you know specifically which package is un-cached? You can use a very useful tool called nix-output-monitor to better visualize the build tree.

# with bash
$ nixos-rebuild build |& nom

# with fish
$ nixos-rebuild build &| nom

PS: I personally use nh, which is a replacement for nixos-rebuild and uses nom by default.
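
For reference, the nh equivalent of a rebuild-and-switch would be roughly this (not verified here):

nh os switch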

You can check out Tuning Cores and Jobs - Nix Reference Manual which can help with CPU-intensive builds.

With a Ryzen 7 6800H + 16GB RAM, I use:

# Limit build resources
nix.settings = {
  max-jobs = 4;
  cores = 12;
};

# NOTE: run `sudo systemctl restart nix-daemon.service` the first time you apply this.
systemd.services.nix-daemon.serviceConfig = {
  # kill Nix builds instead of important services when OOM
  OOMScoreAdjust = 1000;
};

Although my PC still freezes sometimes, it’s been manageable as the build is always killed after a while.
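
A quick way to confirm the OOM adjustment actually took effect after restarting the daemon:

systemctl show nix-daemon.service --property=OOMScoreAdjust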

Nope, still freezing. When I installed python3Packages.openai-whisper with the cache, it all finished in maybe 5 minutes. Now I’m trying to install the whisper-ctranslate2 package, but I’m getting the same issue as I had before I set up the cache. My system freezes maybe 10 minutes into the build and I need to force-shut it down.