Can't get the FFmpeg Whisper filter working (using Whisper.cpp to transcribe audio)

Late last year FFmpeg introduced a filter to transcribe audio using whisper.cpp: https://ffmpeg.org/ffplay-all.html#whisper-1. I tried this guide but get errors when I run the following command. I want it to run on my RTX 3090; using whisper-cli works fine with CUDA.

ffmpeg -i https://github.com/vpalmisano/webrtcperf/releases/download/videos-1.0/gvr.mp4  -vn -af "whisper=model=ggml-large-v3.bin :language=en :queue=3 :destination=output.srt :format=srt" -f null -

My NixOS config:

  # Packages
  nixpkgs = {
    config = {
      cudaSupport = true;
      allowUnfree = true;
    };
  };

  environment.systemPackages = [
    (pkgs.ffmpeg-full.override {
      withUnfree = true;
    })
    pkgs.whisper-cpp-vulkan
  ];

  programs.nix-ld.libraries = [
    config.boot.kernelPackages.nvidia_x11
  ];

  hardware = {
    graphics.enable = true;
    nvidia = {
      nvidiaSettings = true;
      open = true;
    };
  };

Error Log:

ffmpeg version 8.0 Copyright (c) 2000-2025 the FFmpeg developers
  built with gcc 14.3.0 (GCC)
...
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'https://github.com/vpalmisano/webrtcperf/releases/download/videos-1.0/gvr.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf61.9.100
  Duration: 00:01:53.00, start: 0.000000, bitrate: 1201 kb/s
  Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1122 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
    Metadata:
      handler_name    : ISO Media file produced by Google Inc.
      vendor_id       : [0][0][0][0]
      encoder         : Lavc60.35.100 libx264
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 70 kb/s (default)
    Metadata:
      handler_name    : ISO Media file produced by Google Inc.
      vendor_id       : [0][0][0][0]
/build/source/ggml/src/ggml-backend.cpp:501: GGML_ASSERT(device) failed
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libggml-base.so(+0x143fd) [0x7fd6411ef3fd]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libggml-base.so(ggml_print_backtrace+0x216) [0x7fd6411ef7b6]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libggml-base.so(ggml_abort+0x144)[0x7fd6411ef974]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libggml-base.so(+0x26507) [0x7fd641201507]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libwhisper.so.1(+0x360f3) [0x7fd656da50f3]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libwhisper.so.1(+0x38753) [0x7fd656da7753]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libwhisper.so.1(whisper_init_with_params_no_state+0x29e) [0x7fd656da9e6e]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libwhisper.so.1(whisper_init_from_file_with_params_no_state+0x1ab) [0x7fd656dadf2b]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libwhisper.so.1(whisper_init_from_file_with_params+0x2b) [0x7fd656db176b]
/nix/store/hfwrpc7ycs53hdvfvgrvx8xa6rkpf725-ffmpeg-full-8.0-lib/lib/libavfilter.so.11(+0x17184b) [0x7fd65a77184b]
/nix/store/hfwrpc7ycs53hdvfvgrvx8xa6rkpf725-ffmpeg-full-8.0-lib/lib/libavfilter.so.11(avfilter_init_dict+0x71) [0x7fd65a797b51]
/nix/store/hfwrpc7ycs53hdvfvgrvx8xa6rkpf725-ffmpeg-full-8.0-lib/lib/libavfilter.so.11(avfilter_graph_segment_init+0x58) [0x7fd65a7c6308]
/nix/store/hfwrpc7ycs53hdvfvgrvx8xa6rkpf725-ffmpeg-full-8.0-lib/lib/libavfilter.so.11(avfilter_graph_segment_apply+0x44) [0x7fd65a7c6c34]
ffmpeg(+0x1fbda) [0x55d7a200fbda]
ffmpeg(+0x23545) [0x55d7a2013545]
ffmpeg(+0x257f1) [0x55d7a20157f1]
ffmpeg(+0x2c8a0) [0x55d7a201c8a0]
ffmpeg(+0x2d042) [0x55d7a201d042]
ffmpeg(+0x2d5c4) [0x55d7a201d5c4]
ffmpeg(+0x2e05a) [0x55d7a201e05a]
ffmpeg(+0x32599) [0x55d7a2022599]
ffmpeg(+0x356a3) [0x55d7a20256a3]
ffmpeg(main+0xa2) [0x55d7a2003142]
/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6(+0x2a4d8) [0x7fd65722a4d8]
/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6(__libc_start_main+0x8b) [0x7fd65722a59b]
ffmpeg(+0x13cc5) [0x55d7a2003cc5]
fish: Job 1, 'ffmpeg -i https://github.com/vp…' terminated by signal SIGABRT (Abort)

I introduced FFmpeg support to whisper-cpp in nixpkgs a few weeks ago. Basically, I had to make it opt-in. You need to override the package and set the withFFmpegSupport = true flag.

Here is my working config for this:

let
  […]
  whisper-cpp-ffmpeg = pkgs.whisper-cpp.override {
    inherit cudaSupport rocmSupport cudaPackages rocmPackages;
    withFFmpegSupport = true;
  };
in
{
  environment.systemPackages = [
    whisper-cpp-ffmpeg
  ];
}

If you want to use llama-swap, you’ll need to wait a few days for the PR "llama-swap: fix whisper-cpp FFmpeg format conversions" by gaelj (Pull Request #466621, NixOS/nixpkgs on GitHub) to be available.

EDIT: white space & code cleanup

Looking at nixpkgs/pkgs/by-name/wh/whisper-cpp/package.nix at fb7944c166a3b630f177938e478f0378e64ce108 (NixOS/nixpkgs on GitHub), aren’t cudaSupport and rocmSupport redundant if it already reads config.cudaSupport and config.rocmSupport?

Yes, pick your own preference. The important part is withFFmpegSupport = true.

I still get the same error with this config:

  environment.systemPackages = [
    (pkgs.whisper-cpp.override {
      withFFmpegSupport = true;
    })
    (pkgs.ffmpeg-full.override {
      withUnfree = true;
    })
  ];

Apologies, I had skimmed and misunderstood the original post. My answer was about using whisper with FFmpeg to convert unsupported input formats. Your question is about using FFmpeg with whisper to generate subtitles.

Sorry about that. I’ll see if I can find a fix for your actual problem now!

Oh, no worries. I asked for help on the subreddit too, and someone linked a PR that’s still waiting to be merged, which makes the whisper-cpp package usable as a library by programs like FFmpeg: "whisper-cpp: Ensure that backend dir is set" by TimQuelch (Pull Request #461562, NixOS/nixpkgs on GitHub).

Yep, that’s me (both the PR and the comment on Reddit).

As is, whisper-cpp dynamically loads the backend libraries from a search path consisting of the executable path and the current working directory. For the whisper-cpp executables this is fine, because the libraries are packaged in the same directory as the executables.

However, ffmpeg links against the libwhisper and libggml libraries. These libraries search the executable path, which is ffmpeg’s path, cannot find the backend libraries there, and then crash because there are no backend devices available.

Turning on the build flag from that PR (GGML_BACKEND_DIR, the same one set in the overlay below) does two things:

  1. Puts the backend libraries in the /lib dir instead of /bin (see here)
  2. Prepends the whisper-cpp /lib dir to the search path when it is searching for backends (see here)

I’m not quite sure what the holdup is on getting that PR merged; it seems the merge bot failed. In the meantime, I’ve been using these overlays:

final: prev: {
  whisper-cpp = prev.whisper-cpp.overrideAttrs (prevAttrs: {
    cmakeFlags = prevAttrs.cmakeFlags ++ [
      (final.lib.cmakeFeature "GGML_BACKEND_DIR" "${placeholder "out"}/lib")
    ];
  });
  ffmpeg = prev.ffmpeg.override {
    withWhisper = true;
  };
}
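
For anyone wiring this up, here is a minimal sketch of one way to load the overlay above from a NixOS configuration. It assumes the overlay has been saved as ./whisper-overlay.nix next to your configuration (a hypothetical file name, pick your own) and that pkgs.ffmpeg is the package you want installed; neither detail comes from the posts above.

```nix
# configuration.nix (sketch, not a tested config)
{ pkgs, ... }:
{
  # Register the overlay so that packages depending on whisper-cpp,
  # including ffmpeg, are built against the patched version.
  # ./whisper-overlay.nix is a hypothetical path holding the
  # `final: prev: { ... }` function shown above.
  nixpkgs.overlays = [
    (import ./whisper-overlay.nix)
  ];

  # pkgs.ffmpeg now refers to the overridden package from the overlay.
  environment.systemPackages = [
    pkgs.ffmpeg
  ];
}
```

Because the overridden derivations differ from the ones in the binary cache, expect whisper-cpp and ffmpeg to be rebuilt locally the first time you switch to this configuration.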