Late last year FFmpeg introduced a filter to transcribe audio using whisper.cpp: https://ffmpeg.org/ffplay-all.html#whisper-1. I followed this guide but get an error when I run the following command. I want it to run on my RTX 3090; whisper-cli works fine with CUDA.
ffmpeg -i https://github.com/vpalmisano/webrtcperf/releases/download/videos-1.0/gvr.mp4 -vn -af "whisper=model=ggml-large-v3.bin :language=en :queue=3 :destination=output.srt :format=srt" -f null -
My NixOS config:
# Packages
nixpkgs = {
  config = {
    cudaSupport = true;
    allowUnfree = true;
  };
};
environment.systemPackages = [
  (pkgs.ffmpeg-full.override {
    withUnfree = true;
  })
  pkgs.whisper-cpp-vulkan
];
programs.nix-ld.libraries = [
  config.boot.kernelPackages.nvidia_x11
];
hardware = {
  graphics.enable = true;
  nvidia = {
    nvidiaSettings = true;
    open = true;
  };
};
Error Log:
ffmpeg version 8.0 Copyright (c) 2000-2025 the FFmpeg developers
built with gcc 14.3.0 (GCC)
...
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'https://github.com/vpalmisano/webrtcperf/releases/download/videos-1.0/gvr.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf61.9.100
Duration: 00:01:53.00, start: 0.000000, bitrate: 1201 kb/s
Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 1122 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
Metadata:
handler_name : ISO Media file produced by Google Inc.
vendor_id : [0][0][0][0]
encoder : Lavc60.35.100 libx264
Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 70 kb/s (default)
Metadata:
handler_name : ISO Media file produced by Google Inc.
vendor_id : [0][0][0][0]
/build/source/ggml/src/ggml-backend.cpp:501: GGML_ASSERT(device) failed
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libggml-base.so(+0x143fd) [0x7fd6411ef3fd]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libggml-base.so(ggml_print_backtrace+0x216) [0x7fd6411ef7b6]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libggml-base.so(ggml_abort+0x144)[0x7fd6411ef974]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libggml-base.so(+0x26507) [0x7fd641201507]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libwhisper.so.1(+0x360f3) [0x7fd656da50f3]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libwhisper.so.1(+0x38753) [0x7fd656da7753]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libwhisper.so.1(whisper_init_with_params_no_state+0x29e) [0x7fd656da9e6e]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libwhisper.so.1(whisper_init_from_file_with_params_no_state+0x1ab) [0x7fd656dadf2b]
/nix/store/r2x6hdy8y2b3czsg3993rjz3mh687dkc-whisper-cpp-1.8.2/lib/libwhisper.so.1(whisper_init_from_file_with_params+0x2b) [0x7fd656db176b]
/nix/store/hfwrpc7ycs53hdvfvgrvx8xa6rkpf725-ffmpeg-full-8.0-lib/lib/libavfilter.so.11(+0x17184b) [0x7fd65a77184b]
/nix/store/hfwrpc7ycs53hdvfvgrvx8xa6rkpf725-ffmpeg-full-8.0-lib/lib/libavfilter.so.11(avfilter_init_dict+0x71) [0x7fd65a797b51]
/nix/store/hfwrpc7ycs53hdvfvgrvx8xa6rkpf725-ffmpeg-full-8.0-lib/lib/libavfilter.so.11(avfilter_graph_segment_init+0x58) [0x7fd65a7c6308]
/nix/store/hfwrpc7ycs53hdvfvgrvx8xa6rkpf725-ffmpeg-full-8.0-lib/lib/libavfilter.so.11(avfilter_graph_segment_apply+0x44) [0x7fd65a7c6c34]
ffmpeg(+0x1fbda) [0x55d7a200fbda]
ffmpeg(+0x23545) [0x55d7a2013545]
ffmpeg(+0x257f1) [0x55d7a20157f1]
ffmpeg(+0x2c8a0) [0x55d7a201c8a0]
ffmpeg(+0x2d042) [0x55d7a201d042]
ffmpeg(+0x2d5c4) [0x55d7a201d5c4]
ffmpeg(+0x2e05a) [0x55d7a201e05a]
ffmpeg(+0x32599) [0x55d7a2022599]
ffmpeg(+0x356a3) [0x55d7a20256a3]
ffmpeg(main+0xa2) [0x55d7a2003142]
/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6(+0x2a4d8) [0x7fd65722a4d8]
/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6(__libc_start_main+0x8b) [0x7fd65722a59b]
ffmpeg(+0x13cc5) [0x55d7a2003cc5]
fish: Job 1, 'ffmpeg -i https://github.com/vp…' terminated by signal SIGABRT (Abort)