WhisperX and Python dependencies

Hi, can anybody take a look at my issue with WhisperX and suggest any better approaches? Apologies in advance for the AI-generated description, but I’ve vetted it and I think it’s pretty clear.


# WhisperX Transcription on NixOS - Troubleshooting Guide

This document describes the problems encountered when setting up WhisperX with speaker diarization on NixOS, and the solutions attempted so far.

## Goal

Extract transcripts with speaker detection from audio files and save them to an output directory.

## Environment

- System: NixOS unstable
- Python: 3.13.11
- WhisperX: 3.7.4 (via Nix packages)
- Shell: zsh with direnv

## Issues Encountered

### Issue 1: Float16 Compute Type Incompatibility

**Error:**
```
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
```

**Cause:**
- WhisperX defaults to `float16` (half-precision) computation
- `float16` is optimized for GPUs and is not supported efficiently on most CPUs
- This run used the CPU, with no GPU acceleration

**Solution:**
```bash
whisperx audio.m4a --compute_type float32
```
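When the same command has to run on machines with and without a GPU, the compute type can be derived from the device. A minimal sketch, assuming hypothetical helpers `pick_compute_type` and `whisperx_command`, and assuming `float32` is the safe CPU choice as in the solution above. The `--device` and `--compute_type` flags are WhisperX CLI options.

```python
def pick_compute_type(device: str) -> str:
    """Return float16 for CUDA devices, float32 otherwise."""
    # float16 is only efficient on GPU backends; on CPU fall back to float32.
    return "float16" if device.startswith("cuda") else "float32"


def whisperx_command(audio: str, device: str = "cpu") -> list[str]:
    """Build the argv list for a whisperx call (this does not run it)."""
    return [
        "whisperx", audio,
        "--device", device,
        "--compute_type", pick_compute_type(device),
    ]


print(" ".join(whisperx_command("audio.m4a")))
```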

### Issue 2: Missing omegaconf Dependency

**Error:**
```
ModuleNotFoundError: No module named 'omegaconf'
```

**Cause:**
- Pyannote.audio (used for Voice Activity Detection) requires omegaconf
- The Nix package `python313Packages.whisperx` doesn't include omegaconf as a dependency
- PyTorch model loading uses pickle to unpickle model checkpoints, which requires omegaconf

**Attempted Solutions:**

1. **Installing omegaconf via nix shell:**
   ```bash
   nix shell nixpkgs#python313Packages.omegaconf
   ```
   This failed because the `whisperx` binary is wrapped with its own isolated Python environment, so packages added by `nix shell` are not visible to it

2. **Creating flake.nix with dependencies:**
   Created `flake.nix` with all dependencies explicitly listed
   ```nix
   whisperXEnv = pythonPackages.python.withPackages (p: with p; [
     whisperx
     omegaconf
     pyannote-audio
     # ... other dependencies
   ]);
   ```
   Successfully activated with `nix develop` or direnv
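A quick way to confirm which environment is actually in use is to ask the interpreter itself which modules it can find. The following is a stdlib-only sketch. `missing_modules` is a hypothetical helper name, and the module list is an assumption based on the errors above.

```python
import importlib.util
import sys


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names this interpreter cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # sys.executable shows which Python environment is really running.
    print("interpreter:", sys.executable)
    print("missing:", missing_modules(["whisperx", "omegaconf", "pyannote"]))
```

Run it with the Python from the flake environment (for example inside `nix develop`) and compare the interpreter path with the one the wrapped `whisperx` binary uses.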

### Issue 3: PyTorch 2.6+ Weights Only Security Feature

**Error:**
```
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded...
WeightsUnpickler error: Unsupported global: GLOBAL omegaconf.listconfig.ListConfig was not an allowed global by default.
```

**Cause:**
- PyTorch 2.6 introduced a breaking security change
- Default `weights_only` parameter changed from `False` to `True`
- Pyannote.audio models are saved with omegaconf objects
- PyTorch 2.9.1 (in Nix) rejects these as "unsafe" by default
- The error occurs even without `--diarize` flag because VAD uses pyannote

**Version Mismatch:**
```
Nix PyTorch: 2.9.1 (weights_only=True by default)
pyannote model: saved with omegaconf (incompatible)
```

**Why This Happens:**
1. WhisperX uses pyannote.audio for Voice Activity Detection (VAD)
2. VAD runs by default even without speaker diarization
3. Pyannote models are saved with omegaconf serialization
4. PyTorch 2.6+ blocks loading these for security reasons
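The mechanism behind this error can be illustrated without PyTorch. The sketch below is an analogy built on the standard library's `pickle`, not PyTorch's actual implementation. It defines an unpickler that refuses any global not on an allowlist, with `OrderedDict` standing in for `omegaconf.listconfig.ListConfig`. `AllowlistUnpickler` and `safe_loads` are hypothetical names.

```python
import io
import pickle
from collections import OrderedDict


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals named in an explicit allowlist."""

    def __init__(self, file, allowed: set[tuple[str, str]]):
        super().__init__(file)
        self.allowed = allowed

    def find_class(self, module: str, name: str):
        # Called for every class or function the pickle stream references.
        if (module, name) not in self.allowed:
            raise pickle.UnpicklingError(f"Unsupported global: {module}.{name}")
        return super().find_class(module, name)


def safe_loads(data: bytes, allowed: set[tuple[str, str]]):
    """Load pickled bytes, permitting only the allowlisted globals."""
    return AllowlistUnpickler(io.BytesIO(data), allowed).load()


# OrderedDict plays the role of the non-builtin class stored in the checkpoint.
payload = pickle.dumps(OrderedDict(a=1))

try:
    safe_loads(payload, allowed=set())
except pickle.UnpicklingError as err:
    print("rejected:", err)

print(safe_loads(payload, allowed={("collections", "OrderedDict")}))
```

The first load is rejected and the second succeeds once the class is allowlisted, which is the same shape as the `Unsupported global` error and its fix.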

## Solutions Attempted

### Attempted Solution 1: Add omegaconf via Nix Shell
**Status:** Failed
**Reason:** Nix-wrapped binaries have isolated environments

### Attempted Solution 2: Create Comprehensive flake.nix
**Status:** Partial success
**Files created:**
- `flake.nix` - Development environment definition
- `.envrc` - Direnv integration for automatic loading
- Updated `.gitignore` with Nix artifacts

**Result:** Environment activates successfully but PyTorch compatibility issue remains

### Attempted Solution 3: Skip Diarization
**Status:** Failed
**Reason:** VAD (Voice Activity Detection) uses pyannote by default, which triggers the same error

## Root Cause Analysis

The fundamental issue is a **package version incompatibility** in Nixpkgs:

```
whisperx 3.7.4 → pyannote.audio 4.0.1 → requires older PyTorch or weights_only=False
Nix has PyTorch 2.9.1 → uses weights_only=True by default → incompatible with pyannote models
```

This is a classic Nix packaging issue where:
- Packages are built with latest available dependencies
- Upstream packages haven't adapted to PyTorch 2.6+ changes
- No easy way to downgrade specific packages in Nix
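Whether a given PyTorch version falls on the affected side of this change can be checked from its version string. A small sketch, assuming the 2.6 cutoff stated above. `defaults_to_weights_only` is a hypothetical helper, and the sample strings are illustrative.

```python
import re


def defaults_to_weights_only(torch_version: str) -> bool:
    """True if this PyTorch version defaults to weights_only=True (2.6 or later)."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 6)


for version in ("2.5.1", "2.6.0+cu124", "2.9.1"):
    print(version, defaults_to_weights_only(version))
```

In a real environment the string would come from `torch.__version__`.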

## Possible Solutions

### Solution A: Use uv for Python Environment (Recommended)

Create a Python environment with compatible package versions:

```bash
# Create pyproject.toml or use inline script dependencies
uv venv
source .venv/bin/activate
uv pip install "whisperx>=3.0.0" "torch>=2.0.0,<2.6.0" "pyannote.audio>=3.0.0"
```

**Pros:**
- Control over exact package versions
- Avoids Nix packaging delays
- Still reproducible via lock files
- Works with user's existing uv workflow

**Cons:**
- Not pure Nix solution
- Requires additional setup
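Once a combination of versions works, it can be recorded as pins. The following is a stdlib-only sketch. `pin_lines` is a hypothetical helper, and the distribution names passed to it are assumptions that may need adjusting to match what is installed.

```python
from importlib import metadata


def pin_lines(distributions: list[str]) -> list[str]:
    """Return requirements.txt-style pins for the installed distributions."""
    lines = []
    for name in distributions:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Keep a visible marker instead of silently skipping the package.
            lines.append(f"# {name}: not installed")
    return lines


if __name__ == "__main__":
    print("\n".join(pin_lines(["whisperx", "torch", "pyannote.audio", "omegaconf"])))
```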

### Solution B: Patch pyannote.audio for PyTorch 2.6+

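Allowlist the omegaconf class named in the error before pyannote loads its checkpoint. Other classes may need to be added if further `Unsupported global` errors name them:

```python
import torch
from omegaconf.listconfig import ListConfig
# add_safe_globals expects the classes themselves, not their dotted names
torch.serialization.add_safe_globals([ListConfig])
```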

**Pros:**
- Keeps Nix environment
- Minimal changes

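**Cons:**
- Must run before pyannote loads its checkpoint, which means a wrapper script or a patch to library code
- Allowlisting widens what the unpickler accepts, though less than setting `weights_only=False` would
- Need to maintain patch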

### Solution C: Use Alternative VAD Method

Try using Silero VAD instead of pyannote:

```bash
whisperx audio.m4a --vad_method silero --compute_type float32
```

**Pros:**
- Avoids pyannote entirely
- Potentially faster

**Cons:**
- May not work with diarization
- Different accuracy characteristics

### Solution D: File Nix Package Bug

Report the issue to Nixpkgs:
- `python313Packages.whisperx` missing dependencies
- PyTorch version incompatibility

**Action:** Create issue at https://github.com/NixOS/nixpkgs

## Recommended Next Steps

1. **Try Silero VAD first:**
   ```bash
   whisperx audio.m4a --vad_method silero --compute_type float32 --output_dir output
   ```

2. **If that works, try diarization with Silero:**
   ```bash
   whisperx audio.m4a --vad_method silero --diarize --compute_type float32 --output_dir output
   ```

3. **If still failing, use uv environment:**
   - Install compatible versions
   - Document working versions in requirements.txt
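Steps 1 and 2 can be scripted so that the run stops at the first failing command. A sketch under assumptions: `build_steps`, `run_steps`, and `run_cli` are hypothetical helpers, and the flags are the ones used in the commands above.

```python
import subprocess
from typing import Callable

COMMON = ["--vad_method", "silero", "--compute_type", "float32", "--output_dir", "output"]


def build_steps(audio: str) -> list[list[str]]:
    """Steps 1 and 2 as argv lists: Silero VAD alone, then with diarization."""
    base = ["whisperx", audio, *COMMON]
    return [base, [*base, "--diarize"]]


def run_steps(steps: list[list[str]], run: Callable[[list[str]], int]) -> int:
    """Run steps in order; return how many succeeded before the first failure."""
    for done, argv in enumerate(steps):
        if run(argv) != 0:
            return done
    return len(steps)


def run_cli(argv: list[str]) -> int:
    """Execute one command and return its exit code (127 if it is not on PATH)."""
    try:
        return subprocess.run(argv).returncode
    except FileNotFoundError:
        return 127
```

Calling `run_steps(build_steps("audio.m4a"), run_cli)` would execute the commands. A return value of 0 or 1 means falling back to step 3.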

## Files Created

- `flake.nix` - Nix development environment (ready but blocked by PyTorch issue)
- `.envrc` - Direnv configuration
- `.gitignore` - Updated with Nix artifacts

## Current Status

**Blocked** - Unable to run whisperX even without diarization due to PyTorch 2.6+ weights_only security feature incompatibility with pyannote.audio models in Nix package.

## Learnings

1. **Nix package isolation:** Wrapped binaries have hardcoded environments that `nix shell` can't modify
2. **PyTorch 2.6 breaking change:** Security improvements break compatibility with older ML model checkpoints
3. **Dependency chains:** whisperX → pyannote.audio → PyTorch means incompatibilities at any level block everything
4. **Nix packaging lag:** Upstream changes (PyTorch 2.6) take time to propagate through Nixpkgs ecosystem

## Resources

- WhisperX GitHub: https://github.com/m-bain/whisperX
- PyTorch 2.6 release notes: https://pytorch.org/blog/pytorch-2-6/
- Nixpkgs issue tracker: https://github.com/NixOS/nixpkgs/issues
- Pyannote.audio documentation: https://github.com/pyannote/pyannote-audio

Sorry, what’s the issue? Is it not running, or what?

Hi, this post is rather difficult to grok, likely due to its AI-generated nature.

Issue 1: I don’t see what this has to do with Nix? Your own post gives a solution to the problem right after describing it.

Issue 2: whisperX’s pyproject.toml does not list omegaconf as a dependency, but it is a dep of pyannote-audio, so this seems simply wrong?
whisperX does not directly import omegaconf, as confirmed by grep.

Issue 3: How is torch 2.6 relevant? whisperX’s pyproject.toml specifies torch 2.8.0, as does pyannote-audio’s, which we have relaxed in nixpkgs because torch has reached 2.9.1.

Could you post a minimal reproducible example, including links to files used, so that necessary action can be inferred?


Corresponding issue (which got lost in the void because the maintainer wasn’t tagged):
