Hi, can anybody take a look at my issue with WhisperX and suggest a better approach? Apologies in advance for the AI-generated description, but I’ve vetted it and I think it’s pretty clear.
# WhisperX Transcription on NixOS - Troubleshooting Guide
This document describes the challenges encountered when setting up WhisperX with speaker diarization on NixOS, and the solutions attempted.
## Goal
Extract transcripts with speaker detection from audio files and save them to an output directory.
## Environment
- System: NixOS unstable
- Python: 3.13.11
- WhisperX: 3.7.4 (via Nix packages)
- Shell: zsh with direnv
## Issues Encountered
### Issue 1: Float16 Compute Type Incompatibility
**Error:**
```
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
```
**Cause:**
- WhisperX defaults to `float16` (half-precision) computation
- `float16` is optimized for GPUs, and most CPU backends do not support it efficiently
- This machine runs WhisperX on the CPU, with no GPU acceleration
**Solution:**
```bash
whisperx audio.m4a --compute_type float32
```
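When the same invocation has to run on machines with and without a GPU, the compute type can be derived from the device instead of being hard-coded. The sketch below only builds the argument list. `whisperx_command` is a hypothetical helper, not part of WhisperX, and it assumes the CLI flags `--device`, `--compute_type` and `--output_dir`.

```python
def whisperx_command(audio_path, device="cpu", output_dir=None):
    """Build a whisperx argument list with a compute type suited to the device."""
    # float16 is only efficient on GPUs; fall back to float32 everywhere else.
    compute_type = "float16" if device == "cuda" else "float32"
    cmd = ["whisperx", audio_path, "--device", device, "--compute_type", compute_type]
    if output_dir is not None:
        cmd += ["--output_dir", output_dir]
    return cmd


if __name__ == "__main__":
    cmd = whisperx_command("audio.m4a", device="cpu", output_dir="output")
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

`int8` is another compute type that may be worth trying on CPU, probably trading some accuracy for speed.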
### Issue 2: Missing omegaconf Dependency
**Error:**
```
ModuleNotFoundError: No module named 'omegaconf'
```
**Cause:**
- Pyannote.audio (used for Voice Activity Detection) requires omegaconf
- The Nix package `python313Packages.whisperx` doesn't include omegaconf as a dependency
- The pyannote model checkpoints are pickled with omegaconf objects inside them, so PyTorch cannot unpickle them unless omegaconf is importable
**Attempted Solutions:**
1. **Installing omegaconf via nix shell:**
```bash
nix shell nixpkgs#python313Packages.omegaconf
```
Failed because `whisperx` is a wrapped binary with its own isolated Python environment, so packages added by `nix shell` are not visible to it
2. **Creating flake.nix with dependencies:**
Created `flake.nix` with all dependencies explicitly listed
```nix
whisperXEnv = pythonPackages.python.withPackages (p: with p; [
  whisperx
  omegaconf
  pyannote-audio
  # ... other dependencies
]);
```
Successfully activated with `nix develop` or direnv
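
One quick way to confirm that an environment really exposes the modules the wrapped binary was missing is to ask its interpreter directly. This is a minimal sketch using only the standard library. `missing_modules` is a hypothetical helper name, and the module list is an assumption based on the errors in this guide.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that this interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A missing parent package raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names taken from the errors described in this guide.
    print(missing_modules(["whisperx", "omegaconf", "pyannote.audio", "torch"]))
```

Run it with the `python` from inside `nix develop`. An empty list suggests the dependency problem is solved, and anything listed is still missing from that environment.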
### Issue 3: PyTorch 2.6+ Weights Only Security Feature
**Error:**
```
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded...
WeightsUnpickler error: Unsupported global: GLOBAL omegaconf.listconfig.ListConfig was not an allowed global by default.
```
**Cause:**
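- PyTorch 2.6 changed the default of the `weights_only` parameter of `torch.load` from `False` to `True`, a breaking security change
- With `weights_only=True`, only tensors and an allowlist of basic types may be unpickled
- Pyannote.audio checkpoints contain pickled omegaconf objects (here `omegaconf.listconfig.ListConfig`)
- PyTorch 2.9.1 (in Nix) therefore rejects them as "unsafe" by default
- The error occurs even without the `--diarize` flag because VAD uses pyannote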
**Version Mismatch:**
```
Nix PyTorch: 2.9.1 (weights_only=True by default)
pyannote model: saved with omegaconf (incompatible)
```
**Why This Happens:**
1. WhisperX uses pyannote.audio for Voice Activity Detection (VAD)
2. VAD runs by default even without speaker diarization
3. Pyannote models are saved with omegaconf serialization
4. PyTorch 2.6+ blocks loading these for security reasons
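
The mechanism behind step 4 can be illustrated without PyTorch. The sketch below is an analogy built on the standard library's `pickle.Unpickler.find_class` hook, not PyTorch's actual implementation. It refuses any global that is not on an allowlist, which is roughly how `weights_only=True` treats `omegaconf.listconfig.ListConfig`. Here `fractions.Fraction` stands in for the omegaconf class, and `AllowlistUnpickler` and `restricted_loads` are hypothetical names.

```python
import io
import pickle
from fractions import Fraction


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals named in an explicit allowlist."""

    def __init__(self, file, allowed):
        super().__init__(file)
        self._allowed = set(allowed)

    def find_class(self, module, name):
        # Called whenever the pickle stream references a class or function.
        if (module, name) not in self._allowed:
            raise pickle.UnpicklingError(f"Unsupported global: {module}.{name}")
        return super().find_class(module, name)


def restricted_loads(data, allowed=()):
    """Unpickle bytes, permitting only the (module, name) pairs in allowed."""
    return AllowlistUnpickler(io.BytesIO(data), allowed).load()


if __name__ == "__main__":
    payload = pickle.dumps({"lr": Fraction(1, 3)})
    try:
        restricted_loads(payload)  # Fraction is not allowlisted, so this fails
    except pickle.UnpicklingError as err:
        print(err)
    print(restricted_loads(payload, allowed={("fractions", "Fraction")}))
```

Plain containers and numbers load without an allowlist entry because they never go through `find_class`. In the same way, plain tensors load under `weights_only=True` while the omegaconf objects do not.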
## Solutions Attempted
### Attempted Solution 1: Add omegaconf via Nix Shell
**Status:** Failed
**Reason:** Nix-wrapped binaries have isolated environments
### Attempted Solution 2: Create Comprehensive flake.nix
**Status:** Partial success
**Files created:**
- `flake.nix` - Development environment definition
- `.envrc` - Direnv integration for automatic loading
- Updated `.gitignore` with Nix artifacts
**Result:** The environment activates successfully, but the PyTorch compatibility issue remains
### Attempted Solution 3: Skip Diarization
**Status:** Failed
**Reason:** VAD (Voice Activity Detection) uses pyannote by default, which triggers the same error
## Root Cause Analysis
The fundamental issue is a **package version incompatibility** in Nixpkgs:
```
whisperx 3.7.4 → pyannote.audio 4.0.1 → requires older PyTorch or weights_only=False
Nix has PyTorch 2.9.1 → uses weights_only=True by default → incompatible with pyannote models
```
This is a classic Nix packaging issue where:
- Packages are built with latest available dependencies
- Upstream packages haven't adapted to PyTorch 2.6+ changes
- Pinning a single package such as PyTorch to an older version is not easy in Nix, since the override affects every package that depends on it
## Possible Solutions
### Solution A: Use uv for Python Environment (Recommended)
Create a Python environment with compatible package versions:
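```bash
# Create and activate a virtual environment managed by uv
# (alternatively, declare these in pyproject.toml or as inline script dependencies)
uv venv
source .venv/bin/activate
# Keep torch below 2.6 so torch.load still defaults to weights_only=False
uv pip install "whisperx>=3.0.0" "torch>=2.0.0,<2.6.0" "pyannote.audio>=3.0.0"
```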
**Pros:**
- Control over exact package versions
- Avoids Nix packaging delays
- Still reproducible via lock files
- Fits an existing uv workflow
**Cons:**
- Not a pure Nix solution
- Requires additional setup
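
The `torch<2.6.0` bound in the install command matters because 2.6 is the release where the `weights_only` default flipped. Below is a minimal sketch for checking which side of that boundary a version string is on, using only string parsing. `weights_only_default` is a hypothetical helper, not a PyTorch API.

```python
import re


def weights_only_default(torch_version):
    """Return True if torch.load defaults to weights_only=True for this version."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # The default changed in the 2.6 release.
    return (major, minor) >= (2, 6)


if __name__ == "__main__":
    for version in ("2.5.1+cu121", "2.9.1"):
        print(version, weights_only_default(version))
```

In a live environment the argument could come from `importlib.metadata.version("torch")`, which makes it easy to verify that the resolver picked a version below the boundary.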
### Solution B: Patch pyannote.audio for PyTorch 2.6+
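Allowlist the omegaconf classes found in the pyannote checkpoints before any model is loaded, so that `weights_only=True` loading accepts them:
```python
import torch.serialization
from omegaconf.listconfig import ListConfig
# add_safe_globals expects class objects, not dotted-name strings
torch.serialization.add_safe_globals([ListConfig])
```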
**Pros:**
- Keeps Nix environment
- Minimal changes
**Cons:**
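- Must run in the same Python process before the models load, so it needs a wrapper script or a patch to the `whisperx` entry point
- Each allowlisted class widens what `weights_only=True` will unpickle, which has security implications (though fewer than `weights_only=False`)
- The patch has to be maintained as the checkpoints and libraries change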
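
Because the error message reports each rejected class as a dotted name, it can help to resolve those names to class objects programmatically. The following is a sketch under assumptions: `resolve_globals` and `allow_checkpoint_globals` are hypothetical helpers, and the list of names has to come from the actual error output (further globals may be reported once `ListConfig` is allowed). `torch` is imported lazily so the resolver can be tried without it.

```python
import importlib


def resolve_globals(dotted_names):
    """Turn 'package.module.ClassName' strings into the objects they name."""
    resolved = []
    for dotted in dotted_names:
        module_name, _, attr = dotted.rpartition(".")
        resolved.append(getattr(importlib.import_module(module_name), attr))
    return resolved


def allow_checkpoint_globals(dotted_names):
    """Allowlist the named classes for torch.load with weights_only=True."""
    import torch  # assumes a PyTorch recent enough to provide add_safe_globals

    torch.serialization.add_safe_globals(resolve_globals(dotted_names))


if __name__ == "__main__":
    # Resolving a standard-library name shows the idea without PyTorch installed.
    print(resolve_globals(["collections.OrderedDict"]))
```

It would be called as `allow_checkpoint_globals(["omegaconf.listconfig.ListConfig"])` in the same process, before WhisperX loads any model.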
### Solution C: Use Alternative VAD Method
Try using Silero VAD instead of pyannote:
```bash
whisperx audio.m4a --vad_method silero --compute_type float32
```
**Pros:**
- Avoids pyannote for the VAD step
- Potentially faster
**Cons:**
- Does not help with `--diarize`: diarization itself loads pyannote models, so the same error may return
- Different accuracy characteristics
### Solution D: File Nix Package Bug
Report the issue to Nixpkgs:
- `python313Packages.whisperx` missing dependencies
- PyTorch version incompatibility
**Action:** Create issue at https://github.com/NixOS/nixpkgs
## Recommended Next Steps
1. **Try Silero VAD first:**
```bash
whisperx audio.m4a --vad_method silero --compute_type float32 --output_dir output
```
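2. **If that works, try diarization with Silero VAD** (the pyannote diarization models are gated on Hugging Face, so this may also need `--hf_token`):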
```bash
whisperx audio.m4a --vad_method silero --diarize --compute_type float32 --output_dir output
```
3. **If it still fails, use the uv environment (Solution A):**
- Install compatible versions
- Document working versions in requirements.txt
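
Steps 1 and 2 can be run in sequence by a small driver that stops at the first failure. This is a minimal sketch, assuming `whisperx` is on `PATH` and exits non-zero on error. `run_until_failure` is a hypothetical helper.

```python
import subprocess


def run_until_failure(commands):
    """Run commands in order; return the index of the first that fails, or None."""
    for index, cmd in enumerate(commands):
        try:
            ok = subprocess.run(cmd).returncode == 0
        except OSError:  # e.g. the executable is not on PATH
            ok = False
        if not ok:
            return index
    return None


COMMON = ["--vad_method", "silero", "--compute_type", "float32", "--output_dir", "output"]
STEPS = [
    ["whisperx", "audio.m4a", *COMMON],               # step 1: Silero VAD only
    ["whisperx", "audio.m4a", "--diarize", *COMMON],  # step 2: add diarization
]

if __name__ == "__main__":
    failed = run_until_failure(STEPS)
    print("all steps succeeded" if failed is None else f"step {failed + 1} failed")
```

Stopping at the first failure keeps the error output of the failing step at the end of the terminal, which makes it easier to tell whether VAD or diarization is the step that breaks.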
## Files Created
- `flake.nix` - Nix development environment (ready but blocked by PyTorch issue)
- `.envrc` - Direnv configuration
- `.gitignore` - Updated with Nix artifacts
## Current Status
**Blocked** - WhisperX cannot run from the Nix package, even without diarization, because the PyTorch 2.6+ `weights_only` default rejects the pyannote.audio model checkpoints.
## Learnings
1. **Nix package isolation:** Wrapped binaries have hardcoded environments that `nix shell` can't modify
2. **PyTorch 2.6 breaking change:** The `weights_only=True` default breaks loading of older checkpoints that pickle arbitrary Python objects
3. **Dependency chains:** WhisperX → pyannote.audio → PyTorch means an incompatibility at any level blocks everything
4. **Nix packaging lag:** Upstream changes (PyTorch 2.6) take time to propagate through Nixpkgs ecosystem
## Resources
- WhisperX GitHub: https://github.com/m-bain/whisperX
- PyTorch 2.6 release notes: https://pytorch.org/blog/pytorch-2-6/
- Nixpkgs issue tracker: https://github.com/NixOS/nixpkgs/issues
- Pyannote.audio documentation: https://github.com/pyannote/pyannote-audio