WhisperHallu by EtienneAb3d

Audio preprocessing for optimized Whisper transcriptions

Created 3 years ago

349 stars

Top 79.4% on SourcePulse

Project Summary

This repository provides experimental Python code for preprocessing audio files to improve Whisper transcription accuracy and reduce hallucinations. It targets users seeking more reliable transcriptions from noisy or complex audio, offering a suite of audio manipulation techniques.

How It Works

The core approach is a multi-stage audio preprocessing pipeline. Facebook Demucs or Deezer Spleeter extracts the voice track, ffmpeg removes silences and normalizes loudness, and Silero VAD (voice activity detection) removes segments that contain no speech. The pipeline can also add voice markers, apply speech compression, and try various time-stretching methods before the audio is passed to Whisper.
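The ffmpeg stage of such a pipeline can be sketched as a function that builds the command line. This is a minimal illustration, not the repository's actual code: the function name, the thresholds, and the filter order are assumptions, and mapping "speech compression" to ffmpeg's `speechnorm` filter is a guess based on the ffmpeg 4.4 requirement.

```python
from typing import List


def build_ffmpeg_command(src: str, dst: str, use_speechnorm: bool = False) -> List[str]:
    """Return an ffmpeg argv that trims silence and normalizes loudness.

    Hypothetical helper: the thresholds below are illustrative values,
    not the ones used by WhisperHallu.
    """
    filters = [
        # start_periods=1 trims leading silence; stop_periods=-1 removes
        # silent stretches throughout the file (here: >= 0.5 s below -50 dB).
        "silenceremove=start_periods=1:start_threshold=-50dB"
        ":stop_periods=-1:stop_duration=0.5:stop_threshold=-50dB",
        # EBU R128 loudness normalization with ffmpeg's default targets.
        "loudnorm",
    ]
    if use_speechnorm:
        # Assumption: speechnorm stands in for the "speech compression" step.
        filters.append("speechnorm")
    # 16 kHz mono matches the sample format Whisper resamples its input to.
    return ["ffmpeg", "-y", "-i", src, "-af", ",".join(filters),
            "-ar", "16000", "-ac", "1", dst]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed. Building the argument list separately from running it makes the filter chain easy to inspect and adjust.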

Quick Start & Requirements

Installation: requires ffmpeg (version 4.4 or later recommended; upgrade instructions are provided), openai-whisper, and torchaudio. demucs, spleeter, and faster-whisper are optional.
Dependencies: Python 3.x, ffmpeg.
Setup: install the packages above and, if needed, upgrade ffmpeg.
Links:
- Standard Whisper Colab: https://colab.research.google.com/drive/1-GpXaNaGFXKX9VXl60JGVVrGO41t09KA
- Faster Whisper Colab: https://colab.research.google.com/drive/1RkvOtUTbUD5NVsRI4aKEqJO8BRo8BFIY

Highlighted Details

Integrates with WhisperTimeSync for vocal/lyric extraction and karaok-AI.
Offers multiple transcription attempts with varying parameters (beam size, temperature) for stability.
Includes options for vocal remixing and processing sub-parts of audio files.
Supports different Whisper model versions (V1, V2, V3) and faster-whisper.
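
The idea of repeated transcription attempts with varying parameters can be sketched as a small retry loop. This is an illustration under stated assumptions rather than the project's implementation: `transcribe_with_retries`, the `DEFAULT_ATTEMPTS` ladder, and the `is_acceptable` check are all hypothetical names and values.

```python
from typing import Any, Callable, Dict, Iterable, Optional

# Hypothetical parameter ladder: start with a narrow deterministic search,
# then widen the beam, then allow sampling.
DEFAULT_ATTEMPTS = (
    {"beam_size": 2, "temperature": 0.0},
    {"beam_size": 5, "temperature": 0.0},
    {"beam_size": 5, "temperature": 0.5},
)


def transcribe_with_retries(
    transcribe: Callable[..., str],
    audio_path: str,
    is_acceptable: Callable[[str], bool],
    attempts: Iterable[Dict[str, Any]] = DEFAULT_ATTEMPTS,
) -> Optional[str]:
    """Call `transcribe` with each parameter set until one result passes.

    `transcribe` is any callable taking the audio path plus keyword
    arguments and returning text. Returns None if every attempt fails.
    """
    for params in attempts:
        text = transcribe(audio_path, **params)
        if is_acceptable(text):
            return text
    return None
```

In practice `transcribe` could be a thin wrapper around a Whisper model, and `is_acceptable` could check, for example, that an expected marker phrase appears in the output; both choices are assumptions about how the pieces fit together.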

Maintenance & Community

The project is experimental and appears to be a demonstration of the author's capabilities. Contact information for commercial projects is provided via https://cubaix.com.

Licensing & Compatibility

The repository does not explicitly state a license. The inclusion of code from openai/whisper and faster-whisper implies adherence to their respective licenses. Commercial use is not explicitly addressed.

Limitations & Caveats

The code is described as "experimental" and results may vary. Some preprocessing steps, such as time stretching, did not yield significant gains in the author's tests. Some steps depend on the ffmpeg version (4.4 or later is recommended), and Google Colab's default version may need upgrading. The effectiveness of prompts is language-dependent and may require tuning.
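
Since the ffmpeg version matters, a quick check before running the pipeline can save time. The sketch below parses the banner printed by `ffmpeg -version`; the helper names are hypothetical, and the regular expression assumes a release-style version string (development builds with a different format yield `None`).

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_ffmpeg_version(banner: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from an `ffmpeg -version` banner, or None."""
    match = re.search(r"ffmpeg version n?(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_ffmpeg_version() -> Optional[Tuple[int, int]]:
    """Return the installed ffmpeg version, or None if it cannot be determined."""
    try:
        out = subprocess.run(
            ["ffmpeg", "-version"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_ffmpeg_version(out)


def meets_minimum(version: Optional[Tuple[int, int]],
                  minimum: Tuple[int, int] = (4, 4)) -> bool:
    """True when a version was detected and is at least `minimum`."""
    return version is not None and version >= minimum
```

Calling `meets_minimum(installed_ffmpeg_version())` returns `False` both when ffmpeg is missing and when it is older than 4.4, so either case can prompt an upgrade.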
