WhisperHallu  by EtienneAb3d

Audio preprocessing for optimized Whisper transcriptions

created 2 years ago
333 stars

Top 83.6% on sourcepulse

Project Summary

This repository provides experimental Python code for preprocessing audio files to improve Whisper transcription accuracy and reduce hallucinations. It targets users seeking more reliable transcriptions from noisy or complex audio, offering a suite of audio manipulation techniques.

How It Works

The core approach is a multi-stage audio preprocessing pipeline. It uses Facebook Demucs or Deezer Spleeter for voice extraction, ffmpeg for silence removal and loudness normalization, and Silero VAD (voice activity detection) to drop non-speech, noise-only segments. The process can also add voice markers, apply speech compression, and experiment with several time-stretching methods to make the audio easier for Whisper's transcription models to handle.
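The ffmpeg stage described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names, the threshold values, and the 16 kHz mono output are assumptions chosen for the example. Only the `silenceremove` and `loudnorm` filters are standard ffmpeg features.

```python
import subprocess
from typing import List


def build_ffmpeg_command(
    src: str,
    dst: str,
    silence_threshold_db: int = -50,   # assumed value, tune per recording
    min_silence_s: float = 0.5,        # assumed value, tune per recording
) -> List[str]:
    """Build an ffmpeg command that strips silence, then normalizes loudness."""
    silence = (
        "silenceremove="
        f"start_periods=1:start_threshold={silence_threshold_db}dB:"
        f"stop_periods=-1:stop_duration={min_silence_s}:"
        f"stop_threshold={silence_threshold_db}dB"
    )
    filters = ",".join([silence, "loudnorm"])
    # 16 kHz mono matches the sample format Whisper resamples to internally.
    return ["ffmpeg", "-y", "-i", src, "-af", filters,
            "-ar", "16000", "-ac", "1", dst]


def preprocess(src: str, dst: str) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Calling `preprocess("input.mp3", "clean.wav")` would write the cleaned file, provided ffmpeg is installed and on the PATH. Voice extraction and VAD would run as separate stages before or after this step.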

Quick Start & Requirements

Highlighted Details

  • Integrates with WhisperTimeSync for vocal/lyric extraction and karaok-AI.
  • Offers multiple transcription attempts with varying parameters (beam size, temperature) for stability.
  • Includes options for vocal remixing and processing sub-parts of audio files.
  • Supports different Whisper model versions (V1, V2, V3) and faster-whisper.
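The idea of retrying transcription with different parameters can be sketched as below. This is a hypothetical illustration rather than the project's implementation: `transcribe` stands for any caller-supplied function wrapping a Whisper backend, the `(beam_size, temperature)` pairs are arbitrary, and `looks_stable` is a deliberately crude stand-in for a real hallucination check.

```python
from typing import Callable, Optional, Sequence, Tuple

# (beam_size, temperature); the values below are illustrative only.
Settings = Tuple[int, float]


def looks_stable(text: str, max_repeat: int = 3) -> bool:
    """Reject empty output or one word repeated more than max_repeat times in a row."""
    words = text.split()
    if not words:
        return False
    run = 1
    for prev, cur in zip(words, words[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeat:
            return False
    return True


def transcribe_with_retries(
    transcribe: Callable[[str, int, float], str],  # hypothetical backend wrapper
    audio_path: str,
    attempts: Sequence[Settings] = ((5, 0.0), (2, 0.0), (1, 0.2)),
) -> Optional[str]:
    """Try each setting in order and return the first output that passes the check."""
    for beam_size, temperature in attempts:
        text = transcribe(audio_path, beam_size, temperature)
        if looks_stable(text):
            return text
    return None  # every attempt looked degenerate
```

A wrapper around openai/whisper or faster-whisper would be passed in as `transcribe`. Keeping the backend behind a callable lets the same retry loop serve either library.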

Maintenance & Community

The project is experimental and appears to be a demonstration of the author's capabilities. Contact information for commercial projects is provided via https://cubaix.com.

Licensing & Compatibility

The repository does not explicitly state a license. The inclusion of code from openai/whisper and faster-whisper implies adherence to their respective licenses. Commercial use is not explicitly addressed.

Limitations & Caveats

The code is described as "experimental" and results may vary. Some preprocessing steps, like time stretching, have not shown significant gains for the author. Compatibility with specific ffmpeg versions is crucial, and Google Colab's default version may require upgrading. The effectiveness of prompts is language-dependent and may require tuning.
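Because the ffmpeg version matters, a quick check before running the pipeline may save time. The sketch below parses the first line of `ffmpeg -version`. The minimum version `(4, 4)` is a placeholder assumption, since the required version is not stated here, and the helper names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_ffmpeg_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric release version, or None for git/nightly builds."""
    match = re.match(r"ffmpeg version n?(\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_recent_enough(
    version_output: str,
    minimum: Tuple[int, ...] = (4, 4),  # placeholder, not a documented requirement
) -> bool:
    """Compare the parsed version against an assumed minimum."""
    version = parse_ffmpeg_version(version_output)
    return version is not None and version >= minimum


def installed_ffmpeg_version_output() -> str:
    """Return the raw output of `ffmpeg -version` (requires ffmpeg on the PATH)."""
    result = subprocess.run(
        ["ffmpeg", "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

In a notebook environment such as Google Colab, `is_recent_enough(installed_ffmpeg_version_output())` returning `False` would suggest upgrading ffmpeg before continuing.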

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 10 stars in the last 90 days
