lablab-ai: Audio transcription/diarization using Whisper and pyannote-audio
This repository provides a tutorial on combining OpenAI's Whisper for speech-to-text transcription with pyannote.audio for speaker diarization, addressing Whisper's limitation of not identifying individual speakers in a conversation. It targets users who need to analyze multi-speaker audio, offering a practical way to segment and label speech.
How It Works
The approach leverages yt-dlp to download and extract audio from videos, then pydub to segment the audio. pyannote.audio is used with a pre-trained pipeline to perform speaker diarization, identifying speech segments and assigning speaker labels. Finally, OpenAI's Whisper model transcribes these diarized segments, and the output is combined into an HTML file that annotates the transcriptions with speaker information and timestamps.
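A minimal sketch of that flow in Python, assuming pyannote.audio 2.x with a Hugging Face access token (the HF_TOKEN placeholder below), the openai-whisper package, and a download.wav produced by the yt-dlp step; the file names and HTML markup are illustrative, not the tutorial's exact code:

```python
import whisper
from pydub import AudioSegment
from pyannote.audio import Pipeline

# Speaker diarization with a pre-trained pyannote pipeline
# (requires accepting the model's terms and a Hugging Face token).
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization", use_auth_token="HF_TOKEN"
)
diarization = pipeline("download.wav")

audio = AudioSegment.from_wav("download.wav")  # pydub slices in milliseconds
model = whisper.load_model("base")

lines = []
for turn, _, speaker in diarization.itertracks(yield_label=True):
    # Cut out one diarized speech turn and transcribe it with Whisper.
    segment = audio[int(turn.start * 1000):int(turn.end * 1000)]
    segment.export("segment.wav", format="wav")
    text = model.transcribe("segment.wav")["text"].strip()
    lines.append(
        f"<p><b>{speaker}</b> [{turn.start:.1f}s-{turn.end:.1f}s]: {text}</p>"
    )

# Combine everything into a speaker- and timestamp-annotated HTML transcript.
with open("transcript.html", "w") as f:
    f.write("\n".join(lines))
```

Transcribing each diarized turn separately keeps speaker labels aligned with the text, at the cost of losing some acoustic context across turn boundaries.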
Quick Start & Requirements
Install the Python dependencies:

pip install -U yt-dlp pydub pyannote.audio webvtt-py

Then download and extract audio from a video (ffmpeg must be installed, or passed via --ffmpeg-location):

yt-dlp -xv --ffmpeg-location <ffmpeg_path> --audio-format wav -o download.wav <youtube_url>
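The download step can also be scripted instead of run from the shell; a minimal sketch using yt-dlp's embedding API, assuming ffmpeg is on the PATH (the <youtube_url> placeholder matches the command above):

```python
import yt_dlp

opts = {
    "format": "bestaudio/best",
    "outtmpl": "download.%(ext)s",    # post-processing yields download.wav
    "postprocessors": [{
        "key": "FFmpegExtractAudio",  # requires ffmpeg
        "preferredcodec": "wav",
    }],
}
with yt_dlp.YoutubeDL(opts) as ydl:
    ydl.download(["<youtube_url>"])
```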
Highlighted Details
Combines OpenAI's Whisper transcription with pyannote.audio speaker diarization for comprehensive speech analysis.
Maintenance & Community
Last updated 3 years ago; the project is inactive.
Licensing & Compatibility
OpenAI's Whisper is released under the MIT license; pyannote.audio is typically under MIT.
Limitations & Caveats
There are known dependency conflicts between pyannote.audio and Whisper, requiring a specific execution order.