Whisper-transcription_and_diarization-speaker-identification- by lablab-ai

Audio transcription/diarization using Whisper and pyannote-audio

created 2 years ago
349 stars

Top 80.8% on sourcepulse

Project Summary

This repository provides a tutorial on combining OpenAI's Whisper for speech-to-text transcription with pyannote.audio for speaker diarization, addressing the limitation that Whisper alone does not identify speakers in a conversation. It is aimed at users who need to analyze multi-speaker audio, offering a practical workflow for segmenting and labeling speech.

How It Works

The approach uses yt-dlp to download a video and extract its audio, and pydub to prepare and segment that audio. pyannote.audio then runs a pre-trained pipeline to perform speaker diarization, identifying speech segments and assigning speaker labels. Finally, OpenAI's Whisper model transcribes the diarized segments, and the output is combined into an HTML file that annotates the transcription with speaker labels and timestamps.
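As a rough illustration of that flow, the sketch below diarizes a WAV file with a pyannote.audio pipeline, slices each speaker turn with pydub, and transcribes the slices with Whisper. The pipeline name, Hugging Face token, Whisper model size, and file names are placeholder assumptions, not values taken from the repository.

```python
# Hedged sketch: diarize with pyannote.audio, then transcribe each speaker turn
# with Whisper. Model names, the HF token, and paths are placeholders.
import whisper
from pydub import AudioSegment
from pyannote.audio import Pipeline

AUDIO_PATH = "download.wav"  # WAV produced by the yt-dlp step

# 1. Speaker diarization: who speaks when.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization",   # gated model; requires an accepted license + HF token
    use_auth_token="YOUR_HF_TOKEN",
)
diarization = pipeline(AUDIO_PATH)

# 2. Transcription: cut each speaker turn out with pydub and run Whisper on it.
audio = AudioSegment.from_wav(AUDIO_PATH)
model = whisper.load_model("base")

results = []
for turn, _, speaker in diarization.itertracks(yield_label=True):
    start_ms, end_ms = int(turn.start * 1000), int(turn.end * 1000)
    clip_path = f"clip_{start_ms}_{end_ms}.wav"
    audio[start_ms:end_ms].export(clip_path, format="wav")
    text = model.transcribe(clip_path)["text"].strip()
    results.append((speaker, turn.start, turn.end, text))

# 3. Combine: here just printed; the tutorial renders this as HTML instead.
for speaker, start, end, text in results:
    print(f"[{start:7.1f}s - {end:7.1f}s] {speaker}: {text}")
```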

Quick Start & Requirements

  • Install the Python dependencies: pip install -U yt-dlp pydub pyannote.audio webvtt-py; ffmpeg must be installed separately.
  • Download audio: yt-dlp -xv --ffmpeg-location <ffmpeg_path> --audio-format wav -o download.wav <youtube_url>
  • Process the audio with the Python scripts provided in the README (see the preparation sketch after this list).
  • Requires Python and ffmpeg.
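The preparation sketch referenced above is a minimal, assumed example of normalizing the downloaded file with pydub before diarization; converting to mono 16 kHz and the prepared.wav output name are not steps quoted from the README.

```python
# Hedged sketch: normalize the downloaded audio before diarization.
# The mono/16 kHz conversion and output filename are assumptions.
from pydub import AudioSegment

audio = AudioSegment.from_wav("download.wav")         # file written by yt-dlp
audio = audio.set_channels(1).set_frame_rate(16000)   # mono, 16 kHz
audio.export("prepared.wav", format="wav")
```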

Highlighted Details

  • Integrates state-of-the-art Whisper and pyannote.audio for comprehensive speech analysis.
  • Generates an HTML output for easy review of transcribed and diarized content (see the illustrative snippet after this list).
  • Demonstrates audio preparation, segmentation, and transcription workflow.
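The snippet below only illustrates what such an HTML step can look like; the results structure (matching the earlier sketch) and the markup are assumptions rather than the repository's actual template.

```python
# Hedged sketch: write speaker-annotated transcript segments to an HTML file.
# The `results` structure and the markup are assumptions, not the README's template.
import html

def to_html(results, path="transcript.html"):
    rows = [
        f"<p><b>{html.escape(speaker)}</b> "
        f"<i>[{start:.1f}s&ndash;{end:.1f}s]</i> {html.escape(text)}</p>"
        for speaker, start, end, text in results
    ]
    with open(path, "w", encoding="utf-8") as f:
        f.write("<html><body>\n" + "\n".join(rows) + "\n</body></html>")

# Example usage:
# to_html([("SPEAKER_00", 0.0, 4.2, "Hello and welcome.")])
```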

Maintenance & Community

  • Project is hosted on GitHub by lablab-ai.
  • Encourages joining the LabLab Discord for discussions and AI hackathon events.

Licensing & Compatibility

  • The README does not explicitly state a license for the provided code or tutorial.
  • OpenAI's Whisper code and models are released under the MIT license; pyannote.audio is also MIT-licensed, though its pretrained diarization pipelines are gated on Hugging Face and require accepting their user conditions.

Limitations & Caveats

  • The tutorial notes a potential version conflict between pyannote.audio and Whisper, requiring a specific execution order.
  • The provided code snippets are illustrative and may require adjustments for different audio inputs or environments.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days
