Audio transcription/diarization using Whisper and pyannote-audio
This repository provides a tutorial on combining OpenAI's Whisper for speech-to-text transcription with pyannote.audio for speaker diarization, addressing Whisper's limitation of not identifying speakers in conversations. It is targeted at users who need to analyze multi-speaker audio content, offering a practical solution for segmenting and labeling speech.
How It Works
The approach uses yt-dlp to download and extract audio from videos, then pydub to segment the audio. pyannote.audio runs a pre-trained pipeline to perform speaker diarization, identifying speech segments and assigning speaker labels. Finally, OpenAI's Whisper model transcribes the diarized segments, and the output is combined into an HTML file that annotates the transcriptions with speaker labels and timestamps.
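Below is a minimal Python sketch of that flow. The checkpoint name, the Hugging Face access token, and the file names are illustrative assumptions, not the repository's exact code:

import whisper
from pydub import AudioSegment
from pyannote.audio import Pipeline

# Load the pre-trained diarization pipeline (the gated model requires
# accepting its terms on Hugging Face; the token below is a placeholder).
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization", use_auth_token="YOUR_HF_TOKEN")
model = whisper.load_model("base")

audio = AudioSegment.from_wav("download.wav")
diarization = pipeline("download.wav")

rows = []
for turn, _, speaker in diarization.itertracks(yield_label=True):
    # pydub slices in milliseconds; diarization turn boundaries are in seconds.
    chunk = audio[int(turn.start * 1000):int(turn.end * 1000)]
    chunk.export("segment.wav", format="wav")
    text = model.transcribe("segment.wav")["text"].strip()
    rows.append(f"<p><b>{speaker}</b> [{turn.start:.1f}s-{turn.end:.1f}s]: {text}</p>")

# Combine the labeled, timestamped transcriptions into a simple HTML file.
with open("transcript.html", "w") as f:
    f.write("<html><body>\n" + "\n".join(rows) + "\n</body></html>")

Transcribing each speaker turn separately keeps the speaker labels aligned with the text, at the cost of some per-segment overhead.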
Quick Start & Requirements
Install the Python dependencies:
pip install -U yt-dlp pydub pyannote.audio webvtt-py
plus ffmpeg on your PATH. Then download and extract the audio track from a video:
yt-dlp -xv --ffmpeg-location <ffmpeg_path> --audio-format wav -o download.wav <youtube_url>
Both yt-dlp and pydub rely on ffmpeg for audio extraction and conversion.
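Whisper internally operates on 16 kHz mono audio, so it can help to normalize the download before processing. A minimal pydub sketch, assuming the file names used above (pydub shells out to ffmpeg for decoding):

from pydub import AudioSegment

# Resample the downloaded track to 16 kHz mono WAV before diarization
# and transcription; file names are illustrative.
audio = AudioSegment.from_file("download.wav")
audio = audio.set_frame_rate(16000).set_channels(1)
audio.export("download_16k.wav", format="wav")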
Highlighted Details
Combines OpenAI's Whisper with pyannote.audio for comprehensive speech analysis.
Maintenance & Community
Licensing & Compatibility
pyannote.audio is typically distributed under the MIT license.
Limitations & Caveats
The setup depends on compatible versions of both pyannote.audio and Whisper, requiring a specific execution order.