nlp by Majdoddin

Tool for audio transcription and speaker diarization

Created 3 years ago

491 stars

Top 63.0% on SourcePulse

Project Summary

This project provides a pipeline for generating speaker-attributed transcripts of audio and video files, aimed at users who need to analyze spoken content with speaker identification. It uses Whisper for transcription and pyannote-audio for speaker diarization, and combines their outputs into a single transcript.

How It Works

The core approach combines OpenAI's Whisper for speech-to-text transcription with pyannote-audio for speaker diarization. pyannote-audio first splits the audio into segments and labels each segment with a speaker. Whisper then transcribes the audio, and its timestamps are matched against the diarization segments to attribute each piece of text to the correct speaker. The project also explores speeding up Whisper by processing the diarized segments as one concatenated track, separated by silent spacers, though this can introduce timestamping errors.
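The attribution step can be illustrated with a minimal sketch. It assumes hypothetical `Word` and `Turn` records standing in for Whisper's word-level timestamps and pyannote-audio's speaker turns, and it assigns each word to the speaker whose turn overlaps it the most. This max-overlap rule is an illustrative choice and is not necessarily the rule the project uses.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical stand-in for one Whisper word with its timestamps.
    text: str
    start: float  # seconds
    end: float


@dataclass
class Turn:
    # Hypothetical stand-in for one pyannote-audio speaker turn.
    speaker: str
    start: float  # seconds
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn at all are labeled "UNKNOWN".
    """
    labeled = []
    for w in words:
        best = max(turns, key=lambda t: overlap(w.start, w.end, t.start, t.end), default=None)
        if best is None or overlap(w.start, w.end, best.start, best.end) == 0.0:
            labeled.append(("UNKNOWN", w.text))
        else:
            labeled.append((best.speaker, w.text))
    return labeled


def group_by_speaker(labeled):
    """Merge consecutive words from the same speaker into transcript lines."""
    lines = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1] = (speaker, lines[-1][1] + " " + text)
        else:
            lines.append((speaker, text))
    return lines


turns = [Turn("SPEAKER_00", 0.0, 2.0), Turn("SPEAKER_01", 2.0, 4.5)]
words = [Word("Hello", 0.1, 0.5), Word("there", 0.6, 1.0),
         Word("Hi", 2.2, 2.5), Word("back", 2.6, 3.0), Word("ok", 5.0, 5.2)]
for speaker, line in group_by_speaker(assign_speakers(words, turns)):
    print(f"{speaker}: {line}")
```

A word that straddles a turn boundary goes to whichever speaker covers more of it. A word that falls outside every turn, such as "ok" above, is flagged rather than forced onto the nearest speaker.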

Quick Start & Requirements

Install via pip.
Requires Python, Whisper, and pyannote-audio.
GPU acceleration is recommended for performance.
A Hugging Face access token may be required to download the pyannote-audio models.

Highlighted Details

Integrates Whisper's word-level timestamping with pyannote-audio diarization.
Supports input from YouTube, local files, and Google Drive.
Offers HTML output with highlighted words synchronized to playback.
Explores methods for optimizing transcription speed by segmenting audio.
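The synchronized HTML output can be sketched as follows. Each word becomes a `<span>` carrying its start and end times, and a small script toggles a highlight class while an `<audio>` element plays. The element id `player`, the `current` class, and the `(speaker, text, start, end)` tuple format are hypothetical, and this is not the project's actual markup.

```python
import html


def words_to_html(words, audio_src):
    """Render (speaker, text, start, end) tuples as an HTML page fragment.

    Each word is a <span> with data-start/data-end attributes in seconds,
    and the speaker label is shown as a tooltip via the title attribute.
    """
    parts = [
        f'<audio id="player" controls src="{html.escape(audio_src, quote=True)}"></audio>',
        "<div>",
    ]
    for speaker, text, start, end in words:
        parts.append(
            f'<span data-start="{start:.2f}" data-end="{end:.2f}" '
            f'title="{html.escape(speaker, quote=True)}">{html.escape(text)}</span> '
        )
    parts.append("</div>")
    # On every timeupdate event, highlight the span covering the playback time.
    parts.append(
        "<script>\n"
        "const p = document.getElementById('player');\n"
        "p.addEventListener('timeupdate', () => {\n"
        "  for (const s of document.querySelectorAll('span[data-start]')) {\n"
        "    const on = p.currentTime >= +s.dataset.start && p.currentTime < +s.dataset.end;\n"
        "    s.classList.toggle('current', on);\n"
        "  }\n"
        "});\n"
        "</script>"
    )
    parts.append("<style>.current { background: yellow; }</style>")
    return "\n".join(parts)


page = words_to_html([("SPEAKER_00", "Hello", 0.1, 0.5),
                      ("SPEAKER_01", "there", 0.6, 1.0)], "talk.mp3")
```

Writing `page` to an `.html` file next to `talk.mp3` and opening it in a browser should highlight each word as playback reaches it. Escaping the word text matters because transcripts can contain characters such as `<` and `&`.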

Maintenance & Community

Contributions have come from multiple individuals.
Discussions and issue tracking are available on the GitHub repository.

Licensing & Compatibility

The README does not explicitly state a license.
Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project notes that Whisper does not consistently timestamp the silent spacers inserted between concatenated audio segments, so timestamps produced in this mode can be unreliable. This can reduce the accuracy of speaker attribution and playback synchronization for some inputs.
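The spacer issue is easier to see with a minimal sketch of how timestamps from a concatenated track would be mapped back to the source audio. It assumes a fixed, hypothetical 1-second spacer and a simple `(start, end)` segment list. Neither value is taken from the project.

```python
from bisect import bisect_right


def build_layout(segments, spacer=1.0):
    """Place segments back to back in a new track, separated by `spacer` seconds.

    Returns one (concat_start, orig_start, duration) tuple per segment.
    The spacer length is an illustrative assumption.
    """
    layout = []
    cursor = 0.0
    for orig_start, orig_end in segments:
        duration = orig_end - orig_start
        layout.append((cursor, orig_start, duration))
        cursor += duration + spacer
    return layout


def to_original_time(t, layout):
    """Map time t in the concatenated track back to the source audio.

    Returns (segment_index, original_time), or None when t lands before the
    first segment or inside a silent spacer, where no reliable mapping exists.
    """
    starts = [concat_start for concat_start, _, _ in layout]
    i = bisect_right(starts, t) - 1
    if i < 0:
        return None
    concat_start, orig_start, duration = layout[i]
    offset = t - concat_start
    if offset > duration:
        return None  # inside the spacer: Whisper's timestamp cannot be trusted here
    return i, orig_start + offset


layout = build_layout([(10.0, 12.5), (30.0, 31.0)], spacer=1.0)
print(to_original_time(1.0, layout))  # falls inside the first segment
print(to_original_time(3.0, layout))  # falls inside the spacer
```

If Whisper reports a word at 3.0 s in the concatenated track, that time lies in the spacer after the first segment, so the sketch returns `None` instead of guessing a source time or speaker. This is exactly the case that makes the spacer approach unreliable.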

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days