nlp by Majdoddin

Tool for audio transcription and speaker diarization

created 2 years ago
487 stars

Top 64.1% on sourcepulse

Project Summary

This project provides a pipeline that generates speaker-attributed transcripts from audio and video files. It is aimed at users who need to know who said what in spoken content. It uses Whisper for transcription and pyannote-audio for speaker diarization, combining the two into a single workflow.

How It Works

The core approach combines OpenAI's Whisper for speech-to-text transcription with pyannote-audio for speaker diarization. pyannote-audio processes the audio to identify when each speaker is talking. Whisper then transcribes these segments, and the diarization results are used to attribute the transcribed text to the correct speaker. The project also explores optimizations such as concatenating audio segments with silent spacers between them to improve Whisper's performance, though this can introduce timestamping issues.
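
The attribution step described above can be illustrated with a small sketch. It is not taken from the project: the `Turn` and `Segment` types, the `attribute_speakers` function, and the maximum-overlap rule are all assumptions chosen for illustration. The sample data stands in for real pyannote-audio and Whisper output.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarization turn: a speaker label and its time span in seconds."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """One transcribed segment: text plus its time span in seconds."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker whose turns overlap it the most."""
    labelled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Fall back to a placeholder label when no turn overlaps the segment.
        speaker = max(totals, key=totals.get) if totals else unknown
        labelled.append((speaker, seg.text))
    return labelled


# Hypothetical data standing in for diarization and transcription results.
turns = [Turn(0.0, 4.0, "SPEAKER_00"), Turn(4.0, 9.0, "SPEAKER_01")]
segments = [Segment(0.5, 3.5, "Hello there."), Segment(3.8, 8.0, "Hi, how are you?")]

for speaker, text in attribute_speakers(segments, turns):
    print(f"{speaker}: {text}")
```

Choosing the speaker with the largest total overlap keeps a segment that slightly straddles a speaker change attached to whoever spoke for most of it. Other rules, such as splitting the segment at the boundary, are equally plausible.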

Quick Start & Requirements

  • Install via pip.
  • Requires Python, Whisper, and pyannote-audio.
  • GPU acceleration is recommended for performance.
  • Hugging Face token may be required for model access.

Highlighted Details

  • Integrates Whisper's word-level timestamping with pyannote-audio diarization.
  • Supports input from YouTube, local files, and Google Drive.
  • Offers HTML output with highlighted words synchronized to playback.
  • Explores methods for optimizing transcription speed by segmenting audio.
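
As a rough illustration of the synchronized HTML output mentioned in the list above, the sketch below renders word-level timestamps as `<span>` elements carrying their start and end times. The function name `words_to_html`, the `data-start` and `data-end` attributes, and the tuple layout are assumptions, not the project's actual markup. A player script, not shown here, could read these attributes to highlight the current word during playback.

```python
from html import escape


def _paragraph(speaker, spans):
    """Wrap one speaker turn's word spans in a labelled paragraph."""
    return f"<p><b>{escape(speaker)}:</b> {' '.join(spans)}</p>"


def words_to_html(words):
    """Render (speaker, word, start, end) tuples as one paragraph per speaker turn."""
    paragraphs = []
    current_speaker = None
    spans = []
    for speaker, word, start, end in words:
        # Close the running paragraph whenever the speaker changes.
        if speaker != current_speaker and spans:
            paragraphs.append(_paragraph(current_speaker, spans))
            spans = []
        current_speaker = speaker
        spans.append(
            f'<span data-start="{start:.2f}" data-end="{end:.2f}">{escape(word)}</span>'
        )
    if spans:
        paragraphs.append(_paragraph(current_speaker, spans))
    return "\n".join(paragraphs)


# Hypothetical word-level data with speaker labels already attached.
words = [
    ("SPEAKER_00", "Hello", 0.0, 0.4),
    ("SPEAKER_00", "there.", 0.4, 0.9),
    ("SPEAKER_01", "Hi!", 1.2, 1.5),
]
print(words_to_html(words))
```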

Maintenance & Community

  • Active development with contributions from multiple individuals.
  • Discussions and issue tracking are available on the GitHub repository.

Licensing & Compatibility

  • The README does not explicitly state a license.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project notes that using silent spacers between audio segments for Whisper processing can lead to unreliable timestamping, as Whisper may not consistently timestamp these spacers. This can affect the accuracy of speaker attribution and synchronization in certain audio inputs.
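
To see why spacer timestamps matter, consider a minimal sketch of the bookkeeping involved. It assumes each segment is preceded by a fixed-length silent spacer in the concatenated track. The 2-second spacer, the function names, and the data layout are illustrative assumptions, not the project's code.

```python
def layout_with_spacers(segments, spacer=2.0):
    """Place each (start, end) segment in a concatenated track, preceded by a spacer.

    Returns a list of (concat_start, concat_end, original_start) tuples.
    """
    layout = []
    cursor = 0.0
    for start, end in segments:
        cursor += spacer  # silence inserted before every segment
        duration = end - start
        layout.append((cursor, cursor + duration, start))
        cursor += duration
    return layout


def to_original_time(t, layout):
    """Map a timestamp in the concatenated track back to the original audio.

    Returns None when t falls inside a spacer, where no speaker can be assigned.
    """
    for concat_start, concat_end, original_start in layout:
        if concat_start <= t <= concat_end:
            return original_start + (t - concat_start)
    return None


# Two hypothetical diarized segments, given as (start, end) in the original audio.
layout = layout_with_spacers([(10.0, 14.0), (30.0, 33.0)])
print(to_original_time(3.5, layout))  # inside the first segment
print(to_original_time(7.0, layout))  # inside a spacer
```

If Whisper reports a word's timestamp inside a spacer, or shifts it across one, the mapping above either fails or lands in the wrong segment, which is how the caveat translates into misattributed text.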

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 9 stars in the last 90 days
