tinydiarize by akashmjn

Finetuned speech model for speaker diarization

Project Summary

This project provides a minimal extension to OpenAI's Whisper for speaker diarization, labeling who spoke when in transcripts. It's designed for researchers and developers working with conversational audio like meetings and podcasts, offering a lightweight and interpretable solution that integrates seamlessly with Whisper.

How It Works

The approach fine-tunes Whisper so that the decoder emits a special token at each speaker change. Because the same model predicts both the words and the turn tokens, it can draw on acoustic and semantic context together, an advantage over traditional diarization pipelines that rely on voice characteristics alone. The required code changes are minimal (<50 lines), keeping the method cheap to adopt; the sketch below illustrates the target construction.
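
The idea is compact enough to sketch: fine-tuning targets are ordinary transcripts with a special token spliced in wherever the speaker changes, so the decoder learns to emit the token alongside the words. The following minimal illustration is not the project's actual training code; the token string and segment format are assumptions made for this example.

```python
# Illustrative sketch of the fine-tuning targets, not the project's actual
# training code. The token string "<|speakerturn|>" and the (speaker, text)
# segment format are assumptions made for this example.

SPEAKER_TURN = "<|speakerturn|>"  # hypothetical special token

def build_target_text(segments):
    """Splice a speaker-turn token into a transcript wherever the speaker changes.

    segments: diarized transcript as chronologically ordered (speaker_id, text) pairs.
    """
    parts = []
    prev = None
    for speaker, text in segments:
        if prev is not None and speaker != prev:
            parts.append(SPEAKER_TURN)
        parts.append(text.strip())
        prev = speaker
    return " ".join(parts)

# A two-speaker exchange becomes a single token-augmented decoding target:
segments = [
    ("A", "How are you?"),
    ("B", "Fine, thanks."),
    ("B", "And you?"),
    ("A", "Doing well."),
]
print(build_target_text(segments))
# How are you? <|speakerturn|> Fine, thanks. And you? <|speakerturn|> Doing well.
```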

Quick Start & Requirements
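
As a minimal fork of openai/whisper, the project should follow the same setup as the original (Python plus PyTorch, with ffmpeg available for audio decoding). A usage sketch, assuming the fork is installed from its repository and exposes the fine-tuned checkpoint as `small.en-tdrz` through the standard Whisper Python API:

```python
# Hedged sketch: assumes the tinydiarize fork is installed (e.g. cloned and
# installed with `pip install -e .`) and registers its fine-tuned checkpoint
# under the model name "small.en-tdrz".
import whisper

model = whisper.load_model("small.en-tdrz")  # fine-tuned small.en checkpoint
result = model.transcribe("meeting.wav")     # any conversational audio file

# Speaker changes should surface as special turn markers in the transcript.
print(result["text"])
```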

Highlighted Details

  • Achieves 97.7% speaker turn precision and 70.8% recall on a small benchmark set.
  • Maintains a similar Word Error Rate (WER) to the original Whisper model (10.3% vs 11.0%).
  • Fine-tuning requires minimal resources (~30 mins on 1 GPU).
  • Experimental support for whisper.cpp (exposed there via a `-tdrz` flag) enables running on consumer hardware.

Maintenance & Community

The project is described as a prototype/proof-of-concept, with plans for future development outlined in the roadmap. A more recent update, however, indicates that those plans have been paused.

Licensing & Compatibility

Code and model weights are released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Only the English small.en model has been fine-tuned so far. Timestamp behavior and deletion errors may differ from the original Whisper model. The code is still described as hacky and subject to change, and the scope is local speaker-turn detection: global diarization (clustering turns into consistent speaker identities) is deferred to a later stage.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 13 stars in the last 90 days
