pyannote-audio by pyannote

Speaker diarization toolkit

Created 9 years ago
8,294 stars

Top 6.2% on SourcePulse

Project Summary

This toolkit provides neural building blocks for speaker diarization, enabling speech activity detection, speaker change detection, overlapped speech detection, and speaker embedding. It is targeted at researchers and developers working with audio analysis and offers state-of-the-art pretrained models and pipelines that can be fine-tuned for improved performance on custom datasets.

How It Works

The toolkit is built on the PyTorch machine learning framework and offers a Python-first API. It leverages pretrained models and pipelines available on the Hugging Face Hub, allowing users to quickly apply advanced speaker diarization techniques. The core advantage lies in its modular design, providing "neural building blocks" that can be combined and fine-tuned, enabling customization for specific audio analysis tasks and datasets.
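To illustrate how a single building block can be wrapped in a task-specific pipeline, here is a minimal sketch that uses the pretrained segmentation model for voice activity detection. It assumes pyannote.audio 3.x, that the user conditions for `pyannote/segmentation-3.0` have been accepted, and that the caller supplies a Hugging Face token. The helper names `vad_hyperparameters` and `detect_speech` are hypothetical, and `use_auth_token` is the 3.x keyword, which other releases may name differently.

```python
def vad_hyperparameters(min_duration_on: float = 0.0,
                        min_duration_off: float = 0.0) -> dict:
    """Build the post-processing settings for voice activity detection.

    min_duration_on: drop speech regions shorter than this many seconds.
    min_duration_off: fill non-speech gaps shorter than this many seconds.
    """
    return {
        "min_duration_on": min_duration_on,
        "min_duration_off": min_duration_off,
    }


def detect_speech(audio_path: str, hf_token: str) -> list:
    """Return (start, end) pairs, in seconds, for detected speech regions.

    Imports are deferred so this module loads without pyannote.audio installed.
    """
    from pyannote.audio import Model
    from pyannote.audio.pipelines import VoiceActivityDetection

    # Building block: the pretrained segmentation model (gated on the Hub).
    model = Model.from_pretrained(
        "pyannote/segmentation-3.0",
        use_auth_token=hf_token,  # assumption: 3.x keyword name
    )

    # Wrap the model in a pipeline and set its hyperparameters.
    pipeline = VoiceActivityDetection(segmentation=model)
    pipeline.instantiate(vad_hyperparameters())

    # The pipeline returns an annotation of speech regions.
    speech = pipeline(audio_path)
    return [
        (segment.start, segment.end)
        for segment in speech.get_timeline().support()
    ]
```

The same model could be passed to other pipelines in the toolkit; only the wrapping pipeline and its hyperparameters change.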

Quick Start & Requirements

  • Install with: pip install pyannote.audio
  • Requires accepting the user conditions of the gated models on Hugging Face (e.g., pyannote/segmentation-3.0, pyannote/speaker-diarization-3.1).
  • Requires a Hugging Face access token.
  • Supports GPU acceleration via PyTorch (pipeline.to(torch.device("cuda"))).
  • Official documentation, changelog, and FAQs are available.
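The steps above can be sketched as a short script. This is a minimal example, assuming pyannote.audio 3.x, accepted user conditions for both gated models, and a token supplied by the caller. The helper names `format_turn` and `diarize` are hypothetical, and `use_auth_token` is the 3.x keyword, which other releases may name differently.

```python
def format_turn(start: float, end: float, speaker: str) -> str:
    """Render one speaker turn as a human-readable line."""
    return f"start={start:.1f}s stop={end:.1f}s speaker={speaker}"


def diarize(audio_path: str, hf_token: str) -> list:
    """Run the pretrained diarization pipeline on one audio file.

    Imports are deferred so this module loads without pyannote.audio installed.
    """
    import torch
    from pyannote.audio import Pipeline

    # Download (or load from cache) the gated pretrained pipeline.
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=hf_token,  # assumption: 3.x keyword name
    )

    # Move to the GPU when one is available; otherwise stay on the CPU.
    if torch.cuda.is_available():
        pipeline.to(torch.device("cuda"))

    # The result is an annotation of who spoke when.
    diarization = pipeline(audio_path)
    return [
        format_turn(turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]
```

Calling `diarize("audio.wav", token)` with a placeholder file name would return one formatted line per speaker turn; the first call needs network access to fetch the models.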

Highlighted Details

  • Features pretrained pipelines and models on Hugging Face Hub.
  • Claims state-of-the-art performance, with v3.1 showing significant improvements over v2.x across various benchmarks.
  • Supports multi-GPU training with PyTorch Lightning.
  • Includes a blog with insights into achieving top rankings in speaker diarization challenges.

Maintenance & Community

The project is actively maintained, with recent blog posts and model updates. It has a presence on Hugging Face, and the README links to community contributions.

Licensing & Compatibility

The README does not explicitly state the license type or compatibility for commercial use. Users are advised to check the specific model licenses on Hugging Face.

Limitations & Caveats

The README suggests considering pyannoteAI for production use cases, which implies that pyannote.audio may be more research-oriented or less optimized for high-throughput production environments. Using the gated models requires accepting their user conditions and supplying a Hugging Face access token.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 17
  • Issues (30d): 5
  • Star history: 205 stars in the last 30 days
