pyannote-audio by pyannote

Speaker diarization toolkit

Created 9 years ago

8,950 stars

Top 5.7% on SourcePulse

7 Experts Love This Project

Tim J. Baek

Founder of Open WebUI

Luis Capelo

Cofounder of Lightning AI

Travis Fischer

Founder of Agentic

Patrick von Platen

Author of Hugging Face Diffusers; Research Engineer at Mistral

and 3 more!

Project Summary

This toolkit provides neural building blocks for speaker diarization, covering speech activity detection, speaker change detection, overlapped speech detection, and speaker embedding. It targets researchers and developers working on audio analysis and offers pretrained models and pipelines, presented as state-of-the-art, that can be fine-tuned on custom datasets for better performance.
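
Conceptually, speech activity and overlapped speech can both be derived from frame-level speaker activity scores of the kind a segmentation model produces: a frame contains speech if at least one speaker is active, and overlapped speech if two or more are. The following is a minimal pure-Python sketch under that assumption. The function names, the 0.5 threshold and the toy scores are hypothetical, and this is not pyannote.audio's implementation or API.

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def frames_to_segments(mask: Sequence[bool], step: float) -> List[Segment]:
    """Merge runs of consecutive True frames into (start, end) segments."""
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, active in enumerate(mask):
        if active and start is None:
            start = i  # a run of active frames begins
        elif not active and start is not None:
            segments.append((start * step, i * step))  # the run ends here
            start = None
    if start is not None:  # close a run that lasts until the final frame
        segments.append((start * step, len(mask) * step))
    return segments


def speech_and_overlap(
    activity: Sequence[Sequence[float]], step: float, threshold: float = 0.5
) -> Tuple[List[Segment], List[Segment]]:
    """activity[t][k] is a hypothetical score that speaker k talks in frame t."""
    # Number of speakers whose score exceeds the threshold in each frame.
    active_counts = [sum(score > threshold for score in frame) for frame in activity]
    speech = frames_to_segments([n >= 1 for n in active_counts], step)
    overlap = frames_to_segments([n >= 2 for n in active_counts], step)
    return speech, overlap
```

With five 0.5 s frames of two-speaker scores `[[0.1, 0.0], [0.9, 0.1], [0.8, 0.7], [0.2, 0.9], [0.0, 0.1]]`, this yields speech from 0.5 s to 2.0 s and overlapped speech from 1.0 s to 1.5 s.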

How It Works

The toolkit is built on the PyTorch machine learning framework and offers a Python-first API. It distributes pretrained models and pipelines through the Hugging Face Hub, so users can apply speaker diarization without training models from scratch. Its main advantage is a modular design: the "neural building blocks" can be combined and fine-tuned to suit specific audio analysis tasks and datasets.
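
How the building blocks combine can be illustrated with the last step of a typical diarization pipeline: a speaker embedding is extracted for each speech segment, and the embeddings are grouped so that segments from the same voice share a label. The sketch below uses a simple greedy cosine-similarity clustering chosen for brevity. It is an assumption for illustration, not the clustering algorithm pyannote.audio's pipelines actually use, and `assign_speakers` and its threshold are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_speakers(embeddings: Sequence[Sequence[float]], threshold: float = 0.7) -> List[int]:
    """Greedy clustering: join the most similar centroid if similar enough, else open a new cluster."""
    centroids: List[List[float]] = []
    sizes: List[int] = []
    labels: List[int] = []
    for emb in embeddings:
        best, best_sim = -1, -1.0
        for k, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            # Update the running mean of the chosen cluster.
            n = sizes[best]
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            sizes[best] += 1
            labels.append(best)
        else:
            centroids.append(list(emb))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels
```

For the toy 2-D embeddings `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.95], [1, 0.05]]` it returns `[0, 0, 1, 1, 0]`; real speaker embeddings are much higher-dimensional, and the result of a greedy scheme depends on input order, which is one reason production pipelines use more robust clustering.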

Quick Start & Requirements

Install with: pip install pyannote.audio
Requires accepting user conditions for specific models (e.g., pyannote/segmentation-3.0, pyannote/speaker-diarization-3.1).
Requires a Hugging Face access token.
Supports GPU acceleration via PyTorch (pipeline.to(torch.device("cuda"))).
Official documentation, changelog, and FAQs are available.
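
A minimal quick-start sketch, assuming pyannote.audio 3.x is installed, the user conditions of the gated models have been accepted, and a Hugging Face access token is available. The `use_auth_token` argument and the `itertracks(yield_label=True)` output follow the 3.x API and may differ in other releases; `diarize`, `format_turns` and the file name are hypothetical. The torch and pyannote imports sit inside `diarize` so the formatting helper can be used without them.

```python
from typing import Iterable, List, Tuple

Turn = Tuple[float, float, str]  # (start seconds, end seconds, speaker label)


def diarize(audio_path: str, token: str) -> List[Turn]:
    """Run the pretrained 3.1 pipeline on one file (assumes the 3.x API)."""
    import torch
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=token
    )
    if torch.cuda.is_available():
        pipeline.to(torch.device("cuda"))  # optional GPU acceleration
    diarization = pipeline(audio_path)
    # Each track is (segment, track id, speaker label).
    return [
        (segment.start, segment.end, speaker)
        for segment, _, speaker in diarization.itertracks(yield_label=True)
    ]


def format_turns(turns: Iterable[Turn]) -> List[str]:
    """Render speaker turns as human-readable lines."""
    return [f"{start:.1f}s-{end:.1f}s {speaker}" for start, end, speaker in turns]
```

For example, `format_turns(diarize("audio.wav", os.environ["HF_TOKEN"]))` would return one line per speaker turn; the `HF_TOKEN` environment variable name is an assumption.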

Highlighted Details

Features pretrained pipelines and models on Hugging Face Hub.
Claims state-of-the-art performance; the README's benchmarks report lower diarization error rates (DER) for v3.1 than for v2.x across various datasets.
Supports multi-GPU training with PyTorch Lightning.
Includes a blog with insights into achieving top rankings in speaker diarization challenges.
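
Diarization benchmarks are usually reported as diarization error rate (DER): missed speech plus false alarm plus speaker confusion, divided by the total duration of reference speech, after finding the best mapping between hypothesis and reference speaker labels. The sketch below computes a simplified frame-level version, assuming at most one speaker per frame, no forgiveness collar, and a brute-force label mapping that is only practical for a handful of speakers. `frame_der` is a hypothetical helper; dedicated scorers such as pyannote.metrics handle overlap and collars properly.

```python
from itertools import permutations
from typing import Optional, Sequence


def frame_der(ref: Sequence[Optional[str]], hyp: Sequence[Optional[str]]) -> float:
    """Frame-level DER; None marks a non-speech frame."""
    total = sum(r is not None for r in ref)  # reference speech frames
    if total == 0:
        raise ValueError("reference contains no speech")
    miss = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))

    ref_speakers = sorted({r for r in ref if r is not None})
    hyp_speakers = sorted({h for h in hyp if h is not None})
    # Each hypothesis speaker maps to a reference speaker or to None (unmatched).
    candidates = ref_speakers + [None] * len(hyp_speakers)
    best_confusion: Optional[int] = None
    for assignment in permutations(candidates, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, assignment))
        confusion = sum(
            r is not None and h is not None and mapping[h] != r
            for r, h in zip(ref, hyp)
        )
        if best_confusion is None or confusion < best_confusion:
            best_confusion = confusion
    assert best_confusion is not None  # permutations always yields at least once
    return (miss + false_alarm + best_confusion) / total
```

For reference `["A", "A", "A", "B", "B", None]` and hypothesis `["s1", "s1", "s2", "s2", "s2", "s2"]`, the best mapping is s1→A and s2→B, leaving one confused frame and one false-alarm frame out of five reference speech frames, so the DER is 0.4.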

Maintenance & Community

The project is actively maintained, with recent blog posts and model updates. It maintains a presence on Hugging Face, and the README links to community contributions.

Licensing & Compatibility

The README does not explicitly state the license type or compatibility for commercial use. Users are advised to check the specific model licenses on Hugging Face.

Limitations & Caveats

The README suggests considering pyannoteAI for production use cases, implying that pyannote.audio may be more research-oriented or less optimized for high-throughput production environments. Using the gated models requires accepting their user conditions and providing a Hugging Face access token.

Health Check

Last Commit

2 days ago

Responsiveness

Inactive

Star History

129 stars in the last 30 days