DiariZen by BUTSpeechFIT

Speaker diarization toolkit

Created 1 year ago
293 stars

Top 90.1% on SourcePulse

Project Summary

DiariZen is a speaker diarization toolkit for researchers and practitioners in speech processing. It identifies who spoke when in an audio recording and segments the speech by speaker, building on AudioZen and Pyannote.

How It Works

DiariZen uses pre-trained models based on WavLM, a self-supervised speech model, and applies structured pruning to produce compact yet accurate diarization systems. Pruning removes redundant model structure, which aims to improve efficiency and generalization without significant performance degradation. The project's benchmark results remain competitive after pruning.
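
To make the idea of structured pruning concrete, here is a minimal sketch in plain Python. It is not DiariZen's pruning recipe: the summary above does not say which importance criterion the project uses, so this example assumes a simple L2-norm score as a stand-in, applied to a single dense layer stored as a list of rows. The function name prune_rows is hypothetical. The point it illustrates is that structured pruning drops whole units (here, output rows and their biases), so the layer actually gets smaller, rather than zeroing scattered individual weights.

```python
import math


def prune_rows(weights, bias, keep_ratio):
    """Keep the output units (rows) with the largest L2 norm.

    Hypothetical helper for illustration only. `weights` is a list of
    rows (one row per output unit) and `bias` has one entry per row.
    Returns the smaller weight matrix, the matching bias, and the
    indices of the rows that were kept (in their original order).
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if len(weights) != len(bias):
        raise ValueError("weights and bias must have the same number of rows")

    n_keep = max(1, round(len(weights) * keep_ratio))

    # Assumed importance score: the L2 norm of each row.
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]

    # Rank rows from most to least important, keep the top n_keep,
    # then restore the original ordering of the survivors.
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:n_keep])

    return [weights[i] for i in kept], [bias[i] for i in kept], kept


if __name__ == "__main__":
    w = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.05], [1.0, -1.0]]
    b = [0.1, 0.2, 0.3, 0.4]
    pruned_w, pruned_b, kept = prune_rows(w, b, keep_ratio=0.5)
    print(kept)      # indices of the surviving rows
    print(pruned_w)  # the smaller weight matrix
```

In a real network, the input dimension of the following layer would have to be reduced to match the surviving units, and the pruned model is usually fine-tuned afterwards.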

Quick Start & Requirements

  • Installation: Create a Conda environment with Python 3.10, install PyTorch 2.1.1 with CUDA 12.1 support, then install DiariZen and its dependencies with pip install -r requirements.txt && pip install -e . (the trailing dot refers to the current directory). Pyannote-audio must also be installed with its development and testing extras, and the Git submodules for dscore must be initialized and updated.
  • Prerequisites: Python 3.10, CUDA 12.1, PyTorch 2.1.1 (built for CUDA 12.1), torchvision 0.16.1, torchaudio 2.1.1.
  • Usage: Inference uses pre-trained models hosted on Hugging Face. Example Python code is provided for loading a pre-trained model and applying the diarization pipeline. Training and pruning recipes are available in recipes/.
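
Because the setup depends on exact version pins, it can help to check an environment before running anything. The following is a minimal sketch using only the Python standard library. It is not part of DiariZen: the helper names installed_versions and find_mismatches are hypothetical, the pins are copied from the prerequisites above, and it assumes the packages are registered under the distribution names torch, torchvision, and torchaudio.

```python
import sys
from importlib import metadata

# Pins copied from the prerequisites listed above.
PINNED = {"torch": "2.1.1", "torchvision": "0.16.1", "torchaudio": "2.1.1"}


def installed_versions(packages):
    """Return {name: version or None} for each distribution name."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, pinned=PINNED):
    """Return {name: (wanted, found)} for every package that is off-pin."""
    problems = {}
    for name, wanted in pinned.items():
        have = installed.get(name)
        # Ignore local build tags such as "+cu121" when comparing.
        base = have.split("+")[0] if have else None
        if base != wanted:
            problems[name] = (wanted, have)
    return problems


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 10):
        print(f"Python 3.10 expected, found {sys.version_info[0]}.{sys.version_info[1]}")
    for name, (wanted, have) in find_mismatches(installed_versions(PINNED)).items():
        print(f"{name}: expected {wanted}, found {have or 'not installed'}")
```

The script prints nothing when the environment matches the pins. It does not check the CUDA toolkit itself, only the Python packages.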

Highlighted Details

  • Reports state-of-the-art performance on several datasets, with Diarization Error Rates (DER) as low as 9.2% on VoxConverse and 9.8% on AISHELL-4 using the DiariZen-Large-s80 model.
  • Models are trained on a compound dataset and then optimized using structured pruning.
  • Supports saving RTTM results directly from the pipeline.
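
For readers unfamiliar with RTTM, the sketch below shows what such a file contains. It does not use DiariZen's own RTTM writer, whose interface is not described here. The function to_rttm_lines is a hypothetical helper, and the input is assumed to be a list of (start, end, speaker) tuples in seconds. Each output line follows the standard ten-field SPEAKER record: type, file ID, channel, onset, duration, and speaker name, with unused fields set to <NA>.

```python
def to_rttm_lines(session, segments):
    """Format (start_sec, end_sec, speaker) tuples as RTTM SPEAKER lines.

    Hypothetical helper for illustration. Segments with a non-positive
    duration are skipped; the rest are emitted in order of start time.
    """
    lines = []
    for start, end, speaker in sorted(segments):
        duration = end - start
        if duration <= 0:
            continue
        lines.append(
            f"SPEAKER {session} 1 {start:.3f} {duration:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>"
        )
    return lines


if __name__ == "__main__":
    demo = [(1.2, 3.0, "spk1"), (0.0, 1.5, "spk0")]
    for line in to_rttm_lines("demo", demo):
        print(line)
    # SPEAKER demo 1 0.000 1.500 <NA> <NA> spk0 <NA> <NA>
    # SPEAKER demo 1 1.200 1.800 <NA> <NA> spk1 <NA> <NA>
```

Overlapping speech is represented naturally: the two segments above overlap between 1.2 s and 1.5 s, and each speaker simply gets its own line.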

Maintenance & Community

  • Recent updates (June 3, 2025) include structured pruning recipes, new pre-trained models, and updated benchmark results.
  • Contact: ihan@fit.vut.cz

Licensing & Compatibility

  • Licensed under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

  • The provided installation instructions require specific versions of PyTorch and CUDA, which might be a barrier for users with different hardware or software configurations.
  • Benchmark results are evaluated without applying a collar and without domain adaptation, which may not reflect real-world performance in all scenarios.
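
To clarify what the DER figures measure, here is a minimal sketch of the standard definition: the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. A collar is a short forgiveness window around each reference speaker boundary that is excluded from scoring, so evaluating without a collar counts every boundary error and is the stricter setting. The function below is illustrative only, and the durations in the example are made up rather than taken from any benchmark.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Compute DER from error durations, all given in seconds.

    missed:       reference speech that the system labeled as non-speech
    false_alarm:  non-speech that the system labeled as speech
    confusion:    speech attributed to the wrong speaker
    total_speech: total duration of reference speech
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


if __name__ == "__main__":
    # Made-up durations, for illustration only.
    der = diarization_error_rate(
        missed=4.0, false_alarm=1.0, confusion=5.0, total_speech=100.0
    )
    print(f"DER = {der:.1%}")  # DER = 10.0%
```

Because false alarms are counted in the numerator but not in the denominator, DER can exceed 100% on difficult recordings.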

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 5
  • Star history: 29 stars in the last 30 days