DiariZen is a speaker diarization toolkit for researchers and practitioners in speech processing. It segments audio recordings by speaker, determining who spoke when, and leverages models from AudioZen and Pyannote.
How It Works
DiariZen utilizes pre-trained models based on WavLM, a self-supervised learning model, and incorporates structured pruning techniques to create compact yet accurate diarization systems. This approach aims to improve efficiency and generalization by reducing model redundancy without significant performance degradation, as demonstrated by its competitive benchmark results.
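To make the idea of structured pruning concrete, here is a minimal sketch in plain Python. It is not DiariZen's pruning recipe: it uses a simple weight-norm criterion chosen for illustration, and the function name `prune_hidden_units` and the `keep_ratio` parameter are hypothetical. The point it shows is that structured pruning removes whole units, so the remaining weight matrices are genuinely smaller instead of merely sparse.

```python
import math


def prune_hidden_units(w1, b1, w2, keep_ratio):
    """Drop whole hidden units from a block y = W2 @ act(W1 @ x + b1).

    w1: list of rows, shape [hidden][in]
    b1: list of biases, shape [hidden]
    w2: list of rows, shape [out][hidden]
    keep_ratio: fraction of hidden units to keep (illustrative parameter)
    """
    hidden = len(w1)
    n_keep = max(1, int(hidden * keep_ratio))
    # Score each hidden unit by the L2 norm of its incoming weights.
    scores = [math.sqrt(sum(v * v for v in row)) for row in w1]
    # Take the highest-scoring units, then restore their original order.
    ranked = sorted(range(hidden), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:n_keep])
    # Removing unit j deletes row j of W1, entry j of b1, and column j of W2.
    w1_pruned = [w1[j] for j in keep]
    b1_pruned = [b1[j] for j in keep]
    w2_pruned = [[row[j] for j in keep] for row in w2]
    return w1_pruned, b1_pruned, w2_pruned, keep


# Toy block with 4 hidden units, 2 inputs, 1 output.
w1 = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2], [1.5, -1.0]]
b1 = [0.0, 0.1, 0.0, -0.1]
w2 = [[1.0, 2.0, 3.0, 4.0]]
w1_p, b1_p, w2_p, kept = prune_hidden_units(w1, b1, w2, keep_ratio=0.5)
print(kept)  # indices of the hidden units that survive
```

Because rows and columns are physically removed, the pruned block needs fewer multiplications at inference time without any sparse-kernel support.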
Quick Start & Requirements
- Installation: Create a Conda environment with Python 3.10, install PyTorch 2.1.1 with CUDA 12.1 support, then install DiariZen and its dependencies via `pip install -r requirements.txt && pip install -e .`. Pyannote-audio must also be installed with its development and testing extras, and the Git submodules for `dscore` must be initialized and updated.
- Prerequisites: CUDA 12.1, Python 3.10, PyTorch 2.1.1 (with pytorch-cuda 12.1), torchvision 0.16.1, torchaudio 2.1.1.
- Usage: Inference is supported via Hugging Face Transformers. Example Python code is provided for loading pre-trained models and applying the diarization pipeline. Training and pruning recipes are available in `recipes/`.
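Since the setup depends on exact versions, a quick environment check can save debugging time. The sketch below is not part of DiariZen. It assumes the pinned packages are installed under the pip distribution names `torch`, `torchvision`, and `torchaudio`, and the helper names `installed_version` and `find_mismatches` are hypothetical.

```python
import sys
from importlib import metadata

# Versions taken from the prerequisites above; distribution names are an assumption.
PINS = {"torch": "2.1.1", "torchvision": "0.16.1", "torchaudio": "2.1.1"}


def installed_version(dist_name):
    """Return the installed version without a local suffix such as '+cu121', or None."""
    try:
        return metadata.version(dist_name).split("+")[0]
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose version differs from its pin to (expected, found)."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 10):
        print(f"Python 3.10 expected, found {sys.version_info.major}.{sys.version_info.minor}")
    for name, (expected, found) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

The script prints nothing when the interpreter and all three packages match the pinned versions.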
Highlighted Details
- Achieves state-of-the-art performance on various datasets, with Diarization Error Rates (DER) as low as 9.2% on VoxConverse and 9.8% on AISHELL-4 using the DiariZen-Large-s80 model.
- Models are trained on a compound dataset and then optimized using structured pruning.
- Supports saving RTTM results directly from the pipeline.
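For readers unfamiliar with RTTM, the sketch below shows what such a file contains. It does not use the DiariZen pipeline: the helper `to_rttm_lines` and the example turns are hypothetical, and only the ten-field `SPEAKER` line layout follows the standard RTTM convention.

```python
def to_rttm_lines(session, turns):
    """Format speaker turns as RTTM SPEAKER lines.

    session: recording identifier written in the file-ID field
    turns: iterable of (start_sec, end_sec, speaker) tuples
    """
    lines = []
    for start, end, speaker in sorted(turns):
        # Fields: type, file ID, channel, onset, duration,
        # orthography, speaker type, speaker name, confidence, lookahead.
        lines.append(
            f"SPEAKER {session} 1 {start:.3f} {end - start:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>"
        )
    return lines


# Made-up turns for illustration only.
example_turns = [(1.2, 3.0, "spk1"), (0.0, 1.5, "spk0")]
for line in to_rttm_lines("demo", example_turns):
    print(line)
```

Each line stores onset and duration (not end time) in seconds, which is the most common source of confusion when reading RTTM output by hand.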
Maintenance & Community
- Recent updates (June 3, 2025) include structured pruning recipes, new pre-trained models, and updated benchmark results.
- Contact: ihan@fit.vut.cz
Licensing & Compatibility
- Licensed under the MIT license, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
- The provided installation instructions require specific versions of PyTorch and CUDA, which might be a barrier for users with different hardware or software configurations.
- Benchmark results are evaluated without applying a collar and without domain adaptation, which may not reflect real-world performance in all scenarios.
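To see why scoring without a collar is the stricter setting, the following sketch computes a simplified frame-level DER. It is an illustration, not the scoring tool used for the reported numbers: the function `frame_der` is hypothetical, and it assumes hypothesis speaker labels are already mapped to reference labels, whereas real scorers also search for the best mapping.

```python
def frame_der(reference, hypothesis, collar=0.0, step=0.01):
    """Simplified frame-level DER for turns given as (start_sec, end_sec, speaker).

    Frames whose centre lies within `collar` seconds of a reference
    turn boundary are excluded from scoring.
    """
    end_time = max(end for _, end, _ in reference + hypothesis)
    n_frames = int(round(end_time / step))
    boundaries = [t for start, end, _ in reference for t in (start, end)]

    def active(turns, t):
        # Set of speakers talking at time t.
        return {spk for start, end, spk in turns if start <= t < end}

    missed = false_alarm = confusion = total = 0
    for i in range(n_frames):
        t = (i + 0.5) * step  # frame centre
        if collar > 0 and any(abs(t - b) <= collar for b in boundaries):
            continue
        ref, hyp = active(reference, t), active(hypothesis, t)
        missed += max(0, len(ref) - len(hyp))
        false_alarm += max(0, len(hyp) - len(ref))
        confusion += min(len(ref), len(hyp)) - len(ref & hyp)
        total += len(ref)
    return (missed + false_alarm + confusion) / total if total else 0.0


# Made-up example: the hypothesis detects the speaker change 0.2 s late.
reference = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hypothesis = [(0.0, 5.2, "A"), (5.2, 10.0, "B")]
print(frame_der(reference, hypothesis))               # strict, no collar
print(frame_der(reference, hypothesis, collar=0.25))  # boundary errors forgiven
```

In this toy case the late speaker change costs about 2% DER without a collar and nothing with a 0.25 s collar, which is why collar-free numbers are not directly comparable to results reported with one.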