DiariZen by BUTSpeechFIT

Speaker diarization toolkit

Created 1 year ago

367 stars

Top 76.9% on SourcePulse

Project Summary

DiariZen is a speaker diarization toolkit for researchers and practitioners in speech processing. It determines who spoke when in an audio recording by segmenting the speech according to speaker, and it builds on AudioZen and Pyannote.

How It Works

DiariZen uses pre-trained models based on WavLM, a self-supervised speech model, and applies structured pruning to produce compact yet accurate diarization systems. Removing redundant model structure is meant to improve efficiency and generalization without a significant loss in accuracy, and the project's competitive benchmark results support this.
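To make "structured pruning" concrete, here is a minimal plain-Python sketch on a toy two-layer network. It is an illustration under stated assumptions, not DiariZen's recipe: the function name prune_hidden_units is hypothetical, and hidden units are scored by the L2 norm of their incoming weights, a simple heuristic swapped in for this example. DiariZen's pruning recipes may select structures differently. What the sketch does show is the defining property of structured pruning: whole units are removed, so the remaining weight matrices stay dense and simply become smaller, whereas unstructured pruning only zeroes individual weights.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def prune_hidden_units(
    w1: Matrix, b1: List[float], w2: Matrix, keep_ratio: float
) -> Tuple[Matrix, List[float], Matrix, List[int]]:
    """Hypothetical helper: structured pruning of a 2-layer MLP's hidden layer.

    w1: hidden x in weights (one row per hidden unit), b1: hidden biases,
    w2: out x hidden weights. Each hidden unit is scored by the L2 norm of
    its incoming weights (an assumed, simple criterion); the lowest-scoring
    units are removed entirely.
    """
    n_hidden = len(w1)
    n_keep = max(1, round(n_hidden * keep_ratio))
    scores = [math.sqrt(sum(x * x for x in row)) for row in w1]
    # Pick the highest-scoring units, then restore their original order.
    ranked = sorted(range(n_hidden), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n_keep])
    new_w1 = [list(w1[i]) for i in keep]  # drop rows of the first layer
    new_b1 = [b1[i] for i in keep]
    # Drop the matching input columns of the next layer so shapes still agree.
    new_w2 = [[row[i] for i in keep] for row in w2]
    return new_w1, new_b1, new_w2, keep


if __name__ == "__main__":
    w1 = [[1.0, 0.0], [0.1, 0.1], [0.0, 2.0], [0.2, 0.0]]
    b1 = [1.0, 2.0, 3.0, 4.0]
    w2 = [[1.0, 2.0, 3.0, 4.0]]
    pw1, pb1, pw2, kept = prune_hidden_units(w1, b1, w2, keep_ratio=0.5)
    print(kept)  # [0, 2]
    print(pw2)   # [[1.0, 3.0]]
```

Because the next layer's input columns are removed together with the pruned units, the smaller network remains shape-consistent and needs no sparse-matrix support to run faster.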

Quick Start & Requirements

Installation: Create a Conda environment with Python 3.10, install PyTorch 2.1.1 with CUDA 12.1 support, then run pip install -r requirements.txt && pip install -e . to install DiariZen and its dependencies. pyannote-audio must also be installed with its development and testing extras, and the Git submodule for dscore must be initialized and updated.
Prerequisites: Python 3.10, CUDA 12.1, and the matching PyTorch stack: torch 2.1.1, torchvision 0.16.1, torchaudio 2.1.1, and pytorch-cuda 12.1.
Usage: Pre-trained models are hosted on Hugging Face, and example Python code shows how to load them and apply the diarization pipeline for inference. Training and pruning recipes are available in recipes/.
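Because the setup depends on exact version pins, a quick environment check can catch mismatches before training or inference. The sketch below is a hypothetical helper, not part of DiariZen. It reads installed versions with Python's standard importlib.metadata and compares them with the pins from the prerequisites, ignoring local suffixes such as +cu121 that pip wheels may carry. It does not check the CUDA toolkit itself, and the Conda pytorch-cuda package may not be visible to importlib.metadata at all.

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Tuple

# Pins taken from the prerequisites listed above (pip distribution names).
PINNED: Dict[str, str] = {"torch": "2.1.1", "torchvision": "0.16.1", "torchaudio": "2.1.1"}


def check_environment(
    pins: Dict[str, str] = PINNED, python: Tuple[int, int] = (3, 10)
) -> Dict[str, Tuple[Optional[str], str, bool]]:
    """Hypothetical helper: map each name to (installed, expected, matches)."""
    py = tuple(sys.version_info[:2])
    report = {
        "python": (f"{py[0]}.{py[1]}", f"{python[0]}.{python[1]}", py == tuple(python))
    }
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, expected, False)
            continue
        # Ignore local version suffixes such as "+cu121" when comparing.
        report[name] = (installed, expected, installed.split("+", 1)[0] == expected)
    return report


if __name__ == "__main__":
    for name, (have, want, ok) in check_environment().items():
        print(f"{name:12} installed={have} expected={want} {'OK' if ok else 'MISMATCH'}")
```

Running it inside the Conda environment prints one line per component, so a mismatched torchaudio or a wrong Python minor version shows up immediately.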

Highlighted Details

Reports state-of-the-art results on several datasets, with Diarization Error Rates (DER) as low as 9.2% on VoxConverse and 9.8% on AISHELL-4 for the DiariZen-Large-s80 model.
Models are trained on a compound dataset and then optimized using structured pruning.
Supports saving RTTM results directly from the pipeline.
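RTTM is the plain-text format that scoring tools such as dscore read. As a rough sketch of what a saved result contains, the hypothetical helpers below write and read the standard ten-field SPEAKER lines (type, file ID, channel, onset, duration, two <NA> fields, speaker label, two <NA> fields). They are not DiariZen's own writer, and the exact output of the pipeline may differ, for example in decimal precision.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_sec, end_sec, speaker_label)


def to_rttm(segments: List[Segment], file_id: str, channel: int = 1) -> str:
    """Hypothetical helper: format segments as RTTM SPEAKER lines."""
    lines = []
    for start, end, speaker in sorted(segments):
        # Fields: type, file, channel, onset, duration, ortho, stype,
        # speaker name, confidence, signal lookahead time.
        lines.append(
            f"SPEAKER {file_id} {channel} {start:.3f} {end - start:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>"
        )
    return "".join(line + "\n" for line in lines)


def parse_rttm(text: str) -> List[Tuple[str, float, float, str]]:
    """Read SPEAKER lines back as (file_id, start_sec, end_sec, speaker)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank lines and other RTTM record types
        start, dur = float(fields[3]), float(fields[4])
        rows.append((fields[1], start, start + dur, fields[7]))
    return rows


if __name__ == "__main__":
    segs = [(0.0, 1.5, "spk0"), (1.2, 3.0, "spk1")]
    rttm = to_rttm(segs, "meeting1")
    print(rttm, end="")
    # SPEAKER meeting1 1 0.000 1.500 <NA> <NA> spk0 <NA> <NA>
    # SPEAKER meeting1 1 1.200 1.800 <NA> <NA> spk1 <NA> <NA>
    print(parse_rttm(rttm))
```

Note that RTTM stores onset and duration rather than onset and end, and overlapping speech is simply represented by segments whose time ranges intersect.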

Maintenance & Community

Recent updates (June 3, 2025) include structured pruning recipes, new pre-trained models, and updated benchmark results.
Contact: ihan@fit.vut.cz

Licensing & Compatibility

Licensed under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The provided installation instructions require specific versions of PyTorch and CUDA, which might be a barrier for users with different hardware or software configurations.
Benchmark results are evaluated without applying a collar and without domain adaptation, which may not reflect real-world performance in all scenarios.
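To clarify what the DER figures and the no-collar caveat mean, here is a frame-level approximation of DER in plain Python: DER = (missed speech + false alarm + speaker confusion) / total reference speech, with hypothesis speakers mapped one-to-one onto reference speakers so that the error is minimized. A collar excludes frames within the given number of seconds of any reference segment boundary from scoring; with collar=0.0, as in the reported benchmarks, every frame counts, which generally makes the score stricter. This is an illustrative sketch with hypothetical function names, not the dscore or md-eval implementation. Official tools work on exact segment times and typically use an optimal assignment algorithm instead of the brute-force mapping used here, which is only practical for a handful of speakers.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Segment = Tuple[float, float, str]  # (start_sec, end_sec, speaker)


def to_frames(segments: List[Segment], n_frames: int, step: float) -> List[Set[str]]:
    """Active-speaker set for each frame; frame i covers [i*step, (i+1)*step)."""
    frames: List[Set[str]] = [set() for _ in range(n_frames)]
    for start, end, spk in segments:
        for i in range(round(start / step), min(n_frames, round(end / step))):
            frames[i].add(spk)
    return frames


def scored_frames(ref: List[Segment], n_frames: int, step: float, collar: float) -> List[bool]:
    """False for frames within `collar` seconds of any reference boundary."""
    mask = [True] * n_frames
    if collar <= 0:
        return mask
    for start, end, _ in ref:
        for b in (start, end):
            lo = max(0, round((b - collar) / step))
            hi = min(n_frames, round((b + collar) / step))
            for i in range(lo, hi):
                mask[i] = False
    return mask


def frame_der(ref: List[Segment], hyp: List[Segment], step: float = 0.01,
              collar: float = 0.0) -> Dict[str, float]:
    n = round(max(e for _, e, _ in ref + hyp) / step)
    ref_f, hyp_f = to_frames(ref, n, step), to_frames(hyp, n, step)
    mask = scored_frames(ref, n, step, collar)
    total = sum(len(r) for r, m in zip(ref_f, mask) if m)
    if total == 0:
        raise ValueError("no scored reference speech")
    ref_spk = sorted({s for f in ref_f for s in f})
    hyp_spk = sorted({s for f in hyp_f for s in f})
    best = None
    # Brute-force one-to-one mapping; None means "maps to no reference speaker".
    for perm in permutations(ref_spk + [None] * len(hyp_spk), len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        miss = fa = conf = 0
        for r, h, m in zip(ref_f, hyp_f, mask):
            if not m:
                continue
            correct = len(r & {mapping[s] for s in h})
            miss += max(0, len(r) - len(h))
            fa += max(0, len(h) - len(r))
            conf += min(len(r), len(h)) - correct
        if best is None or miss + fa + conf < sum(best):
            best = (miss, fa, conf)
    miss, fa, conf = best
    return {"der": (miss + fa + conf) / total,
            "miss_s": miss * step, "fa_s": fa * step, "conf_s": conf * step}


if __name__ == "__main__":
    ref = [(0.0, 2.0, "A"), (2.0, 4.0, "B")]
    hyp = [(0.0, 2.5, "spk1"), (2.5, 4.0, "spk2")]
    print(frame_der(ref, hyp, step=0.1)["der"])              # 0.125
    print(frame_der(ref, hyp, step=0.1, collar=0.5)["der"])  # 0.0
```

In the usage example, the hypothesis places the speaker change 0.5 s late. Without a collar, those 0.5 s count as speaker confusion (DER 12.5%); a 0.5 s collar hides the boundary error entirely, which is why no-collar results are the stricter measure.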

Health Check

Last Commit: 1 month ago

Responsiveness: Inactive

Pull Requests (30d): 1

Issues (30d): 1

Star History: 26 stars in the last 30 days

Explore Similar Projects

Awesome-Speaker-Diarization by DongKeon

Collection of speaker diarization papers

Created 2 years ago

Updated 7 months ago

AudioBench by AudioLLMs

A universal benchmark for evaluating audio large language models

Created 1 year ago

Updated 6 months ago

StyleSpeech by KevinMIN95

Multi-speaker adaptive TTS generation

Created 4 years ago

Updated 3 years ago

OLMoASR by allenai

Open-source speech recognition models

Created 2 years ago

Updated 2 months ago

Starred by Omar Sanseviero (DevRel at Google DeepMind) and Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral).

huggingsound by jonatasgrosman

Speech toolkit for speech-related tasks based on Hugging Face's tools

Created 3 years ago

Updated 2 years ago

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral).

UniSpeech by microsoft

Speech models for self-supervised learning

Created 4 years ago

Updated 1 year ago

asv-subtools by Snowdar

PyTorch/Kaldi toolkit for speaker recognition and language ID research

Created 5 years ago

Updated 1 year ago

diart by juanmc2005

Real-time audio applications framework

Created 4 years ago

Updated 11 months ago

wespeaker by wenet-e2e

Speaker toolkit for verification, recognition, and diarization research

Created 4 years ago

Updated 1 week ago

Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

awesome-diarization by wq2012

List of resources for speaker diarization

Created 7 years ago

Updated 5 months ago

3D-Speaker by modelscope

Toolkit for speaker verification, recognition, and diarization

Created 2 years ago

Updated 1 month ago

Starred by Tim J. Baek (Founder of Open WebUI), Luis Capelo (Cofounder of Lightning AI), and 5 more.

pyannote-audio by pyannote

Speaker diarization toolkit

Created 9 years ago

Updated 3 days ago
