awesome-diarization by wq2012

List of resources for speaker diarization

Created 7 years ago

1,834 stars

Top 23.4% on SourcePulse

1 Expert Loves This Project

stas00

Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake

Project Summary

This repository is a curated list of resources for speaker diarization, covering papers, software, datasets, and learning materials. It aims to organize speaker diarization knowledge and make it easy to find for researchers and practitioners.

How It Works

The list is organized into categories such as publications, software frameworks, evaluation metrics, clustering algorithms, speaker embedding methods, and datasets. It provides links to relevant GitHub repositories, papers, and other resources, facilitating discovery and adoption of state-of-the-art techniques and tools.

Quick Start & Requirements

Primary install / run command: Varies by linked software; many are Python-based (e.g., pip install pyannote-audio).
Prerequisites: Depend on the specific tool and may include Python, PyTorch, TensorFlow, Kaldi, MATLAB, Java, or C++. Some tools require specific CUDA versions for GPU acceleration.
Resources: Datasets can be large (tens to hundreds of GBs). Training models can require significant GPU resources.
Links: pyannote.audio, SpeechBrain, FunASR.
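
Because prerequisites differ per tool, it can help to check which Python-based toolkits are already importable before following a linked project's own install guide. The following is a minimal sketch, not part of the repository; the import names in CANDIDATE_MODULES are assumptions about how each package is imported, and the helper names are hypothetical.

```python
import importlib.util

# Assumed import names for toolkits mentioned above; adjust for your setup.
CANDIDATE_MODULES = [
    "pyannote.audio",  # pyannote.audio
    "speechbrain",     # SpeechBrain
    "funasr",          # FunASR
    "torch",           # PyTorch
    "tensorflow",      # TensorFlow
]


def is_importable(module_name):
    """Return True if the module can be located without fully importing it.

    For dotted names, find_spec imports the parent package first and raises
    ModuleNotFoundError when the parent is missing, so that case is caught.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def availability_report(module_names):
    """Map each module name to whether it is importable in this environment."""
    return {name: is_importable(name) for name in module_names}


for name, available in availability_report(CANDIDATE_MODULES).items():
    print(f"{name}: {'found' if available else 'missing'}")
```

A missing entry only means the package is not installed in the current environment; the linked project's documentation remains the authority on versions and CUDA requirements.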

Highlighted Details

Comprehensive coverage of recent advancements, including LLM-based diarization and end-to-end neural approaches.
Detailed tables of software frameworks, evaluation metrics, and speaker embedding methods with language and framework information.
Extensive lists of datasets, including pricing and descriptions, for both diarization and speaker embedding training.
Includes resources for audio feature extraction, data augmentation, and speaker change detection.
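
To make the evaluation metrics mentioned above more concrete: the most widely used one is the diarization error rate (DER), the sum of missed speech, false alarm, and speaker confusion divided by total reference speech. The sketch below is a simplified frame-level illustration, not the implementation of any listed tool. It assumes one speaker label per frame (None for silence), no overlap, no collar, and a brute-force search over speaker mappings that is only practical for a handful of speakers. The function name frame_der is hypothetical.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Simplified frame-level DER: (miss + false alarm + confusion) / speech.

    Both inputs are equal-length lists of string speaker labels, with None
    marking non-speech frames. Hypothesis labels are arbitrary, so the best
    one-to-one mapping onto reference speakers is found by brute force.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    pairs = list(zip(reference, hypothesis))
    total_speech = sum(r is not None for r, _ in pairs)
    if total_speech == 0:
        raise ValueError("reference contains no speech frames")

    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both_speech = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})
    # Pad with None so extra hypothesis speakers can stay unmapped.
    padding = [None] * max(0, len(hyp_speakers) - len(ref_speakers))

    best_correct = 0
    for perm in permutations(ref_speakers + padding, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(mapping[h] == r for r, h in both_speech)
        best_correct = max(best_correct, correct)

    confusion = len(both_speech) - best_correct
    return (missed + false_alarm + confusion) / total_speech


reference = ["A", "A", "A", "B", "B", None, None, "B"]
hypothesis = ["x", "x", "y", "y", "y", "y", None, None]
print(frame_der(reference, hypothesis))  # 1 miss + 1 false alarm + 1 confusion over 6 speech frames -> 0.5
```

Production scoring tools typically work on timed segments, handle overlapping speech, and use an optimal assignment algorithm instead of enumerating permutations.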

Maintenance & Community

The repository is community-driven, accepting contributions via pull requests.
No maintainer contact details or community channels (Discord/Slack) are listed in the README.

Licensing & Compatibility

Licenses vary significantly across linked projects, ranging from permissive (MIT, Apache) to more restrictive ones. Users must check individual project licenses.
Compatibility for commercial use depends on the specific software and dataset licenses.
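
For Python packages that are already installed, the declared license can be read from package metadata as a first pass. This is a rough sketch under stated assumptions: the helper name declared_license is hypothetical, the distribution names in the loop are assumed PyPI names, and package metadata is often incomplete, so the project's own LICENSE file and any dataset or model terms still need to be read.

```python
from importlib import metadata


def declared_license(dist_name):
    """Return the license string declared by an installed distribution.

    Returns None when the distribution is not installed or declares nothing.
    """
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None

    # Newer packages use License-Expression; older ones use License.
    for field in ("License-Expression", "License"):
        value = meta.get(field)
        if value and value.strip().upper() != "UNKNOWN":
            return value.strip()

    # Fall back to trove classifiers such as "License :: OSI Approved :: ...".
    classifiers = [
        c for c in (meta.get_all("Classifier") or []) if c.startswith("License ::")
    ]
    return "; ".join(classifiers) or None


# Assumed distribution names; replace with the packages you actually use.
for name in ("pyannote.audio", "speechbrain", "funasr"):
    print(f"{name}: {declared_license(name) or 'not installed or not declared'}")
```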

Limitations & Caveats

This is a curated list, not a runnable software package itself. Users must integrate and manage individual components.
Some older entries may reference outdated techniques or software versions.
As with any "awesome" list, curation is subjective, and some listed items may no longer be actively maintained.

Health Check

Last Commit: 5 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0

Star History

10 stars in the last 30 days

Explore Similar Projects

Awesome-Speaker-Diarization by DongKeon

Collection of speaker diarization papers

Created 2 years ago

Updated 7 months ago

speech-recognition-uk by egorsmkv

Resource collection for Ukrainian speech AI

Created 5 years ago

Updated 4 months ago

speech-dataset-generator by davidmartinrius

Generate speech datasets from audio or URLs

Created 1 year ago

Updated 1 year ago

reverb by revdotcom

Open-source inference code for speech recognition and diarization models

Created 1 year ago

Updated 8 months ago

awesome-large-audio-models by EmulationAI

Curated list of Large Language Models in Audio AI

Created 2 years ago

Updated 2 months ago

Starred by

Eugene Yan (AI Scientist at AWS).

Whisper-transcription_and_diarization-speaker-identification- by lablab-ai

Audio transcription/diarization using Whisper and pyannote-audio

Created 3 years ago

Updated 3 years ago

EEND by hitachi-speech

Speaker diarization research paper using end-to-end neural networks

Created 6 years ago

Updated 4 years ago

Starred by

Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral).

UniSpeech by microsoft

Speech models for self-supervised learning

Created 4 years ago

Updated 1 year ago

Starred by

Omar Sanseviero (DevRel at Google DeepMind).

whisper-plus by kadirnar

Speech-to-text toolkit for enhanced audio processing

Created 2 years ago

Updated 1 month ago

wespeaker by wenet-e2e

Speaker toolkit for verification, recognition, and diarization research

Created 4 years ago

Updated 1 week ago

3D-Speaker by modelscope

Toolkit for speaker verification, recognition, and diarization

Created 2 years ago

Updated 1 month ago

Starred by

Tim J. Baek (Founder of Open WebUI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 4 more.

StyleTTS2 by yl4579

Text-to-speech model achieving human-level synthesis

Created 2 years ago

Updated 1 year ago
