Awesome-Speaker-Diarization by DongKeon

Collection of speaker diarization papers

Created 3 years ago

366 stars

Top 76.8% on SourcePulse

Project Summary

This repository is a curated collection of academic papers on speaker diarization, a field focused on identifying "who spoke when" in audio recordings. It serves researchers and practitioners by providing a comprehensive overview of state-of-the-art techniques, datasets, and challenges in the domain.
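As a rough illustration of what "who spoke when" means in practice, diarization output is commonly represented as a list of time-stamped speaker segments. The sketch below is not taken from the repository or from any listed paper; the `Segment` type, the helper names, and the speaker labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One diarization segment (hypothetical type): a speaker label and a time span in seconds."""
    speaker: str
    start: float
    end: float


def speakers_at(segments, t):
    """Return the sorted labels of all speakers active at time t (start inclusive, end exclusive)."""
    return sorted({s.speaker for s in segments if s.start <= t < s.end})


def talk_time(segments):
    """Sum the duration of each speaker's segments; overlapping speech counts for every speaker involved."""
    totals = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    return totals


# Made-up example: two speakers, with overlapping speech between 3.0 s and 4.0 s.
segments = [
    Segment("spk_A", 0.0, 4.0),
    Segment("spk_B", 3.0, 7.0),
    Segment("spk_A", 7.0, 9.0),
]

if __name__ == "__main__":
    print(speakers_at(segments, 3.5))
    print(talk_time(segments))
```

Overlapping segments are allowed here on purpose, since overlapped speech is one of the cases that diarization systems need to handle.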

How It Works

The repository organizes papers by methodology, including End-to-End Neural Diarization (EEND), clustering-based approaches, and methods incorporating speaker embeddings. It also categorizes papers by specific applications and challenges, such as multi-channel audio, online diarization, and integration with Automatic Speech Recognition (ASR). This structured approach allows users to easily navigate and discover relevant research.

Quick Start & Requirements

This is a curated list of papers, not a software library. No installation or specific requirements are needed beyond a web browser to access the linked papers.

Highlighted Details

Extensive coverage of End-to-End Neural Diarization (EEND) techniques, including BLSTM-EEND, SA-EEND, EEND-EDA, CB-EEND, and Transformer-based models.
Detailed sections on related tasks like speaker recognition, speech separation, and language diarization.
Comprehensive lists of papers from major challenges such as VoxSRC, DIHARD, and MISP, often including winning system descriptions.
Inclusion of papers on novel approaches like multimodal diarization (audio-visual) and LLM-based post-processing.

Maintenance & Community

The repository is maintained by DongKeon. Contributions are welcome: readers can open an issue or a pull request to point out papers that are missing from the list. The repository also links to other "awesome" lists on speaker diarization.

Licensing & Compatibility

This repository contains links to academic papers. The licensing and compatibility of the individual papers are determined by their respective publishers and authors.

Limitations & Caveats

This repository is a bibliography and does not provide code or implementations. Users must access the linked papers independently, and availability may depend on publisher subscriptions or open access status.

Explore Similar Projects

MOSS-Transcribe-Diarize by OpenMOSS

SoulX-Transcriber by Soul-AILab

speech-recognition-uk by egorsmkv

reverb by revdotcom

INTERSPEECH-2023-24-Papers by DmitryRyumin

EEND by hitachi-speech

UniSpeech by microsoft

whisper-plus by kadirnar

wespeaker by wenet-e2e

awesome-diarization by wq2012

3D-Speaker by modelscope

StyleTTS2 by yl4579