Awesome-Speaker-Diarization by DongKeon

Collection of speaker diarization papers

Created 2 years ago
306 stars


Project Summary

This repository is a curated collection of academic papers on speaker diarization, a field focused on identifying "who spoke when" in audio recordings. It serves researchers and practitioners by providing a comprehensive overview of state-of-the-art techniques, datasets, and challenges in the domain.

How It Works

The repository organizes papers by methodology, including End-to-End Neural Diarization (EEND), clustering-based approaches, and methods incorporating speaker embeddings. It also categorizes papers by specific applications and challenges, such as multi-channel audio, online diarization, and integration with Automatic Speech Recognition (ASR). This structured approach allows users to easily navigate and discover relevant research.
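
Clustering-based systems typically extract a speaker embedding (for example an x-vector) for each short audio segment and then group those embeddings so that each group corresponds to one speaker. The sketch below shows only that grouping step. It assumes the embeddings are already computed and uses average-linkage agglomerative hierarchical clustering (AHC) with cosine similarity. The function names, toy embeddings, and threshold value are hypothetical illustrations, not taken from any paper in the list.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def ahc_diarize(embeddings: List[Sequence[float]], threshold: float) -> List[int]:
    """Hypothetical helper: average-linkage AHC over per-segment embeddings.

    Returns one speaker label per segment. Merging stops once the most similar
    pair of clusters falls below `threshold`.
    """
    clusters = [[i] for i in range(len(embeddings))]  # start: one cluster per segment
    while len(clusters) > 1:
        best_score, best_i, best_j = -math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean pairwise similarity between the two clusters.
                sims = [cosine(embeddings[a], embeddings[b])
                        for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if score > best_score:
                    best_score, best_i, best_j = score, i, j
        if best_score < threshold:
            break  # remaining clusters are treated as distinct speakers
        clusters[best_i].extend(clusters[best_j])
        del clusters[best_j]  # best_j > best_i, so best_i stays valid
    labels = [0] * len(embeddings)
    for speaker, members in enumerate(clusters):
        for m in members:
            labels[m] = speaker
    return labels


if __name__ == "__main__":
    # Toy 2-D "embeddings" for five segments spoken by two speakers.
    segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
    print(ahc_diarize(segments, threshold=0.5))  # [0, 0, 1, 1, 0]
```

The threshold stands in for an unknown speaker count: a higher value splits the audio into more speakers. Real systems often tune it on held-out data or replace AHC with spectral clustering.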

Quick Start & Requirements

This is a curated list of papers, not a software library. No installation is required; a web browser is enough to access the linked papers.

Highlighted Details

  • Extensive coverage of End-to-End Neural Diarization (EEND) techniques, including BLSTM-EEND, SA-EEND, EEND-EDA, CB-EEND, and Transformer-based models.
  • Detailed sections on related tasks like speaker recognition, speech separation, and language diarization.
  • Comprehensive lists of papers from major challenges such as VoxSRC, DIHARD, and MISP, often including winning system descriptions.
  • Papers on newer approaches such as audio-visual (multimodal) diarization and LLM-based post-processing.
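
EEND models such as BLSTM-EEND and SA-EEND frame diarization as frame-wise multi-label classification, with one output per speaker. Because the order of those outputs is arbitrary, they are trained with a permutation-invariant (permutation-free) binary cross-entropy loss. The sketch below is a minimal plain-Python version of that loss. It assumes a fixed number of speakers S and an exhaustive search over all S! permutations, which is only practical for small S; real training code vectorizes this over batches. The function names are hypothetical.

```python
import math
from itertools import permutations
from typing import List, Tuple


def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one probability, clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def pit_bce_loss(pred: List[List[float]],
                 ref: List[List[int]]) -> Tuple[float, Tuple[int, ...]]:
    """Hypothetical helper: permutation-invariant BCE over T frames x S speakers.

    pred[t][s]: predicted activity probability of output s at frame t.
    ref[t][s]:  0/1 reference activity of speaker s at frame t.
    Returns the minimum mean loss and the permutation achieving it
    (output s is matched to reference speaker perm[s]).
    """
    T, S = len(pred), len(pred[0])
    best_loss, best_perm = math.inf, tuple(range(S))
    for perm in permutations(range(S)):
        total = sum(bce(pred[t][s], ref[t][perm[s]])
                    for t in range(T) for s in range(S))
        loss = total / (T * S)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


if __name__ == "__main__":
    # The model's two outputs are "swapped" relative to the reference labels.
    pred = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
    ref = [[1, 0], [1, 0], [0, 1]]
    loss, perm = pit_bce_loss(pred, ref)
    print(perm)  # (1, 0)
```

Taking the minimum over permutations means the model is not penalized for which output slot it assigns to which speaker. Overlapping speech is handled naturally because several outputs can be active in the same frame.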

Maintenance & Community

The repository is maintained by DongKeon, who welcomes issues or pull requests pointing out papers missing from the list. It also links to other "awesome" lists for speaker diarization.

Licensing & Compatibility

This repository contains links to academic papers. The licensing and compatibility of the individual papers are determined by their respective publishers and authors.

Limitations & Caveats

This repository is a bibliography and does not provide code or implementations. Users must access the linked papers independently, and availability may depend on publisher subscriptions or open access status.

Health Check

Last Commit: 3 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 6 stars in the last 30 days
