Speaker diarization research code using end-to-end neural networks
This repository provides an End-to-End Neural Diarization (EEND) system, a neural-network-based approach to speaker diarization. It is designed for researchers and practitioners in speech processing who need a flexible framework for speaker diarization tasks, offering implementations of BLSTM and self-attentive models, including extensions for an unknown number of speakers.
How It Works
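The EEND system trains a single neural network to predict each speaker's activity at every frame directly, instead of relying on the embedding-extraction-and-clustering pipeline of conventional diarization. Because more than one speaker can be marked active in the same frame, overlapping speech is handled naturally. The order of speakers in the network output is arbitrary, so training uses a permutation-free objective: the loss is evaluated for every assignment of output channels to reference speakers, and the smallest value is used. The self-attentive models replace the BLSTM encoder with self-attention layers, which can capture long-range dependencies across a recording and may improve accuracy in complex scenarios.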
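The permutation-free objective can be illustrated with a small pure-Python sketch. This is not the repository's own code: the function names, the per-element mean normalization, and the exhaustive search over permutations are assumptions made for clarity. Exhaustive search grows factorially with the number of speakers, so it is only practical for the small speaker counts typical of these models.

```python
import itertools
import math
from typing import Sequence


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def permutation_free_bce(
    preds: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
) -> float:
    """Minimum mean BCE over all assignments of output channels to speakers.

    preds[t][s]  -- predicted probability that output channel s is active at frame t
    labels[t][s] -- reference activity (0 or 1); reference columns may be in any order
    """
    num_frames = len(preds)
    num_speakers = len(preds[0])
    best = math.inf
    for perm in itertools.permutations(range(num_speakers)):
        total = 0.0
        for t in range(num_frames):
            for s in range(num_speakers):
                # Output channel s is compared with reference speaker perm[s].
                total += bce(preds[t][s], labels[t][perm[s]])
        best = min(best, total / (num_frames * num_speakers))
    return best


# Usage: the same predictions score identically whichever way
# the reference speakers happen to be ordered.
preds = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
labels_aligned = [[1, 0], [1, 0], [0, 1]]
labels_swapped = [[0, 1], [0, 1], [1, 0]]
print(permutation_free_bce(preds, labels_aligned))
print(permutation_free_bce(preds, labels_swapped))
```

Taking the minimum over permutations means the network is never penalized for emitting speakers in a different order than the reference annotation, which is the ambiguity the objective is meant to remove.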
Quick Start & Requirements
To build Kaldi and set up the environment, run:
cd tools && make
Maintenance & Community
The project is associated with Hitachi Speech. No specific community channels or active development signals are immediately apparent from the README.
Licensing & Compatibility
The repository does not explicitly state a license. Users should verify licensing for commercial use or integration into closed-source projects.
Limitations & Caveats
The CUDA Toolkit version requirement (8.0 <= version <= 10.1) is restrictive: CUDA 10.1 predates support for NVIDIA Ampere and later GPUs, which require CUDA 11 or newer, so the GPU setup may not work with modern drivers or hardware. The setup process, particularly building Kaldi, is complex and time-consuming.
Last activity: 3 years ago (Inactive)