EEND by hitachi-speech

Research code for speaker diarization using end-to-end neural networks

Created 6 years ago

421 stars

Top 70.0% on SourcePulse

Project Summary

This repository provides an End-to-End Neural Diarization (EEND) system, a neural-network-based approach to speaker diarization. It targets researchers and practitioners in speech processing who need a flexible diarization framework, and it includes BLSTM and self-attentive models as well as extensions for an unknown number of speakers.

How It Works

The EEND system uses a single neural network to predict each speaker's activity directly from the audio features, without a separate clustering stage. Because the order of the output speakers is arbitrary, training uses a permutation-free objective: the loss is evaluated against every ordering of the reference speaker labels and the smallest value is used. The self-attentive models add attention mechanisms to capture long-range dependencies in speech, which may improve accuracy on harder recordings.
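
The permutation-free objective described above can be illustrated with a small NumPy sketch. This is not the repository's code: the function names, the toy data, and the use of plain binary cross-entropy averaged over frames are assumptions made for illustration.

```python
from itertools import permutations

import numpy as np


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over all frames and speakers."""
    probs = np.clip(probs, eps, 1.0 - eps)
    losses = labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)
    return float(-np.mean(losses))


def permutation_free_loss(probs, labels):
    """Return the smallest loss over all orderings of the label columns.

    probs, labels: arrays of shape (frames, speakers).
    (Hypothetical helper; brute force is only practical for few speakers.)
    """
    n_speakers = labels.shape[1]
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_speakers)):
        loss = binary_cross_entropy(probs, labels[:, list(perm)])
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy reference labels: 4 frames, 2 speakers; the third frame is overlapped speech.
labels = np.array([[1, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
# Toy predictions that match the labels with the two speaker columns swapped.
probs = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.9], [0.9, 0.1]])

loss, perm = permutation_free_loss(probs, labels)
print(perm, loss)
```

Here the swapped ordering gives the lower loss, so the network is not penalized for emitting the speakers in a different order than the reference.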

Quick Start & Requirements

Install: Clone the repository and run cd tools && make to build Kaldi and set up the environment.
Prerequisites: NVIDIA GPU with CUDA Toolkit (version 8.0 to 10.1), Python environment.
Setup: Building Kaldi and installing dependencies can take a significant amount of time.
Docs: Kaldi Queue Documentation

Highlighted Details

Implements BLSTM and self-attentive EEND models.
Supports diarization for an unknown number of speakers using encoder-decoder based attractors.
Includes recipes for mini_librispeech and CALLHOME datasets.
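
As a rough illustration of how attractors can support an unknown number of speakers, the sketch below assumes the attractors and their existence probabilities have already been produced; the encoder-decoder that generates them is omitted. The function name, the 0.5 threshold, and the toy values are hypothetical and are not taken from the repository.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode_with_attractors(embeddings, attractors, existence_probs, threshold=0.5):
    """Estimate the speaker count and frame-wise speaker activities.

    embeddings: (frames, dim) frame-level embeddings.
    attractors: (max_speakers, dim), one candidate vector per speaker.
    existence_probs: (max_speakers,), probability that each attractor
        corresponds to a real speaker.
    """
    # Assumption: stop at the first attractor judged not to exist.
    below = np.flatnonzero(existence_probs < threshold)
    n_speakers = int(below[0]) if below.size else len(existence_probs)
    # Activity of speaker s at frame t is the sigmoid of a dot product.
    activities = sigmoid(embeddings @ attractors[:n_speakers].T)
    return n_speakers, activities


# Toy inputs: 3 frames, 2-dimensional embeddings, up to 3 candidate speakers.
embeddings = np.array([[4.0, -4.0], [-4.0, 4.0], [4.0, 4.0]])
attractors = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
existence_probs = np.array([0.9, 0.8, 0.1])

n_speakers, activities = decode_with_attractors(embeddings, attractors, existence_probs)
print(n_speakers, activities.shape)
```

With these toy values the third attractor falls below the threshold, so two speakers are kept and the output has one activity column per kept speaker.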

Maintenance & Community

The project is associated with Hitachi Speech. No specific community channels or active development signals are immediately apparent from the README.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The supported CUDA Toolkit range (8.0 <= version <= 10.1) is restrictive: these releases predate recent NVIDIA GPU architectures, so they may not work with modern GPUs or drivers. The setup process, particularly building Kaldi, is complex and time-consuming.
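
A quick way to check an installed toolkit version against the stated range is sketched below. The helper name is hypothetical, and the version string is assumed to be something you have already obtained (for example from the toolkit's own version output); the sketch does not query the system.

```python
def cuda_version_supported(version, minimum=(8, 0), maximum=(10, 1)):
    """Return True if a 'major.minor[.patch]' string lies within the range.

    Only the major and minor components are compared; a missing minor
    component is treated as 0.
    """
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return minimum <= (major, minor) <= maximum


for candidate in ("8.0", "10.1.243", "11.8"):
    print(candidate, cuda_version_supported(candidate))
```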

Health Check

Last Commit

4 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days