EEND by hitachi-speech

Speaker diarization research paper using end-to-end neural networks

Created 6 years ago
406 stars

Top 71.7% on SourcePulse

Project Summary

This repository provides an End-to-End Neural Diarization (EEND) system, a neural-network-based approach to speaker diarization. It is designed for researchers and practitioners in speech processing who need a flexible framework for speaker diarization tasks, offering implementations of BLSTM and self-attentive models, including extensions for an unknown number of speakers.

How It Works

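The EEND system uses a single neural network to predict frame-level speaker activities directly, without the embedding-extraction and clustering stages of traditional diarization pipelines. Because each speaker has its own output, overlapping speech can be labeled as well. The order of the output speakers is arbitrary, so training uses a permutation-free objective: the loss is evaluated against every permutation of the reference speaker labels and the smallest value is kept. The self-attentive models replace the BLSTM encoder with self-attention layers, letting each frame attend to every other frame in the recording. This captures long-range speaker context and can improve accuracy in harder conditions such as long or heavily overlapped conversations.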
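A minimal NumPy sketch of the permutation-free loss for a single recording follows. It assumes predictions and labels are (frames × speakers) arrays holding probabilities and 0/1 targets. The helper names `bce` and `permutation_free_bce` are hypothetical, not taken from the repository, and the repository's own implementation may batch or weight the loss differently.

```python
import itertools

import numpy as np


def bce(pred, label, eps=1e-7):
    # Mean binary cross-entropy over all frames and speakers.
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(pred) + (1.0 - label) * np.log(1.0 - pred)))


def permutation_free_bce(pred, label):
    """Hypothetical helper: pred and label are (T, S) arrays.

    Returns the minimum BCE over all orderings of the reference speaker
    columns, together with the ordering that achieved it.
    """
    n_spk = label.shape[1]
    best_loss, best_perm = None, None
    for perm in itertools.permutations(range(n_spk)):
        # Reorder the reference speakers and score against the predictions.
        loss = bce(pred, label[:, list(perm)])
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Usage: the network emits the two speakers in swapped order.
label = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
pred = label[:, ::-1] * 0.9 + 0.05
loss, perm = permutation_free_bce(pred, label)
```

Enumerating all S! permutations is fine for two or three speakers. Because this loss decomposes into per-column terms, larger S can instead be handled by solving an assignment problem (e.g. the Hungarian algorithm) on a pairwise cost matrix.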

Quick Start & Requirements

  • Install: Clone the repository and run cd tools && make to build Kaldi and set up the environment.
  • Prerequisites: NVIDIA GPU with CUDA Toolkit (version 8.0 to 10.1), Python environment.
  • Setup: Building Kaldi and installing dependencies can take a significant amount of time.
  • Docs: Kaldi Queue Documentation
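Before a long Kaldi build, it may help to confirm that the installed toolkit falls inside the supported range. The sketch below is a hypothetical helper, not part of the repository. It parses the release line printed by `nvcc --version` (assumed format: `Cuda compilation tools, release 10.1, V10.1.243`) and compares the result against 8.0 to 10.1 as tuples.

```python
import re
import subprocess

# Supported range stated in the requirements (inclusive).
SUPPORTED_MIN = (8, 0)
SUPPORTED_MAX = (10, 1)


def parse_cuda_version(nvcc_output):
    # Hypothetical parser: looks for "release X.Y" in nvcc's version text.
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no CUDA release string found")
    return int(match.group(1)), int(match.group(2))


def is_supported(version):
    # Tuple comparison orders (major, minor) correctly, e.g. (9, 2) < (10, 1).
    return SUPPORTED_MIN <= version <= SUPPORTED_MAX


def check_installed_nvcc():
    # Raises FileNotFoundError if nvcc is not on PATH.
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    version = parse_cuda_version(out)
    return version, is_supported(version)
```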

Highlighted Details

  • Implements BLSTM and self-attentive EEND models.
  • Supports diarization for an unknown number of speakers using encoder-decoder based attractors.
  • Includes recipes for mini_librispeech and CALLHOME datasets.
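The speaker-counting step of the encoder-decoder attractor (EDA) extension can be sketched as follows. The sketch assumes the network has already produced frame embeddings, a sequence of candidate attractors, and one existence logit per attractor. Speakers are counted until the first attractor whose existence probability drops below a threshold (0.5 here, an assumed default). Each kept attractor then yields per-frame activity probabilities through a sigmoid of its dot product with the frame embeddings. Function and argument names are illustrative, not the repository's API.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def eda_decode(embeddings, attractors, exist_logits, threshold=0.5):
    """Hypothetical EDA inference step.

    embeddings:   (T, D) frame embeddings from the encoder
    attractors:   (S_max, D) candidate attractors from the LSTM decoder
    exist_logits: (S_max,) logits of attractor existence probabilities
    """
    exist_prob = sigmoid(exist_logits)
    # Count attractors in order, stopping at the first one below threshold.
    n_spk = 0
    for p in exist_prob:
        if p < threshold:
            break
        n_spk += 1
    # Speaker activity posteriors: one column per kept attractor.
    posteriors = sigmoid(embeddings @ attractors[:n_spk].T)
    return n_spk, posteriors
```

Thresholding the returned posteriors (for example at 0.5) gives per-frame, per-speaker activity decisions, and a frame may be active for more than one speaker.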

Maintenance & Community

The project is associated with Hitachi Speech. No specific community channels or active development signals are immediately apparent from the README.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The CUDA Toolkit version requirement (8.0 <= version <= 10.1) is quite restrictive and may not be compatible with modern NVIDIA drivers or GPUs. The setup process, particularly building Kaldi, is complex and time-consuming.

Health Check

  • Last Commit: 4 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days
