Speaker diarization using variational Bayes HMM over x-vectors
This repository provides VBx, a speaker diarization recipe that applies a Variational Bayes Hidden Markov Model (HMM) to sequences of x-vectors. It is aimed at researchers and practitioners in speech processing who need to determine who spoke when in challenging datasets such as CALLHOME, AMI, and DIHARD II. Its central idea is to refine an initial clustering of x-vectors with a Bayesian HMM that models speaker turns over time, instead of treating each embedding independently, with the aim of improving diarization accuracy over clustering alone.
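Diarization output is usually expressed as time segments tagged with anonymous speaker labels. As an illustration only, the Python sketch below merges a sequence of per-frame speaker labels (for example, one label per x-vector) into (start, end, speaker) segments. The helper name `labels_to_segments` and its arbitrary 0.25 s default frame step are hypothetical choices for this example and are not taken from the VBx code.

```python
from itertools import groupby


def labels_to_segments(labels, step=0.25):
    """Hypothetical helper: merge runs of identical per-frame speaker labels.

    Assumes frame i covers [i * step, (i + 1) * step) seconds.
    Returns a list of (start_sec, end_sec, speaker) tuples.
    """
    segments = []
    frame = 0
    for speaker, run in groupby(labels):
        n = sum(1 for _ in run)  # length of this run of identical labels
        segments.append((frame * step, (frame + n) * step, speaker))
        frame += n
    return segments


print(labels_to_segments([0, 0, 0, 1, 1, 0]))
# [(0.0, 0.75, 0), (0.75, 1.25, 1), (1.25, 1.5, 0)]
```

Because 0.25 is exactly representable in binary floating point, the segment boundaries in this example come out exact; real systems may instead place boundaries at window midpoints when analysis windows overlap.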
How It Works
VBx works in three stages. First, it extracts x-vectors, fixed-dimensional speaker embeddings computed from short, overlapping windows of speech. Second, it runs agglomerative hierarchical clustering (AHC) on these x-vectors to obtain an initial assignment of speakers. Finally, it refines this assignment with a Variational Bayes HMM over the x-vector sequence, in which each state represents one speaker and transitions model speaker turns. This probabilistic refinement smooths speaker changes over time and can discard redundant speakers left by the initial clustering, so the number of speakers does not have to be fixed in advance. Each x-vector is still assigned to a single speaker, so overlapping speech is not modeled by this step.
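The clustering and refinement stages can be illustrated with a toy, NumPy-only Python sketch. It is not the VBx implementation: `ahc_cosine` is plain average-linkage clustering on cosine similarity, and `hmm_resegment` replaces the Variational Bayes updates with a simpler EM-style loop (isotropic Gaussian emissions with a fixed variance, uniform speaker priors, forward-backward posteriors). The function names, the `threshold`, `loop_p`, and `var` parameters, and the synthetic data are assumptions for illustration; real x-vectors would come from a trained neural network.

```python
import numpy as np


def ahc_cosine(xvecs, threshold):
    """Average-linkage AHC on cosine similarity (toy, unoptimized version).

    Merges the two most similar clusters until the best average
    similarity drops below `threshold`. Returns labels 0..K-1.
    """
    x = xvecs / np.linalg.norm(xvecs, axis=1, keepdims=True)
    sim = x @ x.T
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = sim[np.ix_(clusters[a], clusters[b])].mean()
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def hmm_resegment(xvecs, labels, loop_p=0.9, var=1.0, n_iter=5):
    """Simplified HMM refinement (EM-style, NOT the VBx update equations).

    One state per initial cluster; staying in a state has probability
    loop_p + (1 - loop_p) / K, switching to any other has (1 - loop_p) / K.
    """
    n = len(xvecs)
    k = int(labels.max()) + 1
    trans = np.full((k, k), (1.0 - loop_p) / k) + loop_p * np.eye(k)
    gamma = np.eye(k)[labels]  # start from the hard initial assignments
    for _ in range(n_iter):
        # M-step: per-speaker mean of the x-vectors, weighted by posteriors
        means = gamma.T @ xvecs / (gamma.sum(axis=0)[:, None] + 1e-10)
        # Isotropic Gaussian log-likelihoods, shifted per frame for stability
        sq = ((xvecs[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
        loglik = -0.5 * sq / var
        lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))
        # E-step: scaled forward-backward over the x-vector sequence
        alpha = np.empty((n, k))
        beta = np.ones((n, k))
        alpha[0] = lik[0] / k
        alpha[0] /= alpha[0].sum()
        for t in range(1, n):
            alpha[t] = (alpha[t - 1] @ trans) * lik[t]
            alpha[t] /= alpha[t].sum()
        for t in range(n - 2, -1, -1):
            beta[t] = trans @ (lik[t + 1] * beta[t + 1])
            beta[t] /= beta[t].sum()
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma.argmax(axis=1)


# Synthetic "x-vectors": two speakers taking turns A -> B -> A
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
truth = np.array([0] * 20 + [1] * 15 + [0] * 10)
xvecs = centers[truth] + rng.normal(scale=0.3, size=(len(truth), 3))

init = ahc_cosine(xvecs, threshold=0.5)
noisy = truth.copy()
noisy[5] = 1  # simulate one x-vector mislabelled by the first pass
print(init)
print(hmm_resegment(xvecs, noisy))
```

A high `loop_p` makes speaker changes expensive, which is what lets the HMM correct isolated mislabelled x-vectors. Unlike VBx, this sketch never removes speakers, so a cluster that loses all of its frames simply stays empty.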
Quick Start & Requirements
Create a conda environment (conda create -n VBx python=3.9), activate it (conda activate VBx), clone the repository, install the package (pip install -e .), and initialize the dscore submodule (git submodule init && git submodule update). Then run the example script, ./run_example.sh.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Last updated 1 year ago; the repository is marked as inactive.