PyTorch/Kaldi toolkit for speaker recognition and language ID research
ASV-Subtools is an open-source toolkit for speaker recognition and language identification, built on PyTorch and Kaldi. It offers a modular approach with numerous tools for feature extraction, model training, and backend scoring, catering to researchers and engineers in the speech processing domain. The toolkit aims to provide a flexible and efficient framework for developing and experimenting with various speaker recognition models.
How It Works
ASV-Subtools leverages Kaldi for acoustic feature extraction and backend scoring, while PyTorch is used for flexible model building and custom training. The project is organized into three main branches: basic Kaldi-based shell scripts, Kaldi recipes for core model training (i-vectors, x-vectors), and PyTorch code for custom model development. Custom PyTorch models must inherit from libs.nnet.framework.TopVirtualNnet to gain default functionality such as automatic model saving and utterance embedding extraction.
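To make the inheritance requirement concrete, here is a minimal sketch of a custom model built on TopVirtualNnet. The layer definitions are plain PyTorch; the method names init, get_loss, and extract_embedding are assumptions modelled on the x-vector examples shipped with the toolkit, so the exact interface should be verified against libs/nnet/framework.py in your checkout.

```python
# Illustrative sketch only. The base class path comes from this summary;
# the overridden method names are assumptions based on the toolkit's bundled
# x-vector models and may differ in your version of ASV-Subtools.
import torch.nn as nn

from libs.nnet.framework import TopVirtualNnet  # requires asv-subtools on PYTHONPATH


class TinyXvector(TopVirtualNnet):
    """A minimal custom model; TopVirtualNnet is expected to supply defaults
    such as automatic saving and utterance embedding extraction."""

    def init(self, inputs_dim, num_targets, embedding_dim=256):
        # Assumption: TopVirtualNnet routes constructor arguments to init().
        # Frame-level layers (plain PyTorch modules).
        self.frame_layers = nn.Sequential(
            nn.Conv1d(inputs_dim, 512, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=1),
            nn.ReLU(),
        )
        # Segment-level embedding layer, applied after pooling over frames.
        self.embedding_layer = nn.Linear(512, embedding_dim)
        # Classification head and training loss.
        self.output_layer = nn.Linear(embedding_dim, num_targets)
        self.loss = nn.CrossEntropyLoss()

    def forward(self, inputs):
        # inputs: [batch, feature_dim, num_frames]
        frames = self.frame_layers(inputs)
        pooled = frames.mean(dim=2)  # simple mean pooling over frames
        embedding = self.embedding_layer(pooled)
        return self.output_layer(embedding)

    def get_loss(self, outputs, targets):
        return self.loss(outputs, targets)

    def extract_embedding(self, inputs):
        # Hook used by the toolkit's extraction scripts to dump utterance embeddings.
        frames = self.frame_layers(inputs)
        pooled = frames.mean(dim=2)
        return self.embedding_layer(pooled)
```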
Quick Start & Requirements
Install the Python dependencies:
pip install -r subtools/requirements.txt
A working Kaldi installation is also required, since feature extraction and backend scoring run through Kaldi.
Highlighted Details
Maintenance & Community
The last recorded activity was about a year ago, and the project is currently marked inactive.
Licensing & Compatibility
Limitations & Caveats
The README notes that SpecAugment is not yet stable with multi-GPU training. Training large models on datasets such as VoxCeleb2 can be time-consuming (1-2 days per model on 4 V100 GPUs). Some older scripts and results may have been removed.