Toolkit for speaker verification, recognition, and diarization
Top 19.2% on SourcePulse
3D-Speaker is an open-source toolkit addressing single- and multi-modal speaker verification, recognition, and diarization. It provides pretrained models and a large-scale dataset (3D-Speaker-Dataset) to facilitate research in speech representation disentanglement. The toolkit is beneficial for researchers and developers working on speaker-related audio processing tasks.
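Speaker verification in toolkits like this one generally comes down to comparing two speaker embeddings. The sketch below is a minimal illustration. It assumes each utterance has already been turned into a fixed-length embedding vector by a pretrained model. The function names, the 0.5 threshold, and the equal error rate (EER) helper are hypothetical and are not part of the 3D-Speaker API.

```python
import numpy as np


def cosine_score(emb_a, emb_b) -> float:
    """Cosine similarity between two speaker embeddings (hypothetical helper)."""
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))


def verify(emb_enroll, emb_test, threshold: float = 0.5) -> bool:
    """Accept the trial as 'same speaker' if the score reaches an assumed threshold."""
    return cosine_score(emb_enroll, emb_test) >= threshold


def equal_error_rate(scores, labels) -> float:
    """Approximate EER by sweeping every observed score as a threshold.

    labels: 1 for target (same-speaker) trials, 0 for impostor trials.
    Both classes must be present.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        accept = scores >= t
        far = np.mean(accept[~labels])   # impostor trials wrongly accepted
        frr = np.mean(~accept[labels])   # target trials wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)
```

Cosine scoring is shown because it is a common back-end for embedding-based verification. In practice the decision threshold is tuned on held-out trials, for example at the EER operating point.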
How It Works
The toolkit supports various speaker verification models, including Res2Net, ResNet34, ECAPA-TDNN, ERes2Net, ERes2NetV2, and CAM++. It also offers recipes for self-supervised learning approaches like SDPN and RDINO. For speaker diarization, it includes a pipeline with modules for voice activity detection, speech segmentation, speaker embedding extraction, and speaker clustering, with optional overlap detection and multimodal fusion capabilities.
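The diarization stages listed above (voice activity detection, segmentation, embedding extraction, clustering) can be sketched as a toy pipeline over precomputed frame features. This is a simplified illustration, not 3D-Speaker's implementation. The energy-threshold VAD, the mean-feature "embedding" (standing in for a neural extractor such as CAM++), the average-linkage clustering, and all function names and parameter values are assumptions chosen for readability. Overlap detection and multimodal fusion are omitted.

```python
import numpy as np


def energy_vad(energy, threshold):
    """Toy VAD (assumed): (start, end) frame ranges where energy exceeds threshold."""
    regions, start = [], None
    for i, active in enumerate(np.asarray(energy) > threshold):
        if active and start is None:
            start = i
        elif not active and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(energy)))
    return regions


def segment(regions, win, hop):
    """Cut each speech region into fixed-length, possibly overlapping windows."""
    segments = []
    for start, end in regions:
        pos = start
        while True:
            seg_end = min(pos + win, end)
            segments.append((pos, seg_end))
            if seg_end >= end:
                break
            pos += hop
    return segments


def embed(features, seg):
    """Stand-in embedding: L2-normalized mean of the segment's frame features."""
    v = features[seg[0]:seg[1]].mean(axis=0)
    return v / np.linalg.norm(v)


def cluster(embeddings, threshold):
    """Average-linkage agglomerative clustering on cosine similarity."""
    emb = np.asarray(embeddings)
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = np.mean(emb[clusters[a]] @ emb[clusters[b]].T)
                if sim > best:
                    best, pair = sim, (a, b)
        if best < threshold:
            break  # remaining clusters are treated as distinct speakers
        a, b = pair
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = np.empty(len(embeddings), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def diarize(features, energy, vad_threshold=0.5, win=4, hop=2, sim_threshold=0.7):
    """Run VAD -> segmentation -> embedding -> clustering; return (segment, speaker) pairs."""
    regions = energy_vad(energy, vad_threshold)
    segments = segment(regions, win, hop)
    embs = [embed(features, s) for s in segments]
    labels = cluster(embs, sim_threshold)
    return list(zip(segments, labels.tolist()))
```

Because the windows overlap, a real pipeline would still need to resolve frames covered by segments with different labels, for instance by assigning each frame to the nearest segment center.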
Quick Start & Requirements
Create a conda environment (conda create -n 3D-Speaker python=3.8), activate it (conda activate 3D-Speaker), and install the requirements (pip install -r requirements.txt). Run the example recipes for speaker verification (e.g., bash run.sh in egs/3dspeaker/sv-eres2netv2/) and speaker diarization (bash run_audio.sh or bash run_video.sh in egs/3dspeaker/speaker-diarization/). Run speaker verification inference with a pretrained model (python speakerlab/bin/infer_sv.py --model_id $model_id).
Highlighted Details
Maintenance & Community
The project is actively updated, with recent additions including diarization recipes, new pretrained models (ERes2NetV2, SDPN), and multimodal/semantic modules. Contact information via email is provided for inquiries.
Licensing & Compatibility
3D-Speaker is released under the Apache License 2.0. This license is permissive and generally compatible with commercial use and closed-source linking.
Limitations & Caveats
The README does not specify hardware requirements (e.g., GPU, CUDA versions) for training or inference, which may be significant for large models. While pretrained models are available, the setup time and resource footprint for training custom models are not detailed.