Speech models for self-supervised learning
This repository provides the UniSpeech family of large-scale self-supervised learning models for speech processing: WavLM, UniSpeech, and UniSpeech-SAT. It offers pre-trained models and evaluation results for tasks such as automatic speech recognition, speaker verification, speech separation, and speaker diarization, targeting researchers and developers in the speech technology domain.
How It Works
The UniSpeech family leverages self-supervised learning on massive unlabeled and labeled speech datasets. Models like WavLM use a masked prediction objective on acoustic frames, while UniSpeech and UniSpeech-SAT incorporate unified pre-training for both self-supervised and supervised learning, with UniSpeech-SAT specifically focusing on speaker-aware pre-training to enhance performance on speaker-related tasks. This approach aims to create universal speech representations applicable across a wide range of downstream tasks.
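The masked-prediction idea can be illustrated with a minimal sketch: hide a random subset of acoustic frames and score a prediction only at the hidden positions. This is a toy illustration, not the actual WavLM/UniSpeech training code (which uses a Transformer encoder and predicts discretized units rather than raw frames); the frame dimensions, mask ratio, and L2 loss below are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_frames(frames, mask_ratio=0.5):
    """Zero out a random subset of frames; return the masked copy and mask."""
    mask = rng.random(len(frames)) < mask_ratio
    masked = frames.copy()
    masked[mask] = 0.0
    return masked, mask

def masked_prediction_loss(predicted, target, mask):
    """Mean squared error computed only at the masked positions."""
    if not mask.any():
        return 0.0
    diff = predicted[mask] - target[mask]
    return float(np.mean(diff ** 2))

# Toy "acoustic frames": 100 frames of 40-dimensional features.
frames = rng.standard_normal((100, 40))
masked, mask = mask_frames(frames)

# A real model predicts the hidden frames from surrounding context; here a
# trivial stand-in "predictor" just repeats the mean of the visible frames.
prediction = np.tile(masked[~mask].mean(axis=0), (len(frames), 1))
loss = masked_prediction_loss(prediction, frames, mask)
print(loss > 0.0)
```

During pre-training, minimizing such a loss forces the encoder to infer masked frames from context, which is what yields representations transferable across downstream tasks.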
Quick Start & Requirements
The training and fine-tuning code is organized under the src directory.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Getting started requires working from the src directory and relying on the HuggingFace integration. The repository was last updated about a year ago and is currently inactive.