santi-pdp
Speech representation learning for diverse tasks
Top 67.4% on SourcePulse
PASE (Problem Agnostic Speech Encoder) and PASE+ are self-supervised speech waveform encoders designed for feature extraction and pre-training. They are suitable for tasks like Automatic Speech Recognition (ASR), speaker recognition, emotion recognition, voice conversion, and Text-to-Speech (TTS). The primary benefit is their ability to learn robust speech representations applicable across diverse speech processing tasks without task-specific supervision.
How It Works
PASE models are trained in a self-supervised manner using a worker/minion framework. A shared encoder (PASE) turns the raw waveform into a sequence of feature vectors, and several small "worker" networks take those features as input, each solving a different self-supervised task. The encoder and workers are trained jointly, so the encoder has to retain the information that every task needs, which leads to more generalizable representations. The PASE+ variant builds on this by distorting the input speech during training (for example with added noise and reverberation) and by refining the encoder and the set of workers.
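The multi-task idea can be illustrated with a toy, standard-library-only sketch. It assumes the usual arrangement in which worker heads sit on top of a shared encoder and the averaged worker losses update both. Everything here is hypothetical and chosen for brevity: the one-unit linear encoder, the two workers ("mean" and "delta"), the frame size, and the learning rate are not PASE's actual architecture, tasks, or hyperparameters.

```python
import random


def encode(w, frame):
    """Shared encoder: one linear unit mapping a frame to a single feature."""
    return sum(wi * xi for wi, xi in zip(w, frame))


# Each worker regresses a target computed from the raw frame itself,
# so no human labels are needed (self-supervision). Hypothetical tasks.
WORKER_TARGETS = {
    "mean": lambda frame: sum(frame) / len(frame),
    "delta": lambda frame: frame[-1] - frame[0],
}


def total_loss(w, heads, frames):
    """Average of the per-worker mean squared errors."""
    loss = 0.0
    for name, target_fn in WORKER_TARGETS.items():
        errs = [(heads[name] * encode(w, f) - target_fn(f)) ** 2 for f in frames]
        loss += sum(errs) / len(errs)
    return loss / len(WORKER_TARGETS)


def train_step(w, heads, frames, lr=0.05):
    """One gradient-descent step on the encoder and all worker heads jointly."""
    n, k = len(frames), len(WORKER_TARGETS)
    grad_w = [0.0] * len(w)
    grad_heads = {name: 0.0 for name in heads}
    for f in frames:
        h = encode(w, f)
        for name, target_fn in WORKER_TARGETS.items():
            err = heads[name] * h - target_fn(f)
            scale = 2.0 * err / (n * k)
            grad_heads[name] += scale * h  # gradient for this worker's head
            for i, xi in enumerate(f):
                # every worker contributes to the shared encoder's gradient
                grad_w[i] += scale * heads[name] * xi
    new_w = [wi - lr * g for wi, g in zip(w, grad_w)]
    new_heads = {name: heads[name] - lr * grad_heads[name] for name in heads}
    return new_w, new_heads


rng = random.Random(0)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(16)]
w = [0.1] * 4
heads = {name: 1.0 for name in WORKER_TARGETS}

initial_loss = total_loss(w, heads, frames)
for _ in range(200):
    w, heads = train_step(w, heads, frames)
final_loss = total_loss(w, heads, frames)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Because the encoder receives gradient from every worker, it cannot specialize in one task. A real setup would replace the linear unit with a deep network and use an autograd library instead of hand-written gradients.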
Quick Start & Requirements
Install with pip install -r requirements.txt followed by python setup.py install.
codec2 must be built from source, and pycodec2 installed (pip install pycodec2). Ensure LD_LIBRARY_PATH is set correctly if pycodec2 fails to load. requirements.txt pins cupy-cuda100, which targets CUDA 10.0; with a different CUDA version you may need to edit that entry to the matching CuPy package.
A pre-trained checkpoint (FE_e199.ckpt) and configuration file (cfg/frontend/PASE+.cfg) are available for direct use as a PyTorch nn.Module.
Highlighted Details
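If pycodec2 fails to load as noted in the setup steps, a quick diagnostic is to ask Python whether the dynamic loader can see the codec2 shared library at all. The sketch below uses only the standard library. The helper names (find_shared_library, library_search_dirs, report) are hypothetical, and the library name "codec2" is an assumption about how the built library is named on your system.

```python
import ctypes.util
import os


def find_shared_library(name):
    """Return the library name or path the loader resolves, or None if not found."""
    return ctypes.util.find_library(name)


def library_search_dirs():
    """Directories currently listed in LD_LIBRARY_PATH, in search order."""
    raw = os.environ.get("LD_LIBRARY_PATH", "")
    return [d for d in raw.split(os.pathsep) if d]


def report(name="codec2"):
    """Build a one-line status message for the given shared library."""
    found = find_shared_library(name)
    if found:
        return f"lib{name} found: {found}"
    dirs = library_search_dirs() or ["<LD_LIBRARY_PATH is empty>"]
    return f"lib{name} not found; LD_LIBRARY_PATH entries: " + ", ".join(dirs)


print(report())
```

If the library is reported as missing, adding the directory that contains the built codec2 library to LD_LIBRARY_PATH before starting Python is the usual remedy.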
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
codec2 requires building from source, which can add complexity to the setup.
2 years ago
Inactive