Speech representation learning for diverse tasks
PASE (Problem Agnostic Speech Encoder) and PASE+ are self-supervised speech waveform encoders designed for feature extraction and pre-training. They are suitable for tasks like Automatic Speech Recognition (ASR), speaker recognition, emotion recognition, voice conversion, and Text-to-Speech (TTS). The primary benefit is their ability to learn robust speech representations applicable across diverse speech processing tasks without task-specific supervision.
How It Works
PASE models are trained in a self-supervised manner using a worker/minion framework. The encoder (PASE) produces a shared representation that is fed to multiple small "worker" networks, each solving a different self-supervised task, and the encoder is trained jointly with them. Because the same representation has to serve every worker, this multi-task setup pushes the encoder to capture a wide range of speech characteristics, which leads to more generalizable representations. The PASE+ variant builds on this with more sophisticated data augmentation techniques and training strategies.
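The data flow described above can be illustrated with a dependency-free toy. This is only a sketch of the idea, not the project's implementation: the encoder, the two workers, their targets (log energy and zero-crossing rate), the fixed weights, and the plain averaging of losses are all assumptions made for illustration, and nothing is actually trained here.

```python
import math
from typing import Callable, Dict, List, Sequence

HOP = 4  # toy frame size (hypothetical); a real encoder uses a much larger hop


def frames_of(wave: Sequence[float], hop: int = HOP) -> List[Sequence[float]]:
    """Split the waveform into non-overlapping frames of `hop` samples."""
    return [wave[i:i + hop] for i in range(0, len(wave) - hop + 1, hop)]


def toy_encoder(wave: Sequence[float]) -> List[List[float]]:
    """Stand-in for the shared encoder: two features (mean, range) per frame."""
    return [[sum(f) / len(f), max(f) - min(f)] for f in frames_of(wave)]


# Self-supervised targets are computed from the signal itself, so no labels are needed.
def log_energy(frame: Sequence[float]) -> float:
    return math.log(sum(s * s for s in frame) / len(frame) + 1e-8)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)


def make_worker(weights: Sequence[float], target_fn: Callable) -> Callable:
    """A worker: a linear head on the shared features, scored by MSE against its own target."""
    def worker(features: List[List[float]], wave: Sequence[float]) -> float:
        targets = [target_fn(f) for f in frames_of(wave)]
        preds = [sum(w * x for w, x in zip(weights, feat)) for feat in features]
        return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
    return worker


def multitask_loss(wave: Sequence[float], workers: Dict[str, Callable]) -> Dict[str, float]:
    features = toy_encoder(wave)  # encoded once, shared by every worker
    losses = {name: w(features, wave) for name, w in workers.items()}
    # Assumption: worker losses are combined by a plain average.
    losses["total"] = sum(losses.values()) / len(workers)
    return losses


wave = [math.sin(0.9 * n) for n in range(64)]
workers = {
    "log_energy": make_worker([0.0, 1.0], log_energy),
    "zcr": make_worker([0.0, 0.5], zero_crossing_rate),
}
losses = multitask_loss(wave, workers)
print(losses)
```

In a real training loop the combined loss would be backpropagated through both the workers and the encoder, so every task shapes the shared representation.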
Quick Start & Requirements
Install with pip install -r requirements.txt, followed by python setup.py install.
codec2 must be built from source, and pycodec2 installed (pip install pycodec2). Ensure LD_LIBRARY_PATH is set correctly if pycodec2 loading fails. CUDA version compatibility might require editing the cupy-cuda100 entry in requirements.txt.
A pre-trained checkpoint (FE_e199.ckpt) and configuration file (cfg/frontend/PASE+.cfg) are available for direct use as a PyTorch nn.Module.
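Loading the pre-trained encoder might look like the sketch below. The wf_builder and load_pretrained calls follow the usage pattern recalled from the project's README and should be checked against the installed version. The helper names (load_pase, extract_features, expected_frames) are hypothetical, and the 160-sample hop is an assumption based on 16 kHz input encoded at roughly 100 frames per second.

```python
from typing import Any

HOP_SAMPLES = 160  # assumption: one feature frame per 160 input samples


def expected_frames(num_samples: int, hop: int = HOP_SAMPLES) -> int:
    """Rough number of feature frames for a waveform of `num_samples` samples."""
    return num_samples // hop


def load_pase(cfg_path: str = "cfg/frontend/PASE+.cfg",
              ckpt_path: str = "FE_e199.ckpt") -> Any:
    """Build the encoder from its config file and load the pre-trained weights."""
    # Imported lazily so this file can be imported without the pase package installed.
    from pase.models.frontend import wf_builder

    model = wf_builder(cfg_path).eval()
    model.load_pretrained(ckpt_path, load_last=True, verbose=True)
    return model


def extract_features(model: Any, num_samples: int = 100000) -> Any:
    """Forward a random waveform through the encoder to inspect the output shape."""
    import torch

    wave = torch.randn(1, 1, num_samples)  # (batch, channels, samples)
    with torch.no_grad():
        return model(wave)
```

With the package, config file, and checkpoint in place, extract_features(load_pase()) should return a tensor with about expected_frames(100000) frames along its last dimension.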
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
codec2 requires building from source, which can add complexity to the setup.