Speech processing toolkit for end-to-end recognition
PIKA is a lightweight, PyTorch-based speech processing toolkit that uses (Py)Kaldi for data handling and feature extraction, with a primary focus on end-to-end speech recognition. It targets researchers and developers who need a flexible, performant toolkit for speech data. Features include on-the-fly data augmentation, several model architectures (TDNN, Transformer), RNNT training including Minimum Bayes Risk (MBR) training, and rescoring with external N-gram FSTs.
How It Works
PIKA combines PyTorch for deep learning with Kaldi for data preparation and feature extraction. This hybrid approach lets data loading and augmentation run directly within the training pipeline. The toolkit supports recurrent neural network transducer (RNNT) models for end-to-end training and decoding. It uses blockwise model-update filtering (BMUF) for distributed training, and Listen, Attend and Spell (LAS) models can be trained as forward and backward rescorers of RNNT outputs to improve recognition accuracy.
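The BMUF step mentioned above can be illustrated with a small sketch. This is not PIKA's implementation: it follows the classical block-momentum form of the published algorithm, operates on flat lists of floats rather than PyTorch tensors, and the names bmuf_step, block_momentum, and block_lr are assumptions chosen for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def bmuf_step(
    global_model: Vector,
    worker_models: List[Vector],
    prev_delta: Vector,
    block_momentum: float = 0.9,
    block_lr: float = 1.0,
) -> Tuple[Vector, Vector]:
    """Apply one BMUF update (classical block momentum) to a flat parameter vector.

    Hypothetical helper for illustration only; names are not taken from PIKA.
    """
    n_workers = len(worker_models)
    # Average the models that each worker trained on its own data block.
    averaged = [sum(ws) / n_workers for ws in zip(*worker_models)]
    # Block-level "gradient": how far the average moved from the global model.
    block_grad = [a - g for a, g in zip(averaged, global_model)]
    # Filter the update with momentum carried over from the previous block.
    delta = [
        block_momentum * d + block_lr * bg
        for d, bg in zip(prev_delta, block_grad)
    ]
    new_global = [g + d for g, d in zip(global_model, delta)]
    return new_global, delta


# Two workers, two parameters, starting from a zero model and zero momentum.
model, delta = bmuf_step(
    [0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0], block_momentum=0.5
)
```

With block_momentum set to 0 and block_lr set to 1, the update reduces to plain model averaging, which is a convenient sanity check for any real implementation.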
Quick Start & Requirements
Dependencies are listed in requirements.txt.
Input data consists of .wav.scp and label.txt files. label.txt maps utterance IDs to sequences of one-based label indices, with 0 reserved for the blank symbol.
Example scripts are in the egs directory. Key scripts include egs/train_transducer_bmuf_otfaug.sh for data preparation and RNNT training, egs/train_transducer_mbr_bmuf_otfaug.sh for MBR training, and egs/train_las_rescorer_bmuf_otfaug.sh for training LAS rescorers. Decoding is handled by egs/eval_transducer.sh.
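As a rough illustration of the label.txt convention described in this section, the sketch below parses label lines and rejects any index that collides with the blank symbol. The exact on-disk layout is an assumption: it supposes one utterance per line, with the utterance ID followed by whitespace-separated integer labels. The function name parse_label_lines is hypothetical and not part of PIKA.

```python
from typing import Dict, Iterable, List

BLANK_ID = 0  # index 0 is reserved for the blank symbol


def parse_label_lines(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Parse lines of the assumed form '<utt_id> <label> <label> ...'.

    Hypothetical helper; the line layout is an assumption, not PIKA's spec.
    """
    labels: Dict[str, List[int]] = {}
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        utt_id = fields[0]
        label_ids = [int(tok) for tok in fields[1:]]
        # Labels are one-based, so anything <= 0 would clash with blank.
        if any(i <= BLANK_ID for i in label_ids):
            raise ValueError(
                f"line {line_no}: labels for {utt_id!r} must be >= 1"
            )
        labels[utt_id] = label_ids
    return labels


example = parse_label_lines(["utt1 3 17 5", "utt2 1"])
```

Validating the labels before training is a cheap way to catch an off-by-one vocabulary mapping, which would otherwise surface only as silently degraded accuracy.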
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Hyper-parameters are optimized for large-scale Mandarin speech data and may require significant tuning for other languages or datasets. The WER/CER scoring script is specific to Mandarin, necessitating modifications for other languages.
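To show why the scoring unit matters when moving between languages, here is a minimal sketch of an error-rate computation that can score by character (as is usual for Mandarin CER) or by whitespace-separated word (as is usual for WER in languages such as English). It is not PIKA's scoring script; the function names and the unit parameter are assumptions for illustration.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def tokenize(text: str, unit: str) -> List[str]:
    """Split text into words or into non-whitespace characters."""
    if unit == "word":
        return text.split()
    if unit == "char":
        return [c for c in text if not c.isspace()]
    raise ValueError(f"unknown unit: {unit!r}")


def error_rate(ref: str, hyp: str, unit: str = "word") -> float:
    """Edit distance divided by reference length, in the chosen unit."""
    ref_toks = tokenize(ref, unit)
    if not ref_toks:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_toks, tokenize(hyp, unit)) / len(ref_toks)


wer = error_rate("the cat sat", "the cat sit", unit="word")
cer = error_rate("今天天气好", "今天天汽好", unit="char")
```

The same edit-distance core serves both metrics; only the tokenization changes, which is the part a language-specific scoring script would need adapted.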