PyTorch library for end-to-end Korean ASR research
This project provides an open-source toolkit for end-to-end Korean Automatic Speech Recognition (ASR) research, leveraging PyTorch and the Hydra configuration framework. It aims to offer a modular and extensible platform for developing and comparing various ASR models, specifically addressing the lack of established preprocessing methods and baseline models for the Korean KsponSpeech corpus.
How It Works
KoSpeech implements several state-of-the-art end-to-end ASR architectures, including Deep Speech 2, Listen, Attend and Spell (LAS), RNN-Transducer, Speech Transformer, Jasper, and Conformer. These models take audio features as input and map them directly to text sequences, simplifying the traditional hybrid ASR pipeline. Hydra provides flexible, hierarchical configuration management, making it easy to swap model components and training parameters between experiments.
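The "directly map audio to text" step in CTC-based models such as Deep Speech 2 ends with a greedy collapse of per-frame predictions. The sketch below illustrates that collapse in plain Python; the function name and toy label ids are illustrative, not KoSpeech's actual API.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse repeated frame predictions, then drop blank tokens."""
    out = []
    prev = None
    for idx in frame_ids:
        # Emit a label only when it differs from the previous frame
        # and is not the CTC blank symbol.
        if idx != prev and idx != blank:
            out.append(idx)
        prev = idx
    return out

# Per-frame argmax ids over a toy vocabulary (0 is the blank).
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3]
print(ctc_greedy_decode(frames))  # [1, 2, 3]
```

In a real model these ids would come from an argmax over the network's per-frame output distribution before being mapped back to Korean graphemes or subwords.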
Quick Start & Requirements
Install from source after cloning the repository:
pip install -e .
Train a model (Deep Speech 2 shown; other architectures use a different model config group):
python ./bin/main.py model=ds2 train=ds2_train train.dataset_path=$DATASET_PATH
Run inference with a trained model:
python3 ./bin/inference.py --model_path $MODEL_PATH --audio_path $AUDIO_PATH --device $DEVICE
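Hydra overrides such as train.dataset_path=$DATASET_PATH set nested configuration keys via dotted paths. The snippet below is a pure-Python sketch of that dotted-key mechanic for illustration only; it is not Hydra's implementation, and the config keys are hypothetical.

```python
def apply_override(cfg: dict, override: str) -> dict:
    """Apply a single `a.b.c=value` style override to a nested dict config."""
    dotted, value = override.split("=", 1)
    keys = dotted.split(".")
    node = cfg
    # Walk (and create, if needed) intermediate levels of the config tree.
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return cfg

cfg = {"model": {"architecture": "ds2"}}
apply_override(cfg, "train.dataset_path=/path/to/KsponSpeech")
print(cfg["train"]["dataset_path"])  # /path/to/KsponSpeech
```

Hydra additionally layers defaults, config groups (like model=ds2), and type-checked schemas on top of this basic key-setting behavior.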
Maintenance & Community
The repository is marked as archived. The authors recommend alternative projects for active development or immediate use: OpenSpeech for training and internal code study, and Pororo ASR or Whisper for immediate testing of trained Korean ASR models. The last update was in May 2021. Community discussion is available via Gitter.
Licensing & Compatibility
Licensed under Apache-2.0. This license is permissive and generally compatible with commercial use and closed-source linking.
Limitations & Caveats
The project is archived, so there is no ongoing development or support. Subword- and grapheme-unit models were not tested. The authors noted that recent code modifications may have introduced issues they lacked the time to verify, and encouraged feedback.