kospeech by sooftware

PyTorch library for end-to-end Korean ASR research

created 5 years ago
625 stars

Top 53.7% on sourcepulse

Project Summary

This project provides an open-source toolkit for end-to-end Korean Automatic Speech Recognition (ASR) research, leveraging PyTorch and the Hydra configuration framework. It aims to offer a modular and extensible platform for developing and comparing various ASR models, specifically addressing the lack of established preprocessing methods and baseline models for the Korean KsponSpeech corpus.

How It Works

KoSpeech implements several widely used end-to-end ASR architectures, including Deep Speech 2, Listen Attend Spell (LAS), RNN-Transducer, Speech Transformer, Jasper, and Conformer. These models take acoustic features extracted from audio and map them directly to text sequences, replacing the multi-component pipeline of traditional hybrid ASR with a single network. Hydra provides hierarchical configuration management, so model components and training parameters can be swapped between experiments without editing code.

Quick Start & Requirements

  • Installation: clone the repository, then run pip install -e . from its root.
  • Prerequisites: Python 3.7+, NumPy, PyTorch (version dependent on environment), Pandas, Matplotlib, librosa, torchaudio (0.6.0), tqdm, sentencepiece, warp-rnnt, hydra-core.
  • Dataset: Supports KsponSpeech and LibriSpeech datasets; preprocessing instructions are provided.
  • Training: Example commands for training various models (e.g., python ./bin/main.py model=ds2 train=ds2_train train.dataset_path=$DATASET_PATH).
  • Inference: python3 ./bin/inference.py --model_path $MODEL_PATH --audio_path $AUDIO_PATH --device $DEVICE.
  • Documentation: Docs
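
The training command above passes Hydra-style overrides such as model=ds2 and train.dataset_path=$DATASET_PATH. The sketch below illustrates the general idea of how such overrides could compose into a nested configuration. It is plain Python, not Hydra's API or KoSpeech's code: the CONFIG_GROUPS table, the compose function, the field names inside each group, and the /data/kspon path are all hypothetical. Unlike Hydra, this sketch keeps every overridden value as a string and expects group selections to come before dotted overrides.

```python
from copy import deepcopy

# Hypothetical config groups. The option names mirror the command above,
# but the fields and values inside each option are made up for illustration.
CONFIG_GROUPS = {
    "model": {"ds2": {"architecture": "deepspeech2", "hidden_dim": 1024}},
    "train": {"ds2_train": {"dataset_path": None, "batch_size": 32}},
}


def compose(overrides):
    """Build a nested config dict from `key=value` override strings."""
    cfg = {}
    for item in overrides:
        key, value = item.split("=", 1)
        if "." not in key:
            # `group=option` selects a whole sub-config from a group.
            cfg[key] = deepcopy(CONFIG_GROUPS[key][value])
        else:
            # `a.b=value` sets a single leaf inside a selected group.
            *parents, leaf = key.split(".")
            node = cfg
            for name in parents:
                node = node.setdefault(name, {})
            node[leaf] = value  # kept as a string in this sketch
    return cfg


cfg = compose(["model=ds2", "train=ds2_train", "train.dataset_path=/data/kspon"])
print(cfg["train"])
```

Selecting a group first and then overriding one field keeps the defaults for everything else, which is what makes it cheap to vary a single parameter between runs.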

Highlighted Details

  • Implements 7 distinct E2E ASR architectures: Deep Speech 2, LAS, Joint CTC-Attention LAS, RNN-Transducer, Speech Transformer, Jasper, and Conformer.
  • Provides baseline models and preprocessing methods for the KsponSpeech corpus.
  • Utilizes Hydra for advanced configuration management of experiments.
  • Supports both greedy and beam search decoding.
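
To make the difference between the two decoding strategies concrete, here is a minimal sketch on a toy next-token model. It is not KoSpeech's decoder: TOY_MODEL, next_probs, greedy_decode, and beam_search are hypothetical names, and the probabilities are invented so that the locally best first token leads to a worse overall sequence.

```python
import math

# Hypothetical toy model: next-token probabilities given the prefix so far.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"x": 0.9, "y": 0.1},
}


def next_probs(prefix):
    """Return the next-token distribution for a prefix (toy lookup)."""
    return TOY_MODEL[tuple(prefix)]


def greedy_decode(steps):
    """Pick the single most probable token at every step."""
    prefix, score = [], 0.0
    for _ in range(steps):
        token, prob = max(next_probs(prefix).items(), key=lambda kv: kv[1])
        prefix.append(token)
        score += math.log(prob)
    return prefix, score


def beam_search(steps, beam_width):
    """Keep the `beam_width` best prefixes by log-probability at every step."""
    beams = [([], 0.0)]
    for _ in range(steps):
        candidates = [
            (prefix + [token], score + math.log(prob))
            for prefix, score in beams
            for token, prob in next_probs(prefix).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


greedy_tokens, greedy_score = greedy_decode(steps=2)
beam_tokens, beam_score = beam_search(steps=2, beam_width=2)
print(greedy_tokens, math.exp(greedy_score))
print(beam_tokens, math.exp(beam_score))
```

With a beam width of 1 the search reduces to greedy decoding. Wider beams cost more computation per step but can recover sequences whose first token is not the locally best choice, as the toy model here is built to show.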

Maintenance & Community

The repository is marked as archived. The authors recommend alternative projects instead: OpenSpeech for training and for studying the code, and Pororo ASR or Whisper for immediately testing a trained Korean ASR model. The last update was in May 2021. Community discussion is available via Gitter.

Licensing & Compatibility

Licensed under Apache-2.0. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The project is archived, so there is no ongoing development or support. Subword-unit and grapheme-unit models were not tested. The authors note that recent code modifications, together with their limited availability, may have left issues in the code, and they encourage feedback.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

7 stars in the last 90 days
