kospeech by sooftware

PyTorch library for end-to-end Korean ASR research

Created 6 years ago

634 stars

Top 52.3% on SourcePulse

Project Summary

This project provides an open-source toolkit for end-to-end Korean Automatic Speech Recognition (ASR) research, leveraging PyTorch and the Hydra configuration framework. It aims to offer a modular and extensible platform for developing and comparing various ASR models, specifically addressing the lack of established preprocessing methods and baseline models for the Korean KsponSpeech corpus.

How It Works

KoSpeech implements several state-of-the-art end-to-end ASR architectures, including Deep Speech 2, Listen Attend Spell (LAS), RNN-Transducer, Speech Transformer, Jasper, and Conformer. These models take acoustic features extracted from the audio and map them directly to text sequences. This simplifies the traditional hybrid ASR pipeline, which trains separate acoustic, pronunciation, and language models. Hydra provides flexible, hierarchical configuration management, which makes it easy to experiment with different model components and training parameters.
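Models trained with a CTC objective (Deep Speech 2 uses CTC in its original paper) emit one label distribution per audio frame. The simplest decoder turns those frames into a transcript: take the best label per frame, merge consecutive repeats, then drop the blank symbol. The sketch below is a generic, stdlib-only illustration under stated assumptions (blank at index 0, a hypothetical two-character vocabulary). It is not KoSpeech's decoder.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the most likely label per frame, merge repeats, then drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded: List[int] = []
    prev = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame's label and
        # is not the blank; a blank between two identical labels keeps both.
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return decoded


# Hypothetical vocabulary: 0 = blank, 1 = '안', 2 = '녕'
id2char = {1: "안", 2: "녕"}
frames = [
    [0.1, 0.8, 0.1],    # '안'
    [0.2, 0.7, 0.1],    # '안' again (merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # '녕'
]
ids = ctc_greedy_decode(frames)
print("".join(id2char[i] for i in ids))  # prints 안녕
```

Attention-based models such as LAS or Speech Transformer instead generate tokens one at a time, conditioned on previous outputs, so they do not use this frame-collapsing step.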

Quick Start & Requirements

Installation: pip install -e . after cloning the repository.
Prerequisites: Python 3.7+, NumPy, PyTorch (version dependent on environment), Pandas, Matplotlib, librosa, torchaudio (0.6.0), tqdm, sentencepiece, warp-rnnt, hydra-core.
Dataset: Supports KsponSpeech and LibriSpeech datasets; preprocessing instructions are provided.
Training: Example commands for training various models (e.g., python ./bin/main.py model=ds2 train=ds2_train train.dataset_path=$DATASET_PATH).
Inference: python3 ./bin/inference.py --model_path $MODEL_PATH --audio_path $AUDIO_PATH --device $DEVICE.
Documentation: Docs
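The training command composes its configuration from Hydra config groups (`model=ds2`, `train=ds2_train`) and then overrides a single field (`train.dataset_path=$DATASET_PATH`). The toy sketch below mimics that two-step idea in plain Python so the override syntax is easier to read. It is not Hydra's implementation. The group and option names come from the documented command, but every field inside them (`architecture`, `num_rnn_layers`, `num_epochs`, `batch_size`) is hypothetical. In real use, Hydra loads each group from YAML files.

```python
import copy
from typing import Any, Dict, List, Tuple

# Hypothetical config groups standing in for Hydra's YAML files. Option names
# mirror the documented command; the fields inside are invented for illustration.
CONFIG_GROUPS: Dict[str, Dict[str, Dict[str, Any]]] = {
    "model": {"ds2": {"architecture": "deepspeech2", "num_rnn_layers": 3}},
    "train": {"ds2_train": {"dataset_path": None, "num_epochs": 70, "batch_size": 32}},
}


def compose(overrides: List[str]) -> Dict[str, Any]:
    """Select one option per group (group=option), then apply dotted key=value overrides."""
    cfg: Dict[str, Any] = {}
    dotted: List[Tuple[str, str]] = []
    for item in overrides:
        key, value = item.split("=", 1)
        if "." not in key and key in CONFIG_GROUPS:
            # Group selection: copy the chosen option so the defaults stay untouched.
            cfg[key] = copy.deepcopy(CONFIG_GROUPS[key][value])
        else:
            dotted.append((key, value))
    for key, value in dotted:  # field overrides apply after groups are selected
        node = cfg
        *parents, leaf = key.split(".")
        for part in parents:
            node = node[part]
        node[leaf] = value  # values stay strings here; Hydra would parse types
    return cfg


cfg = compose(["model=ds2", "train=ds2_train", "train.dataset_path=/data/kspon"])
print(cfg["train"]["dataset_path"])  # prints /data/kspon
```

Swapping `model=ds2` for another option would replace the whole model group while leaving the training group and its overrides intact, which is what makes this style convenient for comparing architectures.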

Highlighted Details

Implements 7 distinct E2E ASR architectures: Deep Speech 2, LAS, Joint CTC-Attention LAS, RNN-Transducer, Speech Transformer, Jasper, and Conformer.
Provides baseline models and preprocessing methods for the KsponSpeech corpus.
Utilizes Hydra for advanced configuration management of experiments.
Supports both greedy and beam search decoding.
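Greedy decoding commits to the single most likely token at each step. Beam search keeps several partial hypotheses and can recover a sequence whose first token looked worse. The generic sketch below shows that difference using a hypothetical next-token probability table in place of a trained decoder. It does not reflect KoSpeech's decoder classes or their parameters.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical next-token model P(next | previous). A real decoder would run
# the network here; this table only exists to show how the strategies differ.
NEXT: Dict[str, Dict[str, float]] = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.55, "b": 0.45},
    "b": {"a": 0.9, "b": 0.1},
}


def beam_search(steps: int, beam_width: int) -> Tuple[List[str], float]:
    """Keep the `beam_width` best partial sequences (by summed log-probability) each step."""
    beams: List[Tuple[List[str], float]] = [(["<s>"], 0.0)]
    for _ in range(steps):
        candidates = [
            (seq + [tok], score + math.log(p))
            for seq, score in beams
            for tok, p in NEXT[seq[-1]].items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    best_seq, best_score = beams[0]
    return best_seq[1:], best_score  # drop the start symbol


greedy_seq, greedy_score = beam_search(steps=2, beam_width=1)  # greedy == beam width 1
beam_seq, beam_score = beam_search(steps=2, beam_width=2)
print(greedy_seq, round(math.exp(greedy_score), 2))  # prints ['a', 'a'] 0.33
print(beam_seq, round(math.exp(beam_score), 2))  # prints ['b', 'a'] 0.36
```

Greedy picks "a" first (0.6) and ends with probability 0.33, while a beam of width 2 keeps "b" alive and finds "b a" at 0.36. Practical decoders usually add an end-of-sequence token and length normalization, which this sketch omits.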

Maintenance & Community

The repository is marked as archived. For active development or immediate use, the authors recommend alternatives: OpenSpeech for training models and studying the code, and Pororo ASR or Whisper for immediately testing trained Korean ASR models. The last update was in May 2021. Community discussion is available via Gitter.

Licensing & Compatibility

Licensed under Apache-2.0. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The project is archived, so it receives no further development or support. Models using subword and grapheme units were not tested. The authors also noted that recent code changes, made while they had limited time, may have introduced issues, and they welcomed feedback.
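For context on the untested unit types, assume that "grapheme" here means Hangul syllables decomposed into their jamo. This is an assumption; the project's preprocessing code defines the exact scheme. Under that assumption, the difference from character units can be sketched with the standard library alone. Neither function is taken from KoSpeech.

```python
import unicodedata
from typing import List


def to_characters(text: str) -> List[str]:
    """Character units: each precomposed Hangul syllable is one token."""
    # Whitespace is dropped for brevity; a real label set would typically
    # include a space or word-boundary symbol.
    return [ch for ch in text if not ch.isspace()]


def to_graphemes(text: str) -> List[str]:
    """Grapheme units (assumed definition): NFD splits each syllable into conjoining jamo."""
    return [ch for ch in unicodedata.normalize("NFD", text) if not ch.isspace()]


word = "한국어"
print(len(to_characters(word)), len(to_graphemes(word)))  # prints 3 8
```

Grapheme units give a much smaller vocabulary (a few dozen jamo instead of thousands of syllables) at the cost of longer output sequences. Applying NFC normalization to the joined jamo recovers the original syllables.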


Explore Similar Projects

OSUM by ASLP-lab

deepspeech-german by AASHISHAG

dataspeech by huggingface

SLAM-LLM by X-LANCE

athena by athena-team

Bert-Multi-Label-Text-Classification by lonePatient

bert_language_understanding by brightmart

icefall by k2-fsa

parler-tts by huggingface

FunASR by modelscope

speechbrain by speechbrain

unilm by microsoft