zeroth  by goodatlas

Korean ASR toolkit

Created 7 years ago
357 stars

Top 78.2% on SourcePulse

Project Summary

Zeroth is an open-source project providing a Kaldi-based Automatic Speech Recognition (ASR) system specifically for the Korean language. It aims to make Korean ASR technology more accessible for developers and researchers, serving as a foundational piece for building new speech-enabled products and services.

How It Works

Zeroth builds on the Kaldi ASR toolkit and trains chain acoustic models in three configurations: TDNN with factorization, TDNN + LSTM, and TDNN + OPGRU. Training data is augmented with reverberated speech. The language model and phonetic dictionary are built with a data-driven approach, and the audio database continues to grow through crowdsourced contributions.
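The reverberation augmentation mentioned above can be illustrated with a minimal sketch: a clean signal is convolved with a room impulse response (RIR) and rescaled to its original peak. This is not Zeroth's or Kaldi's actual pipeline. The function names (synthetic_rir, convolve, reverberate), the toy exponentially decaying RIR, and all parameter values are assumptions made for illustration only.

```python
import math
import random


def synthetic_rir(length, decay, seed=0):
    """Toy room impulse response (an assumption, not a measured RIR):
    a unit direct path followed by exponentially decaying random reflections."""
    rng = random.Random(seed)
    rir = [rng.uniform(-1.0, 1.0) * math.exp(-decay * n) for n in range(length)]
    rir[0] = 1.0  # direct-path component
    return rir


def convolve(signal, kernel):
    """Full linear convolution; output length is len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def reverberate(signal, rir):
    """Convolve with the RIR, trim to the input length, and rescale to the input peak."""
    wet = convolve(signal, rir)[: len(signal)]
    peak_in = max(abs(x) for x in signal)
    peak_out = max(abs(x) for x in wet)
    if peak_out == 0.0:
        return wet
    scale = peak_in / peak_out
    return [x * scale for x in wet]


if __name__ == "__main__":
    # 0.1 s of a 440 Hz tone at 16 kHz stands in for a clean utterance.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    augmented = reverberate(clean, synthetic_rir(length=400, decay=0.01))
    print(len(augmented))
```

In practice a toolkit would use measured or simulated RIRs and an FFT-based convolution; the nested-loop version here only keeps the example dependency-free.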

Quick Start & Requirements

  • Installation: See the project's Requirements wiki page for the packages to install. Additional packages needed to build the language model and phonetic dictionary are listed in Requirements-2.
  • Data: 51.6 hours of transcribed Korean audio and LM data are available at OpenSLR. Users can contribute to the audio database via the MoreCoin app (Android/iOS).
  • Customization: Instructions for creating custom language models and phonetic dictionaries are available at s5/data/local/lm/README.md.

Highlighted Details

  • Features advanced acoustic models including TDNN, TDNN+LSTM, and TDNN+OPGRU.
  • Utilizes a data-driven approach for language models and phonetic dictionaries.
  • Includes a substantial corpus with over 121 million sentences and a large phonetic dictionary.
  • Reports language model perplexities of 221.2969 (3-gram) and 187.2058 (4-gram).
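For readers unfamiliar with the perplexity figures above, the following minimal sketch shows how perplexity is conventionally derived from the probabilities a language model assigns to each token of a test text: the exponential of the negative mean log probability. The function name and the sample probabilities are hypothetical, and the sketch does not reproduce how the project's own scores were computed.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(-(1/N) * sum(ln p_i)) over the N token probabilities."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)


if __name__ == "__main__":
    # Hypothetical per-token probabilities from some n-gram model.
    probs = [0.1, 0.02, 0.3, 0.005]
    print(perplexity(probs))
```

A model that assigns every token probability 1/k has perplexity k, so a lower score means the model is, on average, choosing among fewer equally likely next words.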

Licensing & Compatibility

  • License: Apache 2.0.
  • Audio Data License: CC BY 4.0.
  • Compatible with commercial use under the Apache 2.0 license.

Limitations & Caveats

The project's data collection and model training appear to be based on data up to early 2018, and the README does not specify the latest update or ongoing development status.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days
