Awesome-Korean-Speech-Recognition by rtzr

Korean STT API benchmark, datasets, and character error rates

created 1 year ago
426 stars

Project Summary

This repository curates Korean speech-to-text (STT) APIs and benchmarks them by Character Error Rate (CER) on public datasets. It targets developers and researchers evaluating STT solutions for Korean-language applications, offering objective comparisons to aid in selection and development.

How It Works

The project evaluates several Korean STT APIs, including OpenAI Whisper, Google Cloud Speech-to-Text v2, ETRI, Naver Clova Speech, and Return Zero's VITO Speech. Performance is measured by Character Error Rate (CER) on several AI-Hub datasets covering conversational speech, call-center recordings, and educational content. CER is preferred because Korean is agglutinative and its word boundaries (spacing) are ambiguous, so a spacing difference alone can count as several word errors; character-level accuracy is therefore a more robust metric than Word Error Rate (WER).
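
The CER-versus-WER argument can be made concrete with a small sketch. The code below is a plain Levenshtein implementation, not the repository's evaluation script. It assumes spaces are stripped before computing CER, which is one common convention for Korean; whether the repository counts spaces is not stated here. The helper names `edit_distance`, `cer`, and `wer` are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion of a reference token
                curr[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),   # substitution (or match when equal)
            )
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate; assumption: spaces are removed before comparing."""
    ref_chars = list(ref.replace(" ", ""))
    hyp_chars = list(hyp.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def wer(ref, hyp):
    """Word error rate over whitespace-separated tokens."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


reference = "아버지가 방에 들어가신다"
hypothesis = "아버지 가방에 들어가신다"  # same characters, different spacing
print(cer(reference, hypothesis))            # → 0.0
print(round(wer(reference, hypothesis), 3))  # → 0.667
```

In this classic spacing example the characters are identical once spaces are removed (CER 0.0), yet two of the three words count as substitutions (WER about 0.67).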

Quick Start & Requirements

  • Usage: The project primarily presents benchmark results. Calling any of the APIs directly requires a separate API key for each provider and following that provider's documentation.
  • Data: Benchmarks are based on AI-Hub datasets. A sample of 3000 sentences per dataset was used for testing.
  • Resources: Reproducing the evaluation requires API keys and may incur per-call charges from each provider.
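
How the 3000-sentence sample is drawn, and how per-sentence errors are pooled, both affect the reported numbers. The sketch below assumes a fixed random seed for reproducibility and contrasts pooling errors over the whole sample (total character edits divided by total reference characters) with averaging per-sentence CER. The summary does not say which pooling the repository uses, and `sample_utterances`, `pooled_cer`, and `mean_cer` are hypothetical names.

```python
import random


def sample_utterances(utterances, k=3000, seed=0):
    """Return a reproducible random subset of k utterances (all of them if there are fewer)."""
    items = list(utterances)
    if len(items) <= k:
        return items
    return random.Random(seed).sample(items, k)


def pooled_cer(pairs):
    """pairs: list of (char_edits, ref_chars); total edits over total reference characters."""
    total_edits = sum(edits for edits, _ in pairs)
    total_chars = sum(chars for _, chars in pairs)
    return total_edits / total_chars


def mean_cer(pairs):
    """Unweighted mean of per-utterance CER; short utterances weigh as much as long ones."""
    return sum(edits / chars for edits, chars in pairs) / len(pairs)


subset = sample_utterances(range(10_000))
print(len(subset))  # → 3000

pairs = [(1, 2), (0, 98)]  # one short utterance with an error, one long clean one
print(pooled_cer(pairs))   # → 0.01
print(mean_cer(pairs))     # → 0.25
```

The two aggregates can diverge sharply when short utterances carry errors, so CER figures are only comparable when they use the same pooling.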

Highlighted Details

  • Performance Benchmarks: Detailed CER results are provided for multiple APIs across diverse Korean speech datasets.
  • CER vs. WER: Explains the rationale for using CER for Korean, highlighting linguistic challenges with WER.
  • Data Normalization: Discusses how differing normalization practices across datasets affect measured STT performance.
  • API List: Includes APIs readily available for developers without extensive approval processes.
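
Normalization choices change the reference text a model is scored against. The sketch below assumes KsponSpeech-style dual transcriptions written as `(spelling)/(pronunciation)`, for example `(3시)/(세 시)`; it keeps one of the two forms, applies Unicode NFC, strips punctuation, and collapses whitespace. These rules and the `normalize` helper are illustrative assumptions, not the repository's actual pipeline.

```python
import re
import unicodedata

# Assumed dual-transcription pattern: "(spelling)/(pronunciation)".
DUAL = re.compile(r"\(([^()]*)\)/\(([^()]*)\)")


def normalize(text, prefer="pronunciation"):
    """Hypothetical normalizer: pick one transcription form, then clean the text."""
    text = unicodedata.normalize("NFC", text)          # recompose decomposed Hangul jamo
    group = 2 if prefer == "pronunciation" else 1
    text = DUAL.sub(lambda m: m.group(group), text)    # keep only the preferred form
    text = re.sub(r"[^\w\s]", "", text)                # drop punctuation (\w covers Hangul)
    return re.sub(r"\s+", " ", text).strip()           # collapse runs of whitespace


print(normalize("(3시)/(세 시)에 만나요."))                     # → 세 시에 만나요
print(normalize("(3시)/(세 시)에 만나요.", prefer="spelling"))  # → 3시에 만나요
```

Under these assumptions, a model that outputs digits can be penalized against the pronunciation form and rewarded against the spelling form, which is one reason CER scores from datasets with different conventions are not directly comparable.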

Maintenance & Community

  • Contributions: Contributions are welcomed via Issues and Pull Requests. Contact email: research@rtzr.ai.
  • Development: Return Zero has contributed to making KsponSpeech available in SpeechBrain and Hugging Face.

Licensing & Compatibility

  • License: The repository itself does not appear to specify a license; it compiles and presents data from various services, and the underlying datasets and APIs carry their own licensing terms.
  • Commercial Use: Commercial use depends on the terms of each individual STT API provider listed.

Limitations & Caveats

  • Benchmarks are based on a sampled subset (3000 sentences) of each dataset, so results may not reflect performance on the full dataset.
  • Google's Speech-to-Text v2 API imposed file-size and duration limits that affected some tests.
  • The list of APIs is not exhaustive, with some services like Amazon Transcribe and Microsoft Speech Service noted for future testing.
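
Because the benchmarks use a 3000-sentence sample, one way to gauge how much a sampled CER might move is a percentile bootstrap over utterances. This is a swapped-in technique: the summary above does not say the repository reports intervals. The helper `bootstrap_cer_ci`, its defaults (1000 resamples, 95% level), and the example counts are assumptions.

```python
import random


def bootstrap_cer_ci(pairs, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for pooled CER (hypothetical helper).

    pairs: list of (char_edits, ref_chars) per utterance, with ref_chars > 0.
    """
    if not pairs or any(chars <= 0 for _, chars in pairs):
        raise ValueError("need at least one utterance, each with a non-empty reference")
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        # Resample utterances with replacement, then pool errors over the resample.
        resample = rng.choices(pairs, k=len(pairs))
        edits = sum(e for e, _ in resample)
        chars = sum(c for _, c in resample)
        stats.append(edits / chars)
    stats.sort()
    k = int(n_resamples * alpha / 2)  # number of resamples trimmed from each tail
    return stats[k], stats[n_resamples - 1 - k]


# Hypothetical per-utterance counts: 300 utterances of 20 characters each.
pairs = [(i % 3, 20) for i in range(300)]
low, high = bootstrap_cer_ci(pairs)
print(f"pooled CER 95% interval: [{low:.4f}, {high:.4f}]")
```

Resampling whole utterances, rather than individual characters, keeps the errors within a sentence together, which matches how the sample itself was drawn.
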

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 28 stars in the last 90 days
