Awesome-Korean-Speech-Recognition by rtzr

Korean STT API benchmark, datasets, and character error rates

Created 2 years ago

488 stars

Top 63.2% on SourcePulse

Project Summary

This repository curates Korean speech recognition (speech-to-text, STT) APIs and benchmarks them by Character Error Rate (CER) on public datasets. It targets developers and researchers evaluating STT solutions for Korean-language applications, offering objective comparisons to support selection and development.

How It Works

The project evaluates several Korean STT APIs, including OpenAI Whisper, Google Cloud Speech-to-Text v2, ETRI, Naver Clova Speech, and Return Zero's VITO Speech. Performance is measured by Character Error Rate (CER) on several AI-Hub datasets covering conversational speech, call center recordings, and educational content. CER is preferred over Word Error Rate (WER) because Korean is agglutinative and its word boundaries (spacing) are ambiguous: a spacing difference alone can count as one or more word errors, whereas character-level scoring is far less sensitive to it.
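To make the CER-versus-WER argument concrete, here is a minimal Python sketch that scores one transcript both ways using a plain Levenshtein edit distance. It is not the repository's scoring code: stripping spaces before computing CER is an assumption made here to isolate spacing errors, and the example sentences are invented for illustration.

```python
def levenshtein(ref, hyp):
    """Minimum substitutions, deletions, and insertions turning ref into hyp.

    Works on any sequences: strings (characters) or lists (words).
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    # Assumption: spaces are removed so that spacing differences cost nothing.
    r, h = ref.replace(" ", ""), hyp.replace(" ", "")
    return levenshtein(r, h) / len(r)


def wer(ref: str, hyp: str) -> float:
    r, h = ref.split(), hyp.split()
    return levenshtein(r, h) / len(r)


ref = "오늘 날씨가 정말 좋네요"
spacing_only = "오늘날씨가 정말 좋네요"   # only one space is missing
one_char = "오늘 날씨가 정말 종네요"      # one syllable misrecognized

print(cer(ref, spacing_only), wer(ref, spacing_only))  # 0.0 0.5
print(cer(ref, one_char), wer(ref, one_char))          # 0.1 0.25
```

The spacing-only hypothesis loses half its WER while its CER stays at zero, which is the kind of distortion that motivates scoring Korean at the character level.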

Quick Start & Requirements

Usage: The project mainly presents benchmark results. Calling any of the APIs directly requires setting up that provider's API key and following its documentation.
Data: Benchmarks are based on AI-Hub datasets. A sample of 3000 sentences per dataset was used for testing.
Resources: Evaluating the APIs requires API keys and may incur per-call charges.
Links:
- CER Calculation Visualization: https://cer-diff.vercel.app
- Dataset Information: Links to the individual datasets are provided in the README.
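The 3000-sentence sampling and the final score can be sketched as below. How the repository actually selected sentences and aggregated scores is not stated here, so the uniform random draw, the fixed seed, and the choice between corpus-level (micro-averaged) and per-sentence (macro-averaged) CER are all assumptions; the helper names are hypothetical. Per-utterance edit counts are assumed to come from any Levenshtein routine.

```python
import random


def sample_utterances(utterances, k=3000, seed=0):
    """Hypothetical helper: reproducible uniform subset of k utterances (all if fewer)."""
    rng = random.Random(seed)
    return rng.sample(utterances, min(k, len(utterances)))


def corpus_cer(results):
    """Micro-average: total character edits over total reference characters.

    `results` holds one (edit_distance, reference_length) pair per utterance.
    """
    return sum(e for e, _ in results) / sum(n for _, n in results)


def mean_sentence_cer(results):
    """Macro-average: mean of per-utterance CERs (short utterances weigh more)."""
    return sum(e / n for e, n in results) / len(results)


# Hypothetical scores: a 2-character utterance with 1 error, an 8-character one with none.
results = [(1, 2), (0, 8)]
print(corpus_cer(results))         # 0.1
print(mean_sentence_cer(results))  # 0.25
```

The two averages can diverge noticeably when utterance lengths vary, so comparing CER figures across benchmarks requires knowing which aggregation was used.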

Highlighted Details

Performance Benchmarks: Detailed CER results are provided for multiple APIs across diverse Korean speech datasets.
CER vs. WER: Explains the rationale for using CER for Korean, highlighting linguistic challenges with WER.
Data Normalization: Discusses how inconsistent normalization practices across datasets affect measured STT performance.
API List: Covers APIs that developers can use without lengthy approval processes.
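As an illustration of why normalization matters, the hypothetical function below applies one possible set of rules to both reference and hypothesis before scoring: Unicode NFC composition, punctuation removal, lowercasing of Latin letters, and optional removal of all spaces. These rules are assumptions for the sketch, not the normalization used by the repository or by any AI-Hub dataset.

```python
import re
import unicodedata


def normalize(text: str, drop_spaces: bool = True) -> str:
    """Hypothetical pre-scoring normalization; the rules are illustrative only."""
    # Compose decomposed conjoining jamo into precomposed Hangul syllables.
    text = unicodedata.normalize("NFC", text)
    # Remove punctuation and symbols; \w keeps Hangul, Latin letters, digits (and "_").
    text = re.sub(r"[^\w\s]", "", text)
    text = text.lower()
    # Either remove all whitespace or collapse runs of it to single spaces.
    return "".join(text.split()) if drop_spaces else " ".join(text.split())


ref = "안녕하세요, 반갑습니다!"
hyp = "안녕하세요 반갑 습니다"
print(normalize(ref) == normalize(hyp))                                        # True
print(normalize(ref, drop_spaces=False) == normalize(hyp, drop_spaces=False))  # False
```

Simple rules like these cannot reconcile digits with spelled-out numbers (3시 vs 세 시), so datasets transcribed under different conventions can still yield different CERs for the same system.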

Maintenance & Community

Contributions: Contributions are welcomed via Issues and Pull Requests. Contact email: research@rtzr.ai.
Development: Return Zero has contributed to making KsponSpeech available in SpeechBrain and Hugging Face.

Licensing & Compatibility

License: The repository itself does not appear to specify a license. It compiles and presents results from various services, and the underlying datasets and APIs carry their own licensing terms.
Commercial Use: Commercial use depends on the terms of each individual STT API provider listed.

Limitations & Caveats

Benchmarks use a 3000-sentence sample from each dataset, so results may differ from performance on the full datasets.
File size and duration limits in Google's Speech-to-Text v2 API affected some tests.
The list of APIs is not exhaustive, with some services like Amazon Transcribe and Microsoft Speech Service noted for future testing.

Health Check

Last Commit: 6 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 9 stars in the last 30 days

Explore Similar Projects

VoiceBench by MatthewCYM

Benchmark for LLM-based voice assistants

Created 1 year ago

Updated 3 weeks ago

speech-recognition-uk by egorsmkv

Resource collection for Ukrainian speech AI

Created 5 years ago

Updated 5 months ago

Starred by Tobi Lutke (Cofounder of Shopify), Jonathan Ragan-Kelley (Professor at MIT), and 2 more.

tinydiarize by akashmjn

Finetuned speech model for speaker diarization

Created 3 years ago

Updated 2 years ago

Starred by Maxime Labonne (Head of Post-Training at Liquid AI).

dataspeech by huggingface

Suite of scripts for tagging speech datasets, especially for TTS model development

Created 2 years ago

Updated 1 year ago

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral).

GigaSpeech by SpeechColab

Large dataset for speech recognition research

Created 5 years ago

Updated 2 years ago

open-tts-tracker by Vaibhavs10

Open TTS Tracker: resource for open-access TTS models

Created 2 years ago

Updated 1 year ago

speech-to-text-benchmark by Picovoice

STT benchmark framework for comparing speech-to-text engines

Created 7 years ago

Updated 1 month ago

forced-alignment-tools by pettarin

Audio forced alignment tools

Created 9 years ago

Updated 4 years ago

QuickAgent by gkamradt

Voice bot demo using speech and language models

Created 2 years ago

Updated 1 year ago

KoAlpaca by Beomi

Korean LLM fine-tuning project

Created 2 years ago

Updated 1 year ago

Starred by Shawn Wang (Editor of Latent Space) and Magnus Müller (Cofounder of Browser Use).

noScribe by kaixxx

GUI tool for local AI-powered audio transcription

Created 2 years ago

Updated 3 days ago

sherpa-onnx by k2-fsa

Speech toolkit for local, offline speech AI tasks via ONNX

Created 3 years ago

Updated 3 hours ago
