wer_are_we  by syhw

Tracking speech recognition state-of-the-art

Created 10 years ago
1,869 stars

Top 23.2% on SourcePulse

Project Summary

This repository is a curated bibliography of state-of-the-art results in automatic speech recognition (ASR). It tracks word error rates (WER) reported on standard datasets and benchmarks, giving researchers and practitioners one place to compare published results and see which methods lead on each benchmark.
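For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a reference transcript and a system hypothesis (substitutions + deletions + insertions), divided by the number of reference words. The sketch below illustrates that standard definition; it is not code from the repository. The function name `word_error_rate` is an assumption made for this example, and the sketch splits on whitespace only, whereas published results usually apply benchmark-specific text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length.

    Illustrative only: tokenization is a plain whitespace split, with no
    casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.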

How It Works

The project compiles and presents WER data from published research papers, organized by benchmark datasets such as LibriSpeech, WSJ, Hub5'00, Fisher, TED-LIUM, CHiME6, and TIMIT. It details the specific models, training data, and augmentation techniques employed in each cited work, allowing for a comparative analysis of different ASR approaches.
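The kind of comparison described above can be sketched as a small data structure. This is a hypothetical illustration, not a format the repository provides: the `Entry` class, its fields, and `best_per_benchmark` are names invented here, and the benchmark names, system names, and WER values are placeholders rather than published results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One reported result (hypothetical schema for illustration)."""
    benchmark: str      # placeholder benchmark name
    system: str         # placeholder system or paper label
    wer_percent: float  # reported WER in percent; lower is better


def best_per_benchmark(entries):
    """Map each benchmark name to its lowest-WER entry."""
    best = {}
    for entry in entries:
        current = best.get(entry.benchmark)
        if current is None or entry.wer_percent < current.wer_percent:
            best[entry.benchmark] = entry
    return best


# Placeholder values only; these are NOT real results from any paper.
entries = [
    Entry("benchmark-A", "system-1", 10.0),
    Entry("benchmark-A", "system-2", 8.0),
    Entry("benchmark-B", "system-1", 20.0),
]
best = best_per_benchmark(entries)
```

A ranking like this is only meaningful within a single benchmark, since WER values measured on different test sets are not directly comparable.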

Quick Start & Requirements

This repository is a reference and does not require installation or execution. Users can directly access and utilize the information presented in the README.

Highlighted Details

  • Comprehensive tracking of WER across multiple standard ASR benchmarks.
  • Detailed information on model architectures (e.g., Conformer, HuBERT, Deep Speech 2), training strategies, and data augmentation techniques.
  • Includes human performance benchmarks for context.
  • Covers a wide range of ASR research from 2009 to the present.

Maintenance & Community

The repository is maintained by "syhw" and is open to community contributions for corrections and additions, as indicated by the "Feel free to correct!" invitation.

Licensing & Compatibility

The README does not state a license.

Limitations & Caveats

The README does not provide direct links to the cited papers or code implementations, requiring users to search for them independently. The data is presented as a bibliography and does not include executable code or pre-trained models. Some entries are marked with "TODO," indicating incomplete information or areas for future expansion.

Health Check

Last Commit: 3 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

2 stars in the last 30 days
