speech-recognition-papers by wenet-e2e

Collection of speech recognition research papers

Created 5 years ago

330 stars

Top 82.6% on SourcePulse

Project Summary

This repository is a curated list of research papers on industrial-grade end-to-end automatic speech recognition (ASR). It targets researchers and engineers in the ASR field and gives a structured overview of advances in streaming, non-autoregressive, and on-device ASR, along with related areas such as rescoring and self-supervised learning.

How It Works

The repository groups papers by ASR research direction: Streaming ASR (RNA, RNN-T, Attention-based), Non-Autoregressive (NAR) ASR, ASR Rescoring, On-device ASR, and Self-Supervised Learning (SSL). Within each category it lists influential papers, often noting the specific architectural variation or training method involved (e.g., Conformer-equipped RNN-T, Mask CTC, wav2vec 2.0). This grouping lets readers quickly find the state-of-the-art work relevant to a given direction.

Highlighted Details

Comprehensive coverage of Streaming ASR, detailing various architectures like RNA, RNN-T, and Attention-based models, including Transformer and Conformer variants.
Extensive listing of Non-Autoregressive (NAR) ASR techniques, such as Mask-Predict, Imputer, and Insertion-based methods, with a focus on recent work.
Inclusion of papers on ASR Rescoring/Spelling Correction, On-device ASR, and Self-Supervised Learning (SSL) methods like APC and CPC.
Categorization of unified streaming/non-streaming models and multi-speaker ASR approaches.

Maintenance and Community

This is a community-driven list, with an open invitation for pull requests to add new papers or corrections. Specific contributors or maintainers are not highlighted in the README.

Licensing and Compatibility

The repository contains no code, only a list of papers, so software license and compatibility restrictions do not apply to its content.

Limitations and Caveats

This repository is a reference list of papers and does not provide implementations, code, or benchmarks. Users must consult the individual papers for technical details, code availability, and performance evaluations.

Explore Similar Projects

MiMo-V2.5-ASR by XiaomiMiMo

deepspeech-german by AASHISHAG

WenetSpeech by wenet-e2e

GigaSpeech by SpeechColab

Mega-ASR by xzf-thu

UniSpeech by microsoft

ASR_Theory by zw76859420

kospeech by sooftware

athena by athena-team

vall-e by lifeiteng

awesome-speech-recognition-speech-synthesis-papers by zzw922cn

FunASR by modelscope