eesen by srvk

ASR research project simplifying speech-to-text

created 10 years ago
829 stars

Top 43.7% on sourcepulse

Project Summary

Eesen offers an end-to-end Automatic Speech Recognition (ASR) system that simplifies the traditional pipeline by framing it as a sequence learning problem. It targets researchers and developers seeking a more streamlined approach to ASR, leveraging deep recurrent neural networks and connectionist temporal classification for acoustic modeling and training.

How It Works

Eesen uses bidirectional RNNs with LSTM units for acoustic modeling, trained with Connectionist Temporal Classification (CTC) as the objective. It offers two decoding approaches: WFST-based decoding, which integrates lexicons and language models efficiently, and RNN-LM decoding, which does not require a fixed lexicon. This design removes the HMMs, GMMs, and decision trees of the traditional pipeline, and also the pronunciation dictionary when characters are used as the modeling units.
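As a rough illustration of the CTC objective mentioned above, the sketch below computes the probability of a label sequence by summing over every frame-level alignment that collapses to it (the CTC forward recursion). It is not Eesen's implementation: the function name `ctc_forward`, the blank index `BLANK = 0`, and the plain-list input format are assumptions made for this example, and practical implementations work in the log domain on a GPU.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def ctc_forward(probs: Sequence[Sequence[float]], labels: List[int]) -> float:
    """Return p(labels | probs), summed over all valid CTC alignments.

    probs[t][k] is the network's posterior for symbol k at frame t.
    """
    # Interleave blanks around the labels: [a, b] -> [blank, a, blank, b, blank]
    ext = [BLANK]
    for lab in labels:
        ext += [lab, BLANK]
    S, T = len(ext), len(probs)

    # At the first frame, a path may start on the leading blank or the first label.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]              # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]     # advance by one position
            # Skipping over a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)
```

The training loss for one utterance would then be the negative logarithm of this value. With two frames and a two-symbol vocabulary (blank plus one label), the alignments "label, blank", "blank, label", and "label, label" all collapse to the same single-label output, which is why their probabilities are summed.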

Quick Start & Requirements

  • Installation: Primarily through Kaldi recipes and conventions.
  • Prerequisites: A GPU implementation of LSTM model training and CTC learning is available, with Tensorflow support in a separate branch.
  • Resources: Training processes multiple utterances in parallel to speed it up.
  • Documentation: Refer to the paper "EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding" for detailed information.
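Processing several utterances in parallel usually means padding them to a common length and keeping track of the true lengths. The sketch below shows one common way to do that; how Eesen itself groups utterances is not described here, and the helper name `pad_batch` and the nested-list feature format are assumptions made for illustration.

```python
from typing import List, Tuple

Frame = List[float]  # one feature vector per frame (assumed representation)


def pad_batch(
    utterances: List[List[Frame]], feat_dim: int
) -> Tuple[List[List[Frame]], List[int]]:
    """Pad every utterance with zero frames up to the longest one.

    Returns the padded batch and the original lengths, so that padded
    frames can be ignored when the loss is computed.
    """
    lengths = [len(u) for u in utterances]
    max_len = max(lengths)
    batch = [
        u + [[0.0] * feat_dim for _ in range(max_len - len(u))]
        for u in utterances
    ]
    return batch, lengths
```

Sorting utterances by length before batching is a common refinement, since it reduces the amount of padding in each batch.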

Highlighted Details

  • WFST-based decoding effectively incorporates lexicons and language models into CTC decoding.
  • RNN-LM decoding offers flexibility by not requiring a fixed lexicon.
  • GPU implementation of LSTM and CTC training is available, with Tensorflow support.
  • Provides example setups for phoneme and character-based ASR, following Kaldi conventions.
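To make the character-based setup concrete, the sketch below shows the simplest possible CTC decoder: pick the most likely symbol per frame, merge repeats, and drop blanks. This greedy best-path rule is a stand-in, not the WFST-based or RNN-LM decoding that Eesen provides, and it uses no lexicon or language model. The function name `greedy_ctc_decode`, the blank index, and the symbol table are assumptions for this example.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def greedy_ctc_decode(probs: Sequence[Sequence[float]], symbols: List[str]) -> str:
    """Best-path CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # Most likely symbol index at each frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    # Collapse runs of the same index into a single occurrence.
    collapsed = [k for k, _ in groupby(best)]
    # Remove blanks and map the remaining indices to characters.
    return "".join(symbols[k] for k in collapsed if k != BLANK)
```

Because repeats are merged before blanks are removed, a doubled character can only be produced when the network emits a blank between the two occurrences.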

Maintenance & Community

The project was created by Yajie Miao, with inspiration from the Kaldi toolkit. Further community or maintenance details are not explicitly provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The README mentions a separate Tensorflow branch, suggesting potential divergence or ongoing development. Specific limitations regarding supported platforms, performance benchmarks, or known issues are not detailed.

Health Check

  • Last commit: 6 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
