ASR research project simplifying speech-to-text
Eesen offers an end-to-end Automatic Speech Recognition (ASR) system that simplifies the traditional pipeline by framing it as a sequence learning problem. It targets researchers and developers seeking a more streamlined approach to ASR, leveraging deep recurrent neural networks and connectionist temporal classification for acoustic modeling and training.
How It Works
Eesen uses bi-directional RNNs with LSTM units for acoustic modeling and Connectionist Temporal Classification (CTC) as the training objective. It offers two decoding approaches: WFST-based decoding, which integrates lexicons and language models efficiently, and RNN-LM decoding, which bypasses the need for a fixed lexicon. This approach eliminates the need for HMMs, GMMs, and decision trees, and, when characters are used as the modeling units, for an explicit pronunciation dictionary, which simplifies the ASR pipeline.
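To make the CTC objective more concrete, the following is a minimal pure-Python sketch of the standard CTC forward computation. It is not Eesen's implementation. The function names (ctc_collapse, ctc_likelihood, ctc_loss), the choice of index 0 for the blank symbol, and the plain list-of-lists input format are assumptions made for illustration. In a real system the per-frame posteriors would come from the bi-directional LSTM and the computation would be done in log space for numerical stability.

```python
import math

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def ctc_collapse(path):
    """Map a frame-level path to labels: merge repeats, then drop blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return out


def ctc_likelihood(probs, labels):
    """Probability of `labels`, summed over all frame-level alignments.

    probs[t][k] is the posterior of symbol k at frame t (hypothetical
    stand-in for the acoustic model's softmax output).
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [BLANK]
    for label in labels:
        ext += [label, BLANK]
    size = len(ext)

    # Initialisation: a path may start with a blank or the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]              # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]     # advance by one position
            # Skipping a blank is allowed only between different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def ctc_loss(probs, labels):
    """Negative log-likelihood; infinite if no alignment is possible."""
    p = ctc_likelihood(probs, labels)
    return -math.log(p) if p > 0.0 else math.inf
```

For example, with two frames and uniform posteriors over {blank, a}, the three paths (a, a), (a, blank), and (blank, a) all collapse to the single label a, so ctc_likelihood([[0.5, 0.5], [0.5, 0.5]], [1]) sums their probabilities. Training minimizes ctc_loss over the network parameters, which is what removes the need for frame-level alignments from an HMM/GMM system.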
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project was created by Yajie Miao, with inspiration from the Kaldi toolkit. Further community or maintenance details are not explicitly provided in the README.
Licensing & Compatibility
The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.
Limitations & Caveats
The README mentions a separate TensorFlow branch, which suggests that development may have diverged from, or continued outside of, the main branch. Specific limitations regarding supported platforms, performance benchmarks, or known issues are not detailed.
6 years ago
Inactive