eesen by srvk

ASR research project simplifying speech-to-text

Created 10 years ago
832 stars

Top 42.7% on SourcePulse

Project Summary

Eesen offers an end-to-end Automatic Speech Recognition (ASR) system that simplifies the traditional pipeline by framing it as a sequence learning problem. It targets researchers and developers seeking a more streamlined approach to ASR, leveraging deep recurrent neural networks and connectionist temporal classification for acoustic modeling and training.

How It Works

Eesen uses bidirectional RNNs with LSTM units for acoustic modeling and Connectionist Temporal Classification (CTC) as the training objective. It offers two decoding approaches: WFST-based decoding, which integrates lexicons and language models efficiently, and RNN-LM decoding, which does not need a fixed lexicon. The approach removes HMMs, GMMs, and decision trees from the ASR pipeline, and a pronunciation dictionary is no longer required when characters are used as the modeling units.
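To make the CTC training objective concrete, here is a minimal pure-Python sketch of the standard CTC forward recursion, which sums the probability of every frame-level alignment that collapses to the target label sequence. It is not taken from Eesen's code: the function name, the list-of-lists input layout, and the choice of index 0 for the blank symbol are assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) over an iterable of log-values."""
    values = list(values)
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_neg_log_likelihood(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under CTC.

    log_probs: T frames, each a list of per-symbol log-probabilities
               (for example, the log-softmax output of an acoustic model).
    labels:    target label ids without blanks.
    blank:     id of the CTC blank symbol (assumed to be 0 here).
    """
    # Extended target with blanks interleaved: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    num_states, num_frames = len(ext), len(log_probs)

    # alpha[s] = log-probability of all partial alignments ending in state s
    alpha = [NEG_INF] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        new_alpha = [NEG_INF] * num_states
        for s in range(num_states):
            candidates = [alpha[s]]            # stay in the same state
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one state
            # Skip over a blank only between two different non-blank labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = _logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end in the last label or the trailing blank
    final_states = alpha[-2:] if num_states > 1 else alpha
    return -_logsumexp(final_states)
```

With two frames, a uniform distribution over {blank, a}, and the target "a", three of the four possible alignments collapse to the target, so the function returns -log(0.75). A practical trainer would also compute gradients and run on batches, which this sketch leaves out.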

Quick Start & Requirements

  • Installation: Follows Kaldi recipes and conventions.
  • Prerequisites: GPU implementations of LSTM model training and CTC learning are available, with TensorFlow support.
  • Resources: Training is sped up by processing multiple utterances in parallel.
  • Documentation: Refer to the paper "EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding" for detailed information.
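The summary says only that multiple utterances are processed in parallel; it does not say how. One common way to do this, shown here as an assumption rather than as Eesen's actual mechanism, is to pad variable-length utterances to a common length and keep the true lengths so that padded frames can be ignored later. The function name `pad_batch` and the nested-list feature layout are hypothetical.

```python
def pad_batch(utterances, pad_value=0.0):
    """Pad variable-length utterances to the longest one in the batch.

    utterances: list of utterances, each a list of feature frames,
                where a frame is a list of floats of equal dimension.
    Returns (padded, lengths): the padded batch and each true frame count.
    """
    lengths = [len(utt) for utt in utterances]
    max_len = max(lengths)
    feat_dim = len(utterances[0][0])

    padded = []
    for utt in utterances:
        frames = [list(frame) for frame in utt]  # copy so inputs stay untouched
        # Append all-pad frames until this utterance reaches max_len
        frames += [[pad_value] * feat_dim for _ in range(max_len - len(utt))]
        padded.append(frames)
    return padded, lengths


if __name__ == "__main__":
    batch = [
        [[0.1, 0.2]],                # 1 frame
        [[0.3, 0.4], [0.5, 0.6]],    # 2 frames
    ]
    padded, lengths = pad_batch(batch)
    print(lengths, len(padded[0]), len(padded[1]))
```

The returned lengths matter for CTC training, since the loss should be computed only over the real frames of each utterance.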

Highlighted Details

  • WFST-based decoding effectively incorporates lexicons and language models into CTC decoding.
  • RNN-LM decoding offers flexibility by not requiring a fixed lexicon.
  • GPU implementations of LSTM and CTC training are available, with TensorFlow support.
  • Provides example setups for phoneme and character-based ASR, following Kaldi conventions.
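The difference between the phoneme and character setups comes down to how a transcript is turned into a CTC target sequence. The sketch below illustrates that difference under stated assumptions: the symbol tables, the `<space>` token, the toy lexicon, and both function names are hypothetical and are not taken from Eesen's recipes. Id 0 is left unused on the assumption that it is reserved for the CTC blank.

```python
def char_targets(transcript, char_to_id, space_token="<space>"):
    """Character-based targets: one label per character, no lexicon needed."""
    return [
        char_to_id[space_token] if ch == " " else char_to_id[ch]
        for ch in transcript
    ]


def phoneme_targets(transcript, lexicon, phone_to_id):
    """Phoneme-based targets: each word is looked up in a pronunciation lexicon."""
    ids = []
    for word in transcript.split():
        for phone in lexicon[word]:  # raises KeyError for out-of-lexicon words
            ids.append(phone_to_id[phone])
    return ids


if __name__ == "__main__":
    # Hypothetical symbol tables; ids start at 1, with 0 kept for the blank
    symbols = ["<space>"] + list("abcdefghijklmnopqrstuvwxyz")
    char_to_id = {sym: i + 1 for i, sym in enumerate(symbols)}

    toy_lexicon = {"cat": ["K", "AE", "T"], "sat": ["S", "AE", "T"]}
    phones = ["K", "AE", "T", "S"]
    phone_to_id = {p: i + 1 for i, p in enumerate(phones)}

    print(char_targets("cat sat", char_to_id))
    print(phoneme_targets("cat sat", toy_lexicon, phone_to_id))
```

The character path never consults a lexicon, which is why a dictionary becomes optional in that setup, while the phoneme path fails on any word missing from the lexicon.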

Maintenance & Community

The project was created by Yajie Miao, with inspiration from the Kaldi toolkit. Further community or maintenance details are not explicitly provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The README mentions a separate TensorFlow branch, which suggests the codebases may have diverged or that development was still in progress. It does not detail supported platforms, performance benchmarks, or known issues.

Health Check

  • Last Commit: 6 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

pyctcdecode by kensho-technologies

0%
460
CTC beam search decoder for speech recognition
Created 4 years ago
Updated 2 years ago