eesen by srvk

ASR research project simplifying speech-to-text

Created 10 years ago

833 stars

Top 42.5% on SourcePulse

1 Expert Loves This Project

Andreas Jansson

Cofounder of Replicate

Project Summary

Eesen offers an end-to-end Automatic Speech Recognition (ASR) system that simplifies the traditional pipeline by framing it as a sequence learning problem. It targets researchers and developers seeking a more streamlined approach to ASR, leveraging deep recurrent neural networks and connectionist temporal classification for acoustic modeling and training.

How It Works

Eesen uses bidirectional RNNs with LSTM units for acoustic modeling and Connectionist Temporal Classification (CTC) as the training objective. It offers two decoding approaches: WFST-based decoding, which integrates lexicons and language models efficiently, and RNN-LM decoding, which does not require a fixed lexicon. This design removes the HMMs, GMMs, and decision trees of the traditional pipeline, and a pronunciation dictionary is needed only when phonemes rather than characters are the modeling units.
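To make the CTC output convention concrete, here is a minimal sketch of best-path (greedy) CTC decoding in plain Python. It is not Eesen's decoder, which the summary describes as WFST-based or RNN-LM-based; it only illustrates how per-frame network outputs collapse into a label sequence. The function name `ctc_greedy_decode`, the blank index 0, and the toy scores are assumptions made for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label (illustrative choice)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: take the top label per frame,
    merge consecutive repeats, then drop blanks."""
    # Highest-scoring label index for every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]

    decoded: List[int] = []
    previous = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Toy posteriors over three labels: 0 = blank, 1 = "a", 2 = "b".
scores = [
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank separates the two "a" emissions
    [0.20, 0.60, 0.20],  # a (new emission after the blank)
    [0.10, 0.20, 0.70],  # b
]
print(ctc_greedy_decode(scores))  # [1, 1, 2]
```

The blank frame is what lets the same label appear twice in a row in the output; without it, the two "a" runs would merge into one.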

Quick Start & Requirements

Installation: Primarily through Kaldi recipes and conventions.
Prerequisites: None are listed explicitly. LSTM model training and CTC learning have a GPU implementation, and TensorFlow is supported.
Resources: Training processes multiple utterances in parallel to speed it up.
Documentation: Refer to the paper "EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding" for detailed information.
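Processing several utterances in parallel usually means padding variable-length feature sequences to a common length and tracking the true lengths. The sketch below shows that idea in plain Python under stated assumptions: it is not Eesen's batching code, the helper name `pad_batch` is hypothetical, and it assumes every utterance is non-empty and shares the same feature dimension.

```python
from typing import List, Tuple

Frame = List[float]


def pad_batch(utterances: List[List[Frame]],
              pad_value: float = 0.0) -> Tuple[List[List[Frame]], List[int]]:
    """Pad each utterance (a list of feature frames) to the longest
    length in the batch and return the original lengths alongside."""
    lengths = [len(utt) for utt in utterances]
    max_len = max(lengths)
    feat_dim = len(utterances[0][0])  # assumes a shared feature dimension

    padded = [
        [list(frame) for frame in utt]
        + [[pad_value] * feat_dim for _ in range(max_len - len(utt))]
        for utt in utterances
    ]
    return padded, lengths


# Two toy utterances with 2-dimensional features and different lengths.
batch, lengths = pad_batch([
    [[1.0, 2.0]],
    [[3.0, 4.0], [5.0, 6.0]],
])
print(lengths)   # [1, 2]
print(batch[0])  # [[1.0, 2.0], [0.0, 0.0]]
```

Keeping the true lengths matters because a CTC loss should ignore the padded frames rather than treat them as speech.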

Highlighted Details

WFST-based decoding effectively incorporates lexicons and language models into CTC decoding.
RNN-LM decoding offers flexibility by not requiring a fixed lexicon.
A GPU implementation of LSTM and CTC training is available, with TensorFlow support.
Provides example setups for phoneme and character-based ASR, following Kaldi conventions.

Maintenance & Community

The project was created by Yajie Miao, with inspiration from the Kaldi toolkit. Further community or maintenance details are not explicitly provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The README mentions a separate TensorFlow branch, which suggests that development may have diverged from the main branch. Supported platforms, performance benchmarks, and known issues are not detailed.

Health Check

Last Commit

6 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days