neural_sp  by hirofumi0810

End-to-end speech processing toolkit

Created 8 years ago
596 stars

Top 54.8% on SourcePulse

Project Summary

NeuralSP is an end-to-end automatic speech recognition (ASR) and language model (LM) toolkit implemented in PyTorch. It supports a range of neural encoder and decoder architectures and decoding strategies, and is aimed at researchers and practitioners in speech technology who want to build and experiment with ASR and LM systems.

How It Works

NeuralSP implements its neural networks in PyTorch. Front-end processing options include frame stacking and SpecAugment. Encoder options include BLSTM, LGRU, Transformer, and Conformer architectures, with features such as latency control and chunk hopping. Supported decoders cover Connectionist Temporal Classification (CTC), RNN-Transducer (RNN-T), and attention-based models, including various fusion techniques and streaming capabilities. The toolkit also supports recurrent, convolutional, and Transformer-based language models.
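To make the frame stacking front end more concrete, here is a minimal sketch of the general technique in plain Python. It is not taken from the NeuralSP code base: the function name `stack_frames` and the parameters `n_stack` and `n_skip` are hypothetical, and padding the final window by repeating the last frame is an assumption made for this illustration.

```python
from typing import List


def stack_frames(
    frames: List[List[float]], n_stack: int = 3, n_skip: int = 3
) -> List[List[float]]:
    """Concatenate n_stack consecutive frames, advancing n_skip frames per step.

    Hypothetical helper for illustration only, not NeuralSP's implementation.
    """
    if n_stack < 1 or n_skip < 1:
        raise ValueError("n_stack and n_skip must be positive")

    stacked = []
    for start in range(0, len(frames), n_skip):
        # Slicing copies the list, so padding below does not modify `frames`.
        window = frames[start:start + n_stack]
        # Assumption: pad a short final window by repeating its last frame.
        while len(window) < n_stack:
            window.append(window[-1])
        # Flatten the window into one longer feature vector.
        stacked.append([value for frame in window for value in frame])
    return stacked


if __name__ == "__main__":
    # Four 2-dimensional frames become two 4-dimensional frames.
    features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(stack_frames(features, n_stack=2, n_skip=2))
```

Stacking with a skip larger than one shortens the input sequence, which is the usual reason to apply it before a recurrent or self-attention encoder.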

Quick Start & Requirements

Installation requires make, with KALDI=/path/to/kaldi and TOOL=/path/to/save/tools set. Dependencies include PyTorch, and CUDA if GPU acceleration is wanted. The README does not give an estimated setup time or resource footprint.

Highlighted Details

  • Supports a diverse set of ASR corpora including AISHELL, AMI, CSJ, Librispeech, Switchboard, and TEDLIUM.
  • Implements advanced encoder architectures such as Conformer and Transformer with various enhancements.
  • Offers multiple decoder options including CTC, RNN-T, and attention-based decoders with streaming capabilities.
  • Provides benchmarks for various models across multiple datasets, demonstrating competitive performance.

Maintenance & Community

The project references Kaldi, ESPnet, and other ASR toolkits, suggesting a connection to established ASR research communities. Specific details on maintainers, community channels, or roadmap are not provided in the README.

Licensing & Compatibility

The README does not explicitly state the license type. It references other repositories, some of which have specific licenses (e.g., MIT, Apache 2.0), but the licensing for NeuralSP itself is unclear. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README lacks explicit details on limitations, such as unsupported features, known bugs, or alpha status. The installation instructions are minimal, and the absence of a clear license could be a barrier to adoption.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days
