MASR by yeyupiaoling

PyTorch ASR framework for streaming and non-streaming speech recognition

Created 5 years ago
704 stars

Top 48.5% on SourcePulse

View on GitHub
Project Summary

MASR (Magical Automatic Speech Recognition) is a PyTorch-based framework for both streaming and non-streaming automatic speech recognition (ASR). It supports various models like Conformer and DeepSpeech2, multiple decoding methods, and extensive data augmentation, making it a versatile tool for researchers and developers working with speech recognition tasks. The framework aims for simplicity and practicality, with deployment options for servers and Nvidia Jetson devices.

How It Works

MASR leverages PyTorch for its core implementation, offering flexibility in model architecture and training. It supports multiple pre-processing techniques (e.g., fbank, mfcc) and a variety of data augmentation methods to improve model robustness. The framework's key advantage lies in its unified support for both streaming and non-streaming ASR through a simple configuration parameter, along with diverse decoding strategies like CTC greedy search, CTC prefix beam search, and attention rescoring.
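
The decoding strategies mentioned above differ mainly in how they turn the model's frame-level probabilities into a token sequence. As a concrete illustration, below is a minimal, self-contained sketch of CTC greedy search, the simplest of the listed methods; the toy vocabulary size, blank index, and random inputs are illustrative assumptions, not MASR's actual model outputs or API.

```python
import torch

def ctc_greedy_search(log_probs: torch.Tensor, blank_id: int = 0) -> list[int]:
    """Collapse the per-frame argmax path: merge repeated tokens, then drop blanks.

    log_probs: (T, V) frame-level log-probabilities from an acoustic model.
    """
    best_path = log_probs.argmax(dim=-1).tolist()       # best token per frame
    collapsed = [t for i, t in enumerate(best_path)
                 if i == 0 or t != best_path[i - 1]]    # merge consecutive repeats
    return [t for t in collapsed if t != blank_id]      # remove CTC blanks

# Toy usage: 6 frames over a 5-token vocabulary (index 0 is the blank).
frames = torch.randn(6, 5).log_softmax(dim=-1)
print(ctc_greedy_search(frames))
```

CTC prefix beam search and attention rescoring build on the same idea but keep multiple hypotheses and re-rank them with the decoder, trading speed for accuracy.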

Quick Start & Requirements

  • Installation: Requires Anaconda 3, Python 3.11, and PyTorch 2.5.1 (see the environment check sketched after this list).
  • OS: Windows 11 or Ubuntu 22.04.
  • Resources: Pre-trained models and other resources are distributed through the author's Knowledge Planet community or QQ group.
  • Documentation: Links to quick start, usage, data preparation, training, decoding, and deployment are provided.
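
As a quick sanity check of the stated environment, here is a minimal sketch that compares the running interpreter and PyTorch build against the versions listed above; the script is illustrative and not part of MASR itself.

```python
# Checks the environment against the README's stated requirements
# (Python 3.11, PyTorch 2.5.1); illustrative only, not part of MASR.
import sys
import torch

assert sys.version_info[:2] == (3, 11), f"expected Python 3.11, found {sys.version.split()[0]}"
assert torch.__version__.startswith("2.5.1"), f"expected PyTorch 2.5.1, found {torch.__version__}"
print("CUDA available:", torch.cuda.is_available())
```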

Highlighted Details

  • Supports Conformer, Squeezeformer, DeepSpeech2, and Efficient_Conformer models, all with streaming and non-streaming capabilities.
  • Offers extensive data augmentation, including noise, reverberation, speed, volume, resampling, time shift, and SpecAugment (see the sketch after this list).
  • Provides multiple inference methods: short audio, long audio, streaming, and speaker-separated ASR.
  • V3 is not backward compatible with V2; the project structure was significantly reorganized for ease of use.
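
To make the augmentation list above more concrete, here is a rough sketch of speed, volume, and time-shift perturbation implemented with plain torch/torchaudio rather than MASR's own augmentor classes; the synthetic waveform and perturbation ranges are illustrative assumptions.

```python
import torch
import torchaudio.functional as F

sr = 16000
waveform = torch.randn(1, sr)  # stand-in for one second of 16 kHz audio

# Speed perturbation: reinterpret the signal at sr * factor and resample back to sr,
# which shortens (factor > 1) or stretches (factor < 1) the utterance.
factor = 1.1
sped_up = F.resample(waveform, orig_freq=int(sr * factor), new_freq=sr)

# Volume perturbation: apply a random gain in decibels.
gain_db = float(torch.empty(1).uniform_(-6.0, 6.0))
louder = F.gain(sped_up, gain_db=gain_db)

# Time shift: roll the samples by a random offset of up to 100 ms.
shift = int(torch.randint(-sr // 10, sr // 10, (1,)))
shifted = torch.roll(louder, shifts=shift, dims=-1)
```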

Maintenance & Community

  • The project is actively developed, with V3 being the latest release.
  • Community discussion takes place in QQ groups and the author's Knowledge Planet community.
  • Related projects include voiceprint recognition and audio classification frameworks.

Licensing & Compatibility

  • The license is not stated in the summarized text; check the repository for license terms before commercial use or closed-source linking.

Limitations & Caveats

  • Pre-trained model weights and some resources are only accessible through paid channels (the author's Knowledge Planet community or QQ group), which may be a barrier for some users.
  • Per the README, models trained on the larger datasets were run for fewer epochs and therefore reach lower accuracy; they are intended mainly as starting points for fine-tuning.
Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 30 days
