MASR by yeyupiaoling

PyTorch ASR framework for streaming and non-streaming speech recognition

Created 5 years ago
704 stars

Top 48.5% on SourcePulse

View on GitHub
Project Summary

MASR (Magical Automatic Speech Recognition) is a PyTorch-based framework for both streaming and non-streaming automatic speech recognition (ASR). It supports various models like Conformer and DeepSpeech2, multiple decoding methods, and extensive data augmentation, making it a versatile tool for researchers and developers working with speech recognition tasks. The framework aims for simplicity and practicality, with deployment options for servers and Nvidia Jetson devices.

How It Works

MASR leverages PyTorch for its core implementation, offering flexibility in model architecture and training. It supports multiple pre-processing techniques (e.g., fbank, mfcc) and a variety of data augmentation methods to improve model robustness. The framework's key advantage lies in its unified support for both streaming and non-streaming ASR through a simple configuration parameter, along with diverse decoding strategies like CTC greedy search, CTC prefix beam search, and attention rescoring.
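
The decoding strategies mentioned above differ mainly in how they turn the model's frame-level probabilities into a token sequence. As a concrete illustration, below is a minimal, self-contained sketch of CTC greedy search, the simplest of the listed methods; the toy vocabulary size, blank index, and random inputs are illustrative assumptions, not MASR's actual model outputs or API.

```python
import torch

def ctc_greedy_search(log_probs: torch.Tensor, blank_id: int = 0) -> list[int]:
    """Collapse the per-frame argmax path: merge repeated tokens, then drop blanks.

    log_probs: (T, V) frame-level log-probabilities from an acoustic model.
    """
    best_path = log_probs.argmax(dim=-1).tolist()       # best token per frame
    collapsed = [t for i, t in enumerate(best_path)
                 if i == 0 or t != best_path[i - 1]]    # merge consecutive repeats
    return [t for t in collapsed if t != blank_id]      # remove CTC blanks

# Toy usage: 6 frames over a 5-token vocabulary (index 0 is the blank).
frames = torch.randn(6, 5).log_softmax(dim=-1)
print(ctc_greedy_search(frames))
```

CTC prefix beam search and attention rescoring build on the same idea but keep multiple hypotheses and re-rank them with the decoder, trading speed for accuracy.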

Quick Start & Requirements

  • Installation: Requires Anaconda 3, Python 3.11, and PyTorch 2.5.1 (see the environment check sketched after this list).
  • OS: Windows 11 or Ubuntu 22.04.
  • Resources: Pre-trained models and other resources are distributed through the author's Knowledge Planet community or QQ group.
  • Documentation: Links to quick start, usage, data preparation, training, decoding, and deployment are provided.
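
As a quick sanity check of the stated environment, here is a minimal sketch that compares the running interpreter and PyTorch build against the versions listed above; the script is illustrative and not part of MASR itself.

```python
# Checks the environment against the README's stated requirements
# (Python 3.11, PyTorch 2.5.1); illustrative only, not part of MASR.
import sys
import torch

assert sys.version_info[:2] == (3, 11), f"expected Python 3.11, found {sys.version.split()[0]}"
assert torch.__version__.startswith("2.5.1"), f"expected PyTorch 2.5.1, found {torch.__version__}"
print("CUDA available:", torch.cuda.is_available())
```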

Highlighted Details

  • Supports Conformer, Squeezeformer, DeepSpeech2, and Efficient_Conformer models, all with streaming and non-streaming capabilities.
  • Offers extensive data augmentation, including noise, reverberation, speed, volume, resampling, time shift, and SpecAugment (see the sketch after this list).
  • Provides multiple inference methods: short audio, long audio, streaming, and speaker-separated ASR.
  • V3 is not backward compatible with V2; the project structure was significantly reorganized for ease of use.
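
To make the augmentation list above more concrete, here is a rough sketch of speed, volume, and time-shift perturbation implemented with plain torch/torchaudio rather than MASR's own augmentor classes; the synthetic waveform and perturbation ranges are illustrative assumptions.

```python
import torch
import torchaudio.functional as F

sr = 16000
waveform = torch.randn(1, sr)  # stand-in for one second of 16 kHz audio

# Speed perturbation: reinterpret the signal at sr * factor and resample back to sr,
# which shortens (factor > 1) or stretches (factor < 1) the utterance.
factor = 1.1
sped_up = F.resample(waveform, orig_freq=int(sr * factor), new_freq=sr)

# Volume perturbation: apply a random gain in decibels.
gain_db = float(torch.empty(1).uniform_(-6.0, 6.0))
louder = F.gain(sped_up, gain_db=gain_db)

# Time shift: roll the samples by a random offset of up to 100 ms.
shift = int(torch.randint(-sr // 10, sr // 10, (1,)))
shifted = torch.roll(louder, shifts=shift, dims=-1)
```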

Maintenance & Community

  • The project is actively developed, with V3 being the latest release.
  • Community discussion takes place in QQ groups and the author's Knowledge Planet community.
  • Related projects include voiceprint recognition and audio classification frameworks.

Licensing & Compatibility

  • The license is not stated in the summarized text; check the repository for license terms before commercial use or closed-source linking.

Limitations & Caveats

  • Pre-trained model weights and some resources are only accessible through paid channels (the author's Knowledge Planet community or QQ group), which may be a barrier for some users.
  • Per the README, models trained on the larger datasets were run for fewer epochs and therefore reach lower accuracy; they are intended mainly as starting points for fine-tuning.
Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 30 days
