PPASR by yeyupiaoling

PaddlePaddle-based speech recognition framework

Created 5 years ago

876 stars

Top 40.8% on SourcePulse

Project Summary

PPASR is an end-to-end Chinese automatic speech recognition (ASR) framework built on PaddlePaddle. It offers a simplified and practical approach to ASR, supporting popular models like DeepSpeech2, Conformer, and Squeezeformer, with both streaming and non-streaming capabilities. The project targets developers and researchers looking for a flexible and efficient ASR solution that can be deployed on servers and edge devices like Nvidia Jetson.

How It Works

PPASR V3 is a major overhaul of V2 that focuses on ease of use and better performance. It uses kaldi_native_fbank for faster, cross-platform audio preprocessing and sentencepiece for tokenization, which simplifies handling multiple languages and mixed-language training. The framework supports several decoding methods (e.g., ctc_greedy_search, ctc_prefix_beam_search, attention_rescoring) and data augmentation techniques for improved robustness.
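To make the simplest of the decoding methods named above concrete, here is a minimal sketch of CTC greedy search in plain Python. It is an illustration of the general technique, not PPASR's own code: the function name `ctc_greedy_decode`, the toy score matrix, and the choice of `blank_id=0` are all assumptions made for this example.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank_id: int = 0
) -> List[int]:
    """Pick the best label per frame, merge repeats, then drop blanks.

    `blank_id=0` is an assumption for this sketch; real models may
    place the blank symbol at a different index.
    """
    decoded: List[int] = []
    previous: Optional[int] = None
    for scores in frame_scores:
        # Argmax over the vocabulary for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the label changes and is not the blank symbol.
        if best != previous and best != blank_id:
            decoded.append(best)
        previous = best
    return decoded


# Toy input: vocabulary of 3 symbols (0 = blank), six frames.
frames = [
    [0.10, 0.80, 0.10],  # label 1
    [0.10, 0.70, 0.20],  # label 1 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.70, 0.10],  # label 1 (kept, because a blank separates it)
    [0.10, 0.20, 0.70],  # label 2
    [0.60, 0.20, 0.20],  # blank
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

The resulting token ids would then be mapped back to text by the tokenizer. Prefix beam search and attention rescoring build on the same per-frame scores but keep several candidate sequences instead of one.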

Quick Start & Requirements

Installation: Requires Anaconda 3, Python 3.11, and PaddlePaddle 2.6.1.
OS: Windows 11 or Ubuntu 22.04.
Resources: Pre-trained models are available via a "knowledge planet" (details in README).
Docs: Links to online demos, WeChat mini-programs, and video tutorials are provided in the README.
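The version requirements above can be checked before training with a short script. This is a minimal sketch and not part of PPASR: the helper name `check_environment` is hypothetical, and it assumes PaddlePaddle was installed under the distribution name `paddlepaddle` or `paddlepaddle-gpu`.

```python
import sys
from importlib import metadata

# Versions taken from the requirements listed above.
EXPECTED_PYTHON = (3, 11)
EXPECTED_PADDLE = "2.6.1"


def check_environment(paddle_packages=("paddlepaddle", "paddlepaddle-gpu")):
    """Report whether the interpreter and PaddlePaddle match the listed versions.

    The distribution names in `paddle_packages` are an assumption about
    how PaddlePaddle was installed in this environment.
    """
    installed = None
    for name in paddle_packages:
        try:
            installed = metadata.version(name)
            break
        except metadata.PackageNotFoundError:
            continue
    return {
        "python_ok": sys.version_info[:2] == EXPECTED_PYTHON,
        "paddle_version": installed,  # None when no distribution was found
        "paddle_ok": installed == EXPECTED_PADDLE,
    }


print(check_environment())
```

A `paddle_version` of `None` means neither distribution was found in the active environment.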

Highlighted Details

Supports multiple ASR models including Conformer, Squeezeformer, and DeepSpeech2, all with streaming and non-streaming options.
Offers diverse decoding strategies and data augmentation methods for customized performance.
Pre-trained models are available for various datasets (WenetSpeech, AIShell, Librispeech) with reported character error rates (CER) or word error rates (WER). For example, a Conformer model with attention_rescoring on WenetSpeech achieves a CER of 0.13786 on test_net.
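For readers unfamiliar with the metrics, CER and WER are both an edit distance divided by the reference length, so a CER of 0.13786 corresponds to roughly 13.8 character errors per 100 reference characters. The sketch below shows the standard calculation; it is an illustration, not PPASR's evaluation code, and the function names and sample strings are made up for this example.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# One substitution and one deletion against a 6-character reference.
print(round(cer("今天天气很好", "今天天汽好"), 4))  # 0.3333
# One substituted word out of three.
print(round(wer("the cat sat", "the cat sit"), 4))  # 0.3333
```

CER is the usual choice for Chinese, where word boundaries are not marked by spaces, while WER is common for English sets such as Librispeech.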

Maintenance & Community

The project is actively developed, with V3 being the latest release.
Community discussion is encouraged via QQ groups and a "knowledge planet" which also provides model files. Links to Bilibili for video explanations are included.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use is not specified.

Limitations & Caveats

Pre-trained model weights are not directly downloadable from the repository and require joining a paid "knowledge planet."
The project is primarily focused on Chinese ASR, though it mentions support for English and mixed-language training.

Health Check

Last Commit: 2 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

1 star in the last 30 days

Explore Similar Projects

StableTTS by KdaiP

TTS model using flow-matching and DiT, inspired by Stable Diffusion 3

Created 1 year ago

Updated 1 year ago

FastASR by chenkui164

C++ ASR inference project for ARM platforms

Created 3 years ago

Updated 2 years ago

awesome-kaldi by YoavRamon

List of Kaldi ASR resources

Created 7 years ago

Updated 4 years ago

MASR by yeyupiaoling

PyTorch ASR framework for streaming and non-streaming speech recognition

Created 5 years ago

Updated 2 months ago

TensorflowASR by Z-yq

ASR toolkit for CPU/edge deployment, approaching GPU model performance

Created 6 years ago

Updated 11 months ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Soumith Chintala (Coauthor of PyTorch).

espresso by freewym

Fast end-to-end neural speech recognition

Created 7 years ago

Updated 1 year ago

athena by athena-team

Open-source speech processing engine for industrial/academic use

Created 6 years ago

Updated 3 years ago

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral).

icefall by k2-fsa

Speech-related recipes for various datasets using k2-fsa and lhotse

Created 4 years ago

Updated 1 week ago

RapidOCR by RapidAI

Fast, multi-platform OCR toolkit

Created 5 years ago

Updated 1 week ago

Starred by Travis Fischer (Founder of Agentic) and Luis Capelo (Cofounder of Lightning AI).

vosk-api by alphacep

Offline speech recognition for 20+ languages

Created 6 years ago

Updated 3 days ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Luis Capelo (Cofounder of Lightning AI), and 10 more.

MiniCPM-o by OpenBMB

MLLM for vision, speech, and multimodal live streaming on your phone

Created 2 years ago

Updated 2 days ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Piotr Dąbkowski (Cofounder of ElevenLabs), and 2 more.

PaddleSpeech by PaddlePaddle

Speech toolkit for ASR, TTS, speaker verification, translation, and keyword spotting

Created 8 years ago

Updated 2 weeks ago
