PPASR by yeyupiaoling

PaddlePaddle-based speech recognition framework

Created 4 years ago
865 stars

Top 41.5% on SourcePulse

Project Summary

PPASR is an end-to-end Chinese automatic speech recognition (ASR) framework built on PaddlePaddle. It aims to be simple and practical, supporting models such as DeepSpeech2, Conformer, and Squeezeformer in both streaming and non-streaming modes. The project targets developers and researchers who need a flexible, efficient ASR solution that can be deployed on servers and on edge devices such as NVIDIA Jetson.

How It Works

PPASR V3 is a substantial overhaul of V2 that focuses on ease of use and better performance. It uses kaldi_native_fbank for faster, cross-platform audio preprocessing and sentencepiece for tokenization, which makes it easier to handle multiple languages and mixed-language training. The framework supports several decoding methods (e.g., ctc_greedy_search, ctc_prefix_beam_search, attention_rescoring) and data augmentation techniques for improved robustness.
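
To illustrate the simplest of the decoding methods named above, ctc_greedy_search, here is a minimal sketch of the standard CTC greedy rule: pick the highest-scoring label in each frame, merge consecutive repeats, then drop blanks. This is not PPASR's own implementation. The function signature, the blank index of 0, and the toy vocabulary are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def ctc_greedy_search(
    frame_scores: Sequence[Sequence[float]],
    blank_id: int = 0,  # assumed blank index; a real model may use another id
) -> List[int]:
    """Apply the CTC greedy rule to per-frame label scores."""
    decoded: List[int] = []
    previous: Optional[int] = None
    for scores in frame_scores:
        # Index of the highest-scoring label in this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Keep a label only if it is not blank and not a repeat of the
        # label chosen for the previous frame.
        if best != blank_id and best != previous:
            decoded.append(best)
        previous = best
    return decoded


# Hypothetical three-entry vocabulary and five frames of scores.
vocab = ["<blank>", "你", "好"]
frame_scores = [
    [0.10, 0.80, 0.10],  # 你
    [0.20, 0.70, 0.10],  # 你 again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.20, 0.70],  # 好
    [0.10, 0.10, 0.80],  # 好 again, merged
]
token_ids = ctc_greedy_search(frame_scores)
text = "".join(vocab[i] for i in token_ids)
print(token_ids, text)
```

A blank between two identical labels keeps both, which is how CTC represents genuinely repeated characters. Beam-search and rescoring methods trade this simplicity for accuracy by keeping several candidate sequences alive instead of one.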

Quick Start & Requirements

  • Installation: Requires Anaconda 3, Python 3.11, and PaddlePaddle 2.6.1.
  • OS: Windows 11 or Ubuntu 22.04.
  • Resources: Pre-trained models are distributed through a paid "knowledge planet" community rather than the repository itself (details in README).
  • Docs: Links to online demos, WeChat mini-programs, and video tutorials are provided in the README.

Highlighted Details

  • Supports multiple ASR models including Conformer, Squeezeformer, and DeepSpeech2, all with streaming and non-streaming options.
  • Offers diverse decoding strategies and data augmentation methods for customized performance.
  • Pre-trained models are available for various datasets (WenetSpeech, AIShell, Librispeech) with reported character error rates (CER) or word error rates (WER). For example, a Conformer model with attention_rescoring on WenetSpeech achieves a CER of 0.13786 on test_net.
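
For readers unfamiliar with the metric, character error rate is conventionally the character-level edit distance between the recognized text and the reference transcript, divided by the reference length. The sketch below shows that conventional definition only. PPASR's own scoring code may normalize text differently, and the function names and example sentences are hypothetical.

```python
from typing import Sequence


def edit_distance(reference: Sequence, hypothesis: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j items of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_item == hyp_item else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion from the reference
                    current[j - 1] + 1,  # insertion into the reference
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = character-level edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one substituted character and one dropped character.
reference = "今天天气很好"
hypothesis = "今天天汽很"
cer = character_error_rate(reference, hypothesis)
print(f"CER = {cer:.5f}")
```

Word error rate follows the same formula with the inputs split into words, for example `edit_distance(ref.split(), hyp.split()) / len(ref.split())`.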

Maintenance & Community

  • The project is actively developed, with V3 being the latest release.
  • Community discussion is encouraged via QQ groups and a "knowledge planet" which also provides model files. Links to Bilibili for video explanations are included.

Licensing & Compatibility

  • The README does not explicitly state a license, so the terms for commercial use are unspecified.

Limitations & Caveats

  • Pre-trained model weights are not directly downloadable from the repository and require joining a paid "knowledge planet."
  • The project is primarily focused on Chinese ASR, though it mentions support for English and mixed-language training.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
