PaddleSpeech  by PaddlePaddle

Speech toolkit for ASR, TTS, speaker verification, translation, and keyword spotting

created 7 years ago
12,110 stars

Top 4.2% on sourcepulse

Project Summary

PaddleSpeech is an all-in-one, easy-to-use speech toolkit built on PaddlePaddle, targeting both industrial applications and academic research. It provides state-of-the-art models and flexible implementations for a wide range of speech tasks, including ASR, TTS, speaker verification, and speech translation, with a focus on streaming capabilities and efficient deployment.

How It Works

PaddleSpeech leverages the PaddlePaddle deep learning framework to offer a comprehensive suite of speech processing tools. Its architecture supports both traditional cascaded pipelines and end-to-end models. Key advantages include a rule-based Chinese text frontend with normalization and G2P, support for streaming ASR and TTS, and integration with mainstream models and datasets. The toolkit emphasizes ease of use through CLI, server, and streaming server options.
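As a rough illustration of the CLI entry point mentioned above, the sketch below only builds argument lists for the `paddlespeech asr` and `paddlespeech tts` subcommands; it does not run them. The flag names (`--lang`, `--input`, `--output`) follow the invocations shown in the project's README as far as we know and should be confirmed with `paddlespeech asr --help`. The helper names `asr_command` and `tts_command` and the file names are hypothetical.

```python
import shlex


def asr_command(audio_path, lang="zh"):
    """Build (but do not run) a speech-recognition CLI invocation.

    Flags are assumed from the project README; verify with --help.
    """
    return ["paddlespeech", "asr", "--lang", lang, "--input", audio_path]


def tts_command(text, output_path="output.wav"):
    """Build (but do not run) a text-to-speech CLI invocation.

    Flags are assumed from the project README; verify with --help.
    """
    return ["paddlespeech", "tts", "--input", text, "--output", output_path]


if __name__ == "__main__":
    # Print shell-safe command lines; pass the lists to subprocess.run
    # only on a machine where paddlespeech is installed.
    print(shlex.join(asr_command("zh.wav")))
    print(shlex.join(tts_command("hello", "out.wav")))
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems when the input text contains spaces or punctuation.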

Quick Start & Requirements

  • Installation: pip install paddlespeech (recommended) or from source.
  • Prerequisites: Python >= 3.8, GCC >= 4.8.5, PaddlePaddle. Linux is recommended.
  • Resources: Hardware and storage requirements vary by model; refer to the project's Quick Start guide and documentation.
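The version prerequisites listed above can be checked before installing. The following is a minimal sketch, not part of PaddleSpeech: the helper names (`parse_version`, `meets_minimum`, `python_ok`, `gcc_ok`) are hypothetical, and the minimums are simply the ones stated in the list (Python 3.8, GCC 4.8.5).

```python
import re
import sys

# Minimum versions taken from the prerequisites list above.
MIN_PYTHON = (3, 8)
MIN_GCC = (4, 8, 5)


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(found, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    padded_found = tuple(found) + (0,) * (width - len(found))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_found >= padded_minimum


def python_ok(version_info=sys.version_info):
    """Check the running (or a given) Python version against MIN_PYTHON."""
    return meets_minimum(tuple(version_info[:2]), MIN_PYTHON)


def gcc_ok(version_text):
    """Check the first line of `gcc --version` output against MIN_GCC."""
    return meets_minimum(parse_version(version_text), MIN_GCC)
```

In practice the text passed to `gcc_ok` would come from running `gcc --version`, for example through `subprocess.run(["gcc", "--version"], capture_output=True, text=True).stdout`.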

Highlighted Details

  • Won the NAACL 2022 Best Demo Award.
  • Supports streaming ASR and TTS systems.
  • Features a rule-based Chinese text frontend with polyphone and tone sandhi handling.
  • Includes implementations for ASR, TTS, Speaker Verification, Keyword Spotting, Audio Classification, and Speech Translation.
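To give a sense of what a rule-based tone sandhi step does, here is a toy sketch of the best-known Mandarin rule: a third-tone syllable directly followed by another third-tone syllable is pronounced with the second tone. This is not PaddleSpeech's implementation. The function name `apply_third_tone_sandhi` is hypothetical, input is assumed to be numbered pinyin (such as "ni3"), and real frontends also consider word boundaries and prosody when handling longer runs of third tones.

```python
def apply_third_tone_sandhi(syllables):
    """Apply a simplified third-tone sandhi rule to numbered pinyin.

    A syllable ending in "3" that is immediately followed by another
    syllable ending in "3" is rewritten with tone "2". Decisions are
    based on the original tones, not on already-rewritten ones.
    """
    result = []
    for index, syllable in enumerate(syllables):
        following = syllables[index + 1] if index + 1 < len(syllables) else ""
        if syllable.endswith("3") and following.endswith("3"):
            result.append(syllable[:-1] + "2")
        else:
            result.append(syllable)
    return result


if __name__ == "__main__":
    print(apply_third_tone_sandhi(["ni3", "hao3"]))
```

Because the rule looks only at adjacent syllables, a run such as "wo3 hen3 hao3" becomes "wo2 hen2 hao3", which is one accepted reading but not the only one; choosing between readings is where a production frontend needs additional linguistic rules.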

Maintenance & Community

The project is actively maintained with frequent updates. Community engagement is encouraged via GitHub issues and discussions. A WeChat technical exchange group is available for support and learning materials.

Licensing & Compatibility

PaddleSpeech is provided under the Apache-2.0 License, which permits commercial use and linking with closed-source projects.

Limitations & Caveats

The Speech Translation command-line interface reportedly supports only Ubuntu, likely because of its Kaldi dependencies. Some older models or specific features may require older versions of PaddlePaddle.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 3
  • Issues (30d): 94
  • Star History: 311 stars in the last 90 days
