PaddleSpeech by PaddlePaddle

Speech toolkit for ASR, TTS, speaker verification, translation, and keyword spotting

Created 7 years ago
12,234 stars

Top 4.1% on SourcePulse

Project Summary

PaddleSpeech is an all-in-one, easy-to-use speech toolkit built on PaddlePaddle, aimed at both industrial applications and academic research. It provides state-of-the-art models and flexible implementations for a wide range of speech tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speaker verification, and speech translation, with a focus on streaming and efficient deployment.

How It Works

PaddleSpeech uses the PaddlePaddle deep learning framework to offer a comprehensive suite of speech processing tools. Its architecture supports both traditional cascaded pipelines and end-to-end models. Key advantages include a rule-based Chinese text frontend with text normalization and grapheme-to-phoneme (G2P) conversion, support for streaming ASR and TTS, and integration with mainstream models and datasets. The toolkit can be used through a CLI, a server, or a streaming server.
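
As an illustration of the CLI entry point, the sketch below wraps a one-shot recognition call in Python. It assumes that the `paddlespeech asr` subcommand accepts `--lang` and `--input`, as in the project's published quick-start examples; the helper names `build_asr_command` and `run_asr` are hypothetical and are not part of PaddleSpeech. Check the flags against the version you have installed.

```python
import shutil
import subprocess
from typing import List, Optional


def build_asr_command(wav_path: str, lang: str = "zh") -> List[str]:
    """Build the argument list for a one-shot ASR call through the CLI.

    Assumption: `paddlespeech asr` accepts `--lang` and `--input`.
    """
    return ["paddlespeech", "asr", "--lang", lang, "--input", wav_path]


def run_asr(wav_path: str, lang: str = "zh") -> Optional[str]:
    """Run the CLI if it is installed and return its stdout, else None."""
    if shutil.which("paddlespeech") is None:
        return None  # CLI is not on PATH, so there is nothing to run
    completed = subprocess.run(
        build_asr_command(wav_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    # Only prints the command; call run_asr() to actually execute it.
    print(build_asr_command("zh.wav"))
```

Keeping command construction separate from execution makes the wrapper easy to test on machines where PaddleSpeech is not installed.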

Quick Start & Requirements

  • Installation: pip install paddlespeech (recommended) or from source.
  • Prerequisites: Python >= 3.8, GCC >= 4.8.5, PaddlePaddle. Linux is recommended.
  • Resources: Requirements vary by model; refer to the project's Quick Start guide and documentation.
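
The version prerequisites above can be checked before installing. The following is a minimal sketch that uses only the Python standard library; the helper names are hypothetical, and it assumes that the first dotted number on the first line of `gcc --version` is the compiler version. On systems where `gcc` is an alias for another compiler (for example clang on macOS), the reported number is not a GCC version.

```python
import re
import shutil
import subprocess
import sys
from typing import Optional, Tuple

MIN_PYTHON = (3, 8)   # Python >= 3.8
MIN_GCC = (4, 8, 5)   # GCC >= 4.8.5


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted version number (e.g. '9.4.0') from text."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def python_ok(version: Optional[Tuple[int, ...]] = None) -> bool:
    """Check a Python version tuple (default: the running interpreter)."""
    if version is None:
        version = tuple(sys.version_info[:3])
    return version >= MIN_PYTHON


def gcc_ok(version: Optional[Tuple[int, ...]]) -> bool:
    """Check a GCC version tuple; None means GCC was not found."""
    return version is not None and version >= MIN_GCC


def installed_gcc_version() -> Optional[Tuple[int, ...]]:
    """Return the version reported by `gcc --version`, or None."""
    if shutil.which("gcc") is None:
        return None
    out = subprocess.run(
        ["gcc", "--version"], capture_output=True, text=True
    ).stdout
    lines = out.splitlines()
    return parse_version(lines[0]) if lines else None


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("GCC OK:", gcc_ok(installed_gcc_version()))
```

The check does not cover PaddlePaddle itself, whose required version depends on the PaddleSpeech release and on the target hardware.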

Highlighted Details

  • Won the NAACL 2022 Best Demo Award.
  • Supports streaming ASR and TTS systems.
  • Features a rule-based Chinese text frontend with polyphone and tone sandhi handling.
  • Includes implementations for ASR, TTS, Speaker Verification, Keyword Spotting, Audio Classification, and Speech Translation.
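
To make "tone sandhi handling" concrete, here is a toy sketch of one well-known Mandarin rule: a third tone is pronounced as a second tone when the next syllable is also third tone. This is not PaddleSpeech's implementation. The function names and the numbered-pinyin input format are assumptions chosen for illustration. A real frontend also takes word segmentation and prosody into account and covers other rules, such as those for 一 and 不.

```python
from typing import List


def tone_of(syllable: str) -> str:
    """Return the trailing tone digit of a numbered-pinyin syllable, or ''."""
    return syllable[-1] if syllable and syllable[-1].isdigit() else ""


def apply_third_tone_sandhi(syllables: List[str]) -> List[str]:
    """Change tone 3 to tone 2 when the following syllable is also tone 3.

    Decisions use the original (underlying) tones, so in a run of third
    tones every syllable except the last is changed. This is a
    simplification: the actual pattern in longer runs depends on how the
    phrase is grouped.
    """
    result = list(syllables)
    for i in range(len(syllables) - 1):
        if tone_of(syllables[i]) == "3" and tone_of(syllables[i + 1]) == "3":
            result[i] = syllables[i][:-1] + "2"
    return result


if __name__ == "__main__":
    print(apply_third_tone_sandhi(["ni3", "hao3"]))  # ['ni2', 'hao3']
```

Rule-based handling of this kind is deterministic and easy to inspect, which is one reason frontends often prefer rules to learned models for such cases.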

Maintenance & Community

The project is actively maintained with frequent updates. Community engagement is encouraged via GitHub issues and discussions. A WeChat technical exchange group is available for support and learning materials.

Licensing & Compatibility

PaddleSpeech is provided under the Apache-2.0 License, which permits commercial use and linking with closed-source projects.

Limitations & Caveats

The command-line interface for speech translation is noted to support only Ubuntu, likely because of its Kaldi dependencies. Some older models or specific features may require older versions of PaddlePaddle.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 9
  • Issues (30d): 10
  • Star history: 89 stars in the last 30 days
