PaddleSpeech by PaddlePaddle

Speech toolkit for ASR, TTS, speaker verification, translation, and keyword spotting

Created 8 years ago

12,536 stars

Top 4.0% on SourcePulse


4 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Piotr Dąbkowski

Cofounder of ElevenLabs

Project Summary

PaddleSpeech is an all-in-one, easy-to-use speech toolkit built on PaddlePaddle, targeting both industrial applications and academic research. It provides state-of-the-art models and flexible implementations for a wide range of speech tasks, including ASR, TTS, speaker verification, and speech translation, with a focus on streaming capabilities and efficient deployment.

How It Works

PaddleSpeech leverages the PaddlePaddle deep learning framework to offer a comprehensive suite of speech processing tools. Its architecture supports both traditional cascaded pipelines and end-to-end models. Key advantages include a rule-based Chinese text frontend with normalization and G2P, support for streaming ASR and TTS, and integration with mainstream models and datasets. The toolkit emphasizes ease of use through CLI, server, and streaming server options.
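
To make the text-frontend idea concrete, here is a toy Python sketch of a rule-based pipeline: a normalization step followed by a grapheme-to-phoneme (G2P) lookup. It is not PaddleSpeech's implementation. The function names (normalize_digits, g2p, text_frontend), the digit table, and the five-entry lexicon are all hypothetical and exist only for illustration. A production frontend would also need context-dependent rules, which this sketch ignores.

```python
import re
from typing import List

# Hypothetical digit-to-character table for the toy normalization step.
DIGITS = {"0": "零", "1": "一", "2": "二", "3": "三", "4": "四",
          "5": "五", "6": "六", "7": "七", "8": "八", "9": "九"}

# Hypothetical mini-lexicon: character -> pinyin with a tone number.
LEXICON = {"我": "wo3", "有": "you3", "三": "san1", "本": "ben3", "书": "shu1"}


def normalize_digits(text: str) -> str:
    """Replace each ASCII digit with a Chinese numeral, digit by digit."""
    return re.sub(r"[0-9]", lambda m: DIGITS[m.group(0)], text)


def g2p(text: str) -> List[str]:
    """Look up each character; characters missing from the lexicon map to '<unk>'."""
    return [LEXICON.get(ch, "<unk>") for ch in text]


def text_frontend(text: str) -> List[str]:
    """Run normalization first, then G2P, mirroring the two stages named above."""
    return g2p(normalize_digits(text))


print(text_frontend("我有3本书"))  # ['wo3', 'you3', 'san1', 'ben3', 'shu1']
```

The two stages are kept as separate functions so that each rule set can be tested and extended on its own, which is the usual appeal of a rule-based design over a learned one.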

Quick Start & Requirements

Installation: run pip install paddlespeech (recommended), or build from source.
Prerequisites: Python >= 3.8, GCC >= 4.8.5, and PaddlePaddle. Linux is the recommended platform.
Resources: Hardware and storage requirements vary by model; refer to the project's Quick Start guide and documentation.
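
Before installing, it can help to confirm the environment meets the prerequisites listed above. The following is a minimal sketch using only the Python standard library; the function name check_prerequisites and the check labels are hypothetical, and the script does not verify the GCC version. It assumes PaddlePaddle is importable under the module name paddle.

```python
import importlib.util
import platform
import sys
from typing import Dict

MIN_PYTHON = (3, 8)  # minimum version taken from the prerequisites above


def check_prerequisites() -> Dict[str, bool]:
    """Return a label -> passed mapping for the checks doable from Python."""
    return {
        "python>=3.8": sys.version_info[:2] >= MIN_PYTHON,
        # "paddle" is the import name of the PaddlePaddle package.
        "paddlepaddle installed": importlib.util.find_spec("paddle") is not None,
        # Linux is recommended rather than required, so this check is advisory.
        "linux (recommended)": platform.system() == "Linux",
    }


if __name__ == "__main__":
    for label, ok in check_prerequisites().items():
        print(f"{'OK  ' if ok else 'MISS'} {label}")
```

Using importlib.util.find_spec avoids actually importing the framework, so the check stays fast and does not fail with an ImportError when PaddlePaddle is absent.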

Highlighted Details

Won the NAACL 2022 Best Demo Award.
Supports streaming ASR and TTS systems.
Features a rule-based Chinese text frontend with polyphone and tone sandhi handling.
Includes implementations for ASR, TTS, Speaker Verification, Keyword Spotting, Audio Classification, and Speech Translation.

Maintenance & Community

The project is actively maintained with frequent updates. Community engagement is encouraged via GitHub issues and discussions. A WeChat technical exchange group is available for support and learning materials.

Licensing & Compatibility

PaddleSpeech is provided under the Apache-2.0 License, which permits commercial use and linking with closed-source projects.

Limitations & Caveats

The command-line demo for the Speech Translation module reportedly supports only Ubuntu, likely because of its Kaldi dependencies. Some older models or specific features may require older versions of PaddlePaddle.

Health Check

Last Commit: 2 weeks ago
Responsiveness: 1 day

Star History

46 stars in the last 30 days