parrots by shibing624

ASR/TTS toolkit for multilingual speech processing

created 7 years ago
500 stars

Top 63.0% on sourcepulse

Project Summary

Parrots is an open-source toolkit that integrates Automatic Speech Recognition (ASR) and Text-To-Speech (TTS), designed for ease of use with a focus on Chinese and English. It targets developers and researchers who need a quick way to implement speech processing tasks, and it offers pre-trained models for multilingual, multi-speaker voice synthesis.

How It Works

The ASR component uses Distil-Whisper for speech-to-text conversion across multiple languages. The TTS engine is built on GPT-SoVITS, which enables high-quality voice synthesis in several languages with distinct speaker identities, including singing voices. Together, the two components provide a unified, "out-of-the-box" solution for common speech AI applications.
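
To make the ASR half more concrete, here is a minimal sketch of Distil-Whisper transcription through the Hugging Face `transformers` pipeline. It does not use the Parrots API. The checkpoint name `distil-whisper/distil-large-v2`, the 15-second chunk length, and the helper names `asr_pipeline_kwargs` and `transcribe` are assumptions chosen for illustration; the model Parrots actually loads may differ.

```python
from typing import Any, Dict

# Assumption: an example Distil-Whisper checkpoint on the Hugging Face Hub.
# The checkpoint that Parrots loads by default may be a different one.
DEFAULT_ASR_MODEL = "distil-whisper/distil-large-v2"


def asr_pipeline_kwargs(model_id: str = DEFAULT_ASR_MODEL,
                        chunk_seconds: int = 15) -> Dict[str, Any]:
    """Build keyword arguments for transformers.pipeline (loads nothing)."""
    return {
        "task": "automatic-speech-recognition",
        "model": model_id,
        # Long recordings are split into chunks of this many seconds.
        "chunk_length_s": chunk_seconds,
    }


def transcribe(audio_path: str, model_id: str = DEFAULT_ASR_MODEL) -> str:
    """Transcribe one audio file; the model is downloaded on first use."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    asr = pipeline(**asr_pipeline_kwargs(model_id))
    result = asr(audio_path)  # returns a dict with a "text" entry
    return result["text"]
```

Calling `transcribe("speech.wav")` with a real audio file would return the recognized text. Decoding a file path this way typically requires `ffmpeg` to be installed, and the first call downloads the model weights.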

Highlighted Details

  • Supports ASR for Chinese, English, and other languages.
  • TTS supports Chinese, English, Japanese, and more, with pre-trained speakers like "MaiMai" (singing female anchor) and "KusanagiNene" (loli).
  • Offers both a Python API and a command-line interface (CLI) for ASR and TTS tasks.
  • Models are available on HuggingFace, simplifying integration.

Maintenance & Community

  • Project maintained by shibing624.
  • Contact via email (xuming624@qq.com) or WeChat (xuming624) for the Python-NLP exchange group.
  • Encourages contributions with unit tests.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Commercial use is permitted, provided that product documentation credits Parrots and includes the license.

Limitations & Caveats

The project code is described as "rough," suggesting potential instability or areas needing refinement. While it supports multiple languages, the primary focus and default models appear to be Chinese.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 11 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Lianmin Zheng (author of SGLang).

fish-speech by fishaudio

Open-source TTS for multilingual speech synthesis

created 1 year ago
updated 1 week ago
23k stars

Top 0.3% on sourcepulse