parrots by shibing624

ASR/TTS toolkit for multilingual speech processing

Created 7 years ago

526 stars

Top 59.9% on SourcePulse

Project Summary

Parrots is an open-source toolkit that combines Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) in one package, designed for ease of use with a focus on Chinese and English. It targets developers and researchers who need a quick way to add speech processing to a project, and it ships pre-trained models for multilingual, multi-speaker voice synthesis.

How It Works

The ASR component is built on Distil-Whisper for speech-to-text conversion across multiple languages. The TTS engine is built on GPT-SoVITS, which provides high-quality voice synthesis with support for several languages and distinct speaker identities, including singing voices. Pairing the two gives a single "out-of-the-box" package for common speech AI applications.

Quick Start & Requirements

Install via pip: first install PyTorch with pip install torch (or through conda), then run pip install parrots.
Alternatively, clone the repository and run python setup.py install from its root directory.
Requires PyTorch.
Official Demo: https://www.mulanai.com/product/tts/
HuggingFace Demo: https://huggingface.co/spaces/shibing624/parrots
Example usage scripts are provided in the examples/ directory.
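
The install steps above can be verified before running anything heavier. The following is a minimal sketch and not part of Parrots itself: it uses only the Python standard library to report which of the two packages named above cannot be imported yet. The helper name missing_packages and the REQUIRED table are illustrative.

```python
from importlib.util import find_spec

# Packages named in the install steps, mapped to the pip command that installs them.
REQUIRED = {
    "torch": "pip install torch",
    "parrots": "pip install parrots",
}


def missing_packages(required=REQUIRED):
    """Return {package: install command} for every package that cannot be found."""
    return {name: cmd for name, cmd in required.items() if find_spec(name) is None}


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        for name, cmd in missing.items():
            print(f"{name} is not installed; try: {cmd}")
    else:
        print("torch and parrots are both importable.")
```

find_spec only locates a package without importing it, so the check stays fast even when PyTorch is installed.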

Highlighted Details

Supports ASR for Chinese, English, and other languages.
TTS supports Chinese, English, Japanese, and more, with pre-trained speakers like "MaiMai" (singing female anchor) and "KusanagiNene" (loli).
Offers both Python API and command-line interface (CLI) for ASR and TTS tasks.
Models are available on HuggingFace, simplifying integration.
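
As a rough illustration of the Python API mentioned above, the sketch below wraps one ASR call and one TTS call. Every Parrots name in it (SpeechRecognition, recognize_speech_from_file, TextToSpeech, predict, and the keyword arguments) is an assumption that this summary does not confirm, so check it against the scripts in the examples/ directory before relying on it. The wrapper functions transcribe and synthesize are hypothetical helpers, and the speaker model identifier is deliberately left for the caller to supply.

```python
def transcribe(wav_path: str) -> str:
    """Transcribe one audio file to text.

    ASSUMPTION: the SpeechRecognition class and its recognize_speech_from_file
    method are not confirmed by this summary; verify them against examples/.
    """
    from parrots import SpeechRecognition  # imported lazily; needs parrots installed

    model = SpeechRecognition()
    return model.recognize_speech_from_file(wav_path)


def synthesize(text: str, speaker_model_path: str, speaker_name: str,
               output_path: str = "output_audio.wav") -> str:
    """Synthesize speech for `text` and return the path of the written file.

    ASSUMPTION: the TextToSpeech class, its constructor arguments, and the
    predict method are not confirmed by this summary; verify against examples/.
    """
    from parrots import TextToSpeech  # imported lazily; needs parrots installed

    tts = TextToSpeech(
        speaker_model_path=speaker_model_path,  # a speaker model hosted on HuggingFace
        speaker_name=speaker_name,              # e.g. "MaiMai", one of the speakers listed above
        device="cpu",
        half=False,
    )
    tts.predict(text=text, text_language="auto", output_path=output_path)
    return output_path
```

The imports sit inside the functions so the module can be loaded and inspected on a machine where Parrots is not yet installed; a call then fails with a clear ImportError instead.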

Maintenance & Community

Project maintained by shibing624.
Contact the maintainer by email (xuming624@qq.com), or add WeChat ID xuming624 to join the Python-NLP exchange group.
Contributions are welcome and should include unit tests.

Licensing & Compatibility

Licensed under the Apache License 2.0.
Commercial use is permitted, provided the product documentation credits Parrots and includes the license.

Limitations & Caveats

The project code is described as "rough," suggesting potential instability or areas needing refinement. While it supports multiple languages, the primary focus and default models appear to be Chinese.

Explore Similar Projects

praises by ElmTran

speech-recognition-uk by egorsmkv

ASR-TTS-paper-daily by halsay

LLaSM by LinkSoul-AI

fast-voice-assistant by dsa

ASR-LLM-TTS by ABexit

IMS-Toucan by DigitalPhonetics

FastSpeech2 by ming024

voice-pro by abus-aikorea

sherpa-onnx by k2-fsa

PaddleSpeech by PaddlePaddle

GPT-SoVITS by RVC-Boss