Voice generation model for inference, training, and deployment
CosyVoice is a comprehensive, multilingual large voice generation model offering full-stack capabilities for inference, training, and deployment. It targets researchers and developers needing advanced text-to-speech (TTS) and voice conversion (VC) functionalities, enabling high-quality, low-latency, and expressive speech synthesis across multiple languages and dialects.
How It Works
CosyVoice 2.0 combines offline and streaming modeling to support ultra-low-latency bidirectional streaming, with first-packet synthesis latency as low as 150 ms. The model reports improved pronunciation accuracy, lower character error rates, and better prosody and sound quality, reflected in higher mean opinion scores (MOS). It supports cross-lingual and mix-lingual zero-shot voice cloning, so a voice can be replicated across languages and in code-switching scenarios.
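To make the first-packet latency figure concrete, here is a minimal sketch of how one might measure it for any streaming synthesizer that yields audio chunks. The function fake_streaming_tts is a hypothetical stand-in invented for illustration, not the CosyVoice API, and its timings say nothing about the real model; only the measurement pattern is the point.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_streaming_tts(text: str, chunk_chars: int = 8, delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming synthesizer (NOT the CosyVoice API).

    Yields one placeholder "audio" chunk per `chunk_chars` characters of text,
    sleeping `delay_s` before each chunk to imitate synthesis time.
    """
    for start in range(0, len(text), chunk_chars):
        time.sleep(delay_s)
        yield text[start:start + chunk_chars].encode("utf-8")


def measure_first_packet(chunks: Iterable[bytes]) -> Tuple[float, float, int]:
    """Return (first_packet_latency_s, total_time_s, n_chunks) for a chunk stream."""
    start = time.perf_counter()
    first_latency = None
    n_chunks = 0
    for _ in chunks:
        if first_latency is None:
            # Time from the start of consumption to the arrival of the first chunk.
            first_latency = time.perf_counter() - start
        n_chunks += 1
    total = time.perf_counter() - start
    if first_latency is None:
        raise ValueError("stream produced no chunks")
    return first_latency, total, n_chunks


if __name__ == "__main__":
    first, total, n = measure_first_packet(
        fake_streaming_tts("Hello from a streaming synthesizer.")
    )
    print(f"first packet after {first * 1000:.1f} ms, {n} chunks in {total * 1000:.1f} ms")
```

Because the stand-in is a lazy generator, no synthesis work happens until the stream is consumed, so the timer in measure_first_packet covers the full wait for the first chunk. To measure a real streaming model, pass its chunk iterator in place of fake_streaming_tts.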
Quick Start & Requirements
Clone the repository with submodules (git clone --recursive) and install dependencies via Conda (conda create -n cosyvoice python=3.10, conda install -c conda-forge pynini, pip install -r requirements.txt). Ensure sox and libsox-dev are installed on Ubuntu/CentOS.
Download pretrained models (e.g., iic/CosyVoice2-0.5B) using modelscope.snapshot_download or git clone.
Usage examples cover the CosyVoice2 and CosyVoice classes, demonstrating zero-shot, cross-lingual, fine-grained control, and instruct-based synthesis.
Demos are available at funaudiollm.github.io/cosyvoice2.
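The model download step could be sketched as follows. This is an illustration under stated assumptions, not the project's own script: the pretrained_models target directory and the helper names local_model_dir and download_model are hypothetical, and the local_dir keyword is assumed to be supported by the installed modelscope version.

```python
from pathlib import PurePosixPath


def local_model_dir(model_id: str, root: str = "pretrained_models") -> str:
    """Map a hub ID such as 'iic/CosyVoice2-0.5B' to '<root>/CosyVoice2-0.5B'.

    `root` is an assumed target directory, chosen here for illustration.
    """
    name = model_id.rstrip("/").split("/")[-1]
    if not name:
        raise ValueError(f"invalid model id: {model_id!r}")
    return str(PurePosixPath(root) / name)


def download_model(model_id: str = "iic/CosyVoice2-0.5B", root: str = "pretrained_models") -> str:
    """Download a model snapshot and return the directory it was placed in."""
    # Imported lazily so local_model_dir works without modelscope installed.
    from modelscope import snapshot_download

    target = local_model_dir(model_id, root)
    # Assumption: this modelscope version accepts the `local_dir` keyword.
    snapshot_download(model_id, local_dir=target)
    return target
```

Calling download_model() needs network access and the modelscope package; local_model_dir can be used on its own to check where files would land before downloading anything.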
Highlighted Details
Maintenance & Community
The project acknowledges borrowing code from FunASR, FunCodec, Matcha-TTS, AcademiCodec, and WeNet. Discussion and communication are primarily through GitHub Issues.
Licensing & Compatibility
The README does not explicitly state a license. Verify the licensing terms before commercial use or closed-source linking.
Limitations & Caveats
The absence of licensing details in the README is a significant adoption risk. Some example audio is sourced from the internet, and the README includes a disclaimer about potential content infringement.