Open-source TTS for multilingual speech synthesis
Fish Speech is an open-source text-to-speech (TTS) system that offers zero-shot and few-shot voice cloning, multilingual support, and synthesis that does not depend on phonemes. It targets researchers and developers who need high-quality, adaptable TTS for rapid prototyping and deployment of voice generation applications.
How It Works
The system uses a VITS2-based architecture, extended with LLM integration for multilingual and cross-lingual synthesis. Because it does not depend on phonemes, it generalizes across language scripts and achieves low character and word error rates (CER/WER). Timbre and emotional control are also built in, allowing for nuanced speech generation.
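Error rates for TTS are commonly measured by transcribing the synthesized audio with a speech recognizer and comparing that transcript with the input text. The sketch below shows only the standard definitions of WER and CER (edit distance divided by reference length). It is not Fish Speech's evaluation code, and the function names are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of units."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference unit
                curr[j - 1] + 1,     # insertion of a hypothesis unit
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the number of reference units."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate: units are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: units are individual characters."""
    return error_rate(list(reference), list(hypothesis))
```

CER is the more natural metric for scripts without whitespace word boundaries, such as Chinese or Japanese, which is why multilingual systems are often evaluated on both.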
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The Fish Agent demo is an early alpha with unoptimized inference speed and known bugs. The CC-BY-NC-SA-4.0 license restricts commercial use of the model weights.