fish-speech by fishaudio

Open-source TTS for multilingual speech synthesis

Created 2 years ago

31,220 stars

Top 1.4% on SourcePulse

5 Experts Love This Project

Travis Fischer (Founder of Agentic)
Jiaming Song (Chief Scientist at Luma AI)
Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems")
Chaoyu Yang (Founder of Bento)

and 1 more!

Project Summary

Fish Speech is an open-source text-to-speech (TTS) system that offers zero-shot and few-shot voice cloning, multilingual support, and a design that does not depend on phonemes. It targets researchers and developers who need high-quality, adaptable TTS for rapid prototyping and deployment of voice generation applications.

How It Works

The system uses a VITS2-based architecture, enhanced with LLM integration for multilingual and cross-lingual synthesis. Its key advantage is that it does not depend on phonemes, which lets it generalize across language scripts while keeping Character/Word Error Rates (CER/WER) low. Timbre and emotional control are also integrated, allowing for nuanced speech generation.
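
The error rates mentioned above are conventionally defined as the edit distance between a reference text and a transcript, divided by the length of the reference. The following is a minimal sketch of that standard definition, not the project's evaluation code. How the reported figures were measured (for example, which speech recognizer transcribed the synthesized audio) is not stated here, and the function names `edit_distance`, `cer`, and `wer` are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / reference words (whitespace split)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

With these definitions, one substituted word in a six-word reference gives a WER of 1/6, and one wrong character in a five-character reference gives a CER of 0.2. Whitespace splitting is only a reasonable word boundary for some languages, which is one reason CER is commonly reported for scripts such as Chinese or Japanese.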

Quick Start & Requirements

Install/Run: Follow the project documentation for local inference via the Gradio WebUI or the PyQt6 GUI.
Prerequisites: NVIDIA GPU (RTX 4060 recommended for a real-time factor of about 1:5; RTX 4090 for about 1:15).
Resources: Requires downloading the model weights.
Links: Online Demo, Fish Agent Quick Start, Documents
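
The real-time factors quoted in the prerequisites can be turned into a rough time estimate. The sketch below assumes that a ratio such as 1:5 means about one second of compute per five seconds of generated audio. That reading, and the helper name `synthesis_seconds`, are assumptions for illustration; actual throughput will vary with text length, settings, and hardware.

```python
def synthesis_seconds(audio_seconds: float,
                      compute: float = 1.0,
                      audio: float = 5.0) -> float:
    """Estimate compute time for a clip from a compute:audio ratio (assumed reading)."""
    if audio_seconds < 0 or compute <= 0 or audio <= 0:
        raise ValueError("durations and ratio terms must be positive")
    return audio_seconds * compute / audio


# One minute of speech under the two quoted ratios.
print(synthesis_seconds(60.0, 1.0, 5.0))   # 12.0 (1:5, quoted for the RTX 4060)
print(synthesis_seconds(60.0, 1.0, 15.0))  # 4.0  (1:15, quoted for the RTX 4090)
```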

Highlighted Details

Zero-shot and few-shot voice cloning from a 10 to 30 second vocal sample.
Supports 8 languages (English, Japanese, Korean, Chinese, French, German, Arabic, Spanish) with cross-lingual capabilities.
Achieves ~2% CER/WER on 5-minute English texts.
Offers both Gradio WebUI and PyQt6 GUI interfaces.
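
Since voice cloning is described as working from 10 to 30 second samples, it can be convenient to check a reference clip's length before using it. The following is a small sketch using only Python's standard `wave` module, so it handles uncompressed WAV files only. The helper names and the idea of a pre-check are illustrative assumptions, not part of Fish Speech's own tooling.

```python
import wave

# Sample-length range quoted in the highlights above.
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Duration of an uncompressed WAV file: frame count / sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_suitable_reference(path: str,
                          min_seconds: float = MIN_SECONDS,
                          max_seconds: float = MAX_SECONDS) -> bool:
    """True if the clip length falls inside the quoted sample range."""
    return min_seconds <= wav_duration_seconds(path) <= max_seconds
```

Calling `is_suitable_reference` on a local WAV path returns `False` for clips shorter than 10 seconds or longer than 30 seconds. Other audio formats would need a different reader.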

Maintenance & Community

Actively developed with recent updates (V1.5 Demo Video).
Community support via issues and pull requests.
Social media: X (Twitter).

Licensing & Compatibility

Codebase: Apache License.
Model Weights: CC-BY-NC-SA-4.0 License.
Restrictions: Non-commercial use for model weights.

Limitations & Caveats

The Fish Agent demo is an early alpha with unoptimized inference speed and known bugs. The CC-BY-NC-SA-4.0 license restricts commercial use of the model weights.

Health Check

Last Commit: 1 month ago
Responsiveness: 1 day

Star History

515 stars in the last 30 days