fish-speech  by fishaudio

Open-source TTS for multilingual speech synthesis

Created 1 year ago
22,953 stars

Top 1.8% on SourcePulse

Project Summary

Fish Speech is an open-source Text-to-Speech (TTS) system offering zero-shot and few-shot voice cloning, multilingual support, and an approach that does not depend on phonemes. It targets researchers and developers who need high-quality, adaptable TTS, and it supports rapid prototyping and deployment of voice generation applications.

How It Works

The system leverages a VITS2-based architecture, enhanced with LLM integration for advanced multilingual and cross-lingual synthesis. Its key advantage is the absence of phoneme dependency, allowing it to generalize across various language scripts and achieve high accuracy with low Character/Word Error Rates. Timbre and emotional control are also integrated, allowing for nuanced speech generation.

Quick Start & Requirements

  • Install/Run: Follow the documentation for local inference via the Gradio WebUI or the PyQt6 GUI.
  • Prerequisites: NVIDIA GPU (an RTX 4060 is recommended for a 1:5 real-time factor; an RTX 4090 reaches 1:15).
  • Resources: Requires model weights download.
  • Links: Online Demo, Fish Agent Quick Start, Documents
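
The real-time factors quoted above can be turned into rough latency estimates. The sketch below assumes that a 1:N factor means one second of compute yields N seconds of audio; the summary does not state this explicitly, so treat the interpretation, along with the hypothetical helper name `synthesis_seconds`, as an assumption.

```python
def synthesis_seconds(audio_seconds: float, speedup: float) -> float:
    """Estimate wall-clock time needed to synthesize `audio_seconds` of speech.

    Assumes a 1:`speedup` real-time factor means 1 s of compute produces
    `speedup` s of audio (an interpretation, not confirmed by the summary).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# Figures taken from the prerequisites line above.
quoted_factors = {"RTX 4060": 5.0, "RTX 4090": 15.0}

for gpu, speedup in quoted_factors.items():
    estimate = synthesis_seconds(60.0, speedup)
    print(f"{gpu}: about {estimate:.1f} s to generate 60 s of audio")
```

Under that assumption, one minute of speech would take roughly 12 seconds on an RTX 4060 and 4 seconds on an RTX 4090; actual throughput will vary with text length, model version, and settings.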

Highlighted Details

  • Zero-shot & Few-shot voice cloning with 10-30 second vocal samples.
  • Supports 8 languages (English, Japanese, Korean, Chinese, French, German, Arabic, Spanish) with cross-lingual capabilities.
  • Achieves ~2% CER/WER on 5-minute English texts.
  • Offers both Gradio WebUI and PyQt6 GUI interfaces.
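
For readers unfamiliar with the CER/WER figure above, here is a minimal sketch of how these error rates are commonly computed: the synthesized audio is transcribed by a speech recognizer, and the transcript is compared with the input text using edit distance. The summary does not describe the project's own evaluation procedure, so the functions below (`edit_distance`, `wer`, `cer`) are illustrative and not part of Fish Speech.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted word out of six gives a WER of about 16.7%.
print(f"WER: {wer('the cat sat on the mat', 'the cat sat on a mat'):.3f}")
```

A rate of about 2% therefore corresponds to roughly two word (or character) errors per hundred in the recognized transcript.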

Maintenance & Community

  • Actively developed with recent updates (V1.5 Demo Video).
  • Community support via issues and pull requests.
  • X (Twitter)

Licensing & Compatibility

  • Codebase: Apache License.
  • Model Weights: CC-BY-NC-SA-4.0 License.
  • Restrictions: Non-commercial use for model weights.

Limitations & Caveats

The Fish Agent demo is an early alpha with unoptimized inference speed and known bugs. The CC-BY-NC-SA-4.0 license restricts commercial use of the model weights.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 6
  • Issues (30d): 14

Star History

292 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pietro Schirano (Founder of MagicPath), and 2 more.

metavoice-src by metavoiceio

0.1%
4k
TTS model for human-like, expressive speech
Created 1 year ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

MeloTTS by myshell-ai

0.5%
7k
Multilingual text-to-speech library
Created 1 year ago
Updated 8 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Junyang Lin (Core Maintainer at Alibaba Qwen), and 6 more.

OpenVoice by myshell-ai

0.2%
34k
Audio foundation model for versatile, instant voice cloning
Created 1 year ago
Updated 5 months ago
Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss

0.3%
51k
Few-shot voice cloning and TTS web UI
Created 1 year ago
Updated 1 week ago