MeloTTS  by myshell-ai

Multilingual text-to-speech library

created 1 year ago
6,470 stars

Top 8.1% on sourcepulse

Project Summary

MeloTTS is a high-quality multilingual text-to-speech library for researchers and developers. It covers multiple languages and accents and produces natural-sounding speech for a range of applications.

How It Works

MeloTTS uses a VITS-based architecture that builds on VITS, VITS2, and Bert-VITS2. This design produces high-quality speech with inference fast enough to run in real time on a CPU. It supports multiple languages, including several English accents, and its Chinese speaker handles mixed Chinese and English input.

Quick Start & Requirements

  • Install: pip install melo-tts
  • Prerequisites: Python. Pretrained models are available on Hugging Face.
  • Resources: Fast enough for CPU real-time inference.
  • Docs: https://github.com/myshell-ai/MeloTTS
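
Basic synthesis through the Python API can be sketched as follows. This is a minimal sketch, not the project's reference code: it assumes the `melo` package is installed and exposes `melo.api.TTS`, `model.hps.data.spk2id`, and `tts_to_file` as described in the project README. The accent table and the `accent_to_speaker_key` and `synthesize` helpers are illustrative names, not part of the library, and the speaker keys should be checked against `spk2id` at runtime.

```python
# Assumed speaker keys for the English accents; verify them against
# model.hps.data.spk2id, since they come from the README, not from this page.
ENGLISH_ACCENT_KEYS = {
    "american": "EN-US",
    "british": "EN-BR",
    "indian": "EN_INDIA",
    "australian": "EN-AU",
}


def accent_to_speaker_key(accent: str) -> str:
    """Map a human-readable accent name to the assumed speaker key."""
    key = ENGLISH_ACCENT_KEYS.get(accent.strip().lower())
    if key is None:
        raise ValueError(f"unknown accent: {accent!r}")
    return key


def synthesize(text: str, accent: str = "american",
               output_path: str = "out.wav", speed: float = 1.0,
               device: str = "cpu") -> str:
    """Write `text` as speech to `output_path` (hypothetical helper)."""
    # Imported lazily so the accent helper works without the model installed.
    from melo.api import TTS  # assumption: API as shown in the README

    model = TTS(language="EN", device=device)
    speaker_ids = model.hps.data.spk2id
    speaker_id = speaker_ids[accent_to_speaker_key(accent)]
    model.tts_to_file(text, speaker_id, output_path, speed=speed)
    return output_path
```

With the package and model weights available, `synthesize("Hello from MeloTTS.", accent="british")` would write `out.wav` in the current directory. The first call downloads the model, so it needs network access.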

Highlighted Details

  • Supports English, Spanish, French, Chinese, Japanese, and Korean, with multiple English accents (American, British, Indian, Australian).
  • Chinese speaker supports mixed English and Chinese input.
  • Optimized for fast CPU real-time inference.
  • Based on VITS, VITS2, and Bert-VITS2 architectures.
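
One way to check the real-time claim on your own hardware is the real-time factor (RTF): synthesis time divided by the duration of the generated audio, where a value below 1.0 means faster than real time. The sketch below uses only the Python standard library. The names `wav_duration_seconds`, `real_time_factor`, and `timed_rtf` are hypothetical helpers, and the synthesis call is passed in as an argument, so nothing about the MeloTTS API is assumed here.

```python
import time
import wave
from typing import Callable


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def timed_rtf(synthesize_fn: Callable[[str], object], output_path: str) -> float:
    """Time `synthesize_fn(output_path)` and return its real-time factor.

    `synthesize_fn` is any callable that writes a WAV file to the given path.
    """
    start = time.perf_counter()
    synthesize_fn(output_path)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, wav_duration_seconds(output_path))
```

Measuring a few sentences of different lengths, and discarding the first run (which typically includes model loading), gives a more representative number than a single call.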

Maintenance & Community

The project is led by MyShell.ai and includes contributors from Tsinghua University and MIT. Further contributions are welcomed.

Licensing & Compatibility

Released under the MIT License, permitting free commercial and non-commercial use.

Limitations & Caveats

The README states no hardware requirements beyond the claim of real-time CPU inference, and it does not discuss language-specific limitations or model sizes.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 5
  • Star history: 501 stars in the last 90 days
