EmotiVoice by netease-youdao

Multi-voice TTS engine with prompt control for expressive speech synthesis

created 1 year ago
8,113 stars

Top 6.5% on sourcepulse

Project Summary

EmotiVoice is an open-source, multi-voice, prompt-controlled Text-to-Speech (TTS) engine that supports English and Chinese. Its focus is emotional synthesis: it offers over 2,000 distinct voices and a range of controllable emotions, which makes it suitable for researchers and developers building voice applications.

How It Works

EmotiVoice uses a prompt-based control mechanism: a text prompt supplied alongside the input steers speech characteristics such as emotion and style. This approach, inspired by PromptTTS, controls the synthesized output independently of speaker identity, so the same voice can be rendered with different emotions.
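To make the separation between speaker identity and style prompt concrete, the sketch below models a synthesis job as three independent fields and expands one sentence into several jobs that differ only in the prompt. The `SynthesisJob` class, the `expand_jobs` helper, and the speaker ID `"8051"` are hypothetical names chosen for illustration; they are not EmotiVoice's own API or input format.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SynthesisJob:
    """One utterance to synthesize (hypothetical structure, for illustration only)."""
    speaker_id: str  # selects the voice
    prompt: str      # free-text emotion/style description
    text: str        # the sentence to be spoken


def expand_jobs(speaker_ids, prompts, text):
    """Build one job per (speaker, prompt) pair for the same sentence."""
    return [SynthesisJob(s, p, text) for s, p in product(speaker_ids, prompts)]


# Same voice and sentence, three different emotion prompts.
jobs = expand_jobs(["8051"], ["happy", "sad", "angry"], "The weather is nice today.")
for job in jobs:
    print(f"{job.speaker_id} | {job.prompt} | {job.text}")
```

Because the prompt is a separate field, changing the emotion does not require choosing a different voice; only the prompt string varies between jobs.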

Quick Start & Requirements

  • Docker: docker run -dp 127.0.0.1:8501:8501 syq163/emoti-voice:latest (requires NVIDIA GPU and NVIDIA Container Toolkit).
  • Full Install: Requires Python 3.8, PyTorch, torchaudio, and several other Python libraries. Pretrained model files must be downloaded separately.
  • Demo: A web UI is available at http://localhost:8501 after starting the Docker image, or after running streamlit run demo_page.py on a full install.
  • API: An OpenAI-compatible TTS API is available at http://localhost:8000/.
  • Docs: Wiki Page
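Since the API is described as OpenAI-compatible, a client can plausibly reuse the request shape of OpenAI's text-to-speech endpoint. The sketch below uses only the Python standard library. The `/v1/audio/speech` path, the model name `"emoti-voice"`, and the voice ID `"8051"` are assumptions, not values confirmed by this page; check the project's Wiki before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"   # API address from the Quick Start list
SPEECH_PATH = "/v1/audio/speech"     # assumption: same path as OpenAI's TTS endpoint


def build_speech_request(text, voice, base_url=BASE_URL):
    """Build (but do not send) a POST request in the OpenAI TTS request shape."""
    payload = {
        "model": "emoti-voice",      # assumption: placeholder model name
        "input": text,               # text to synthesize
        "voice": voice,              # assumption: a speaker ID accepted by the server
        "response_format": "mp3",
    }
    return urllib.request.Request(
        base_url + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, voice, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    req = build_speech_request(text, voice)
    with urllib.request.urlopen(req, timeout=60) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


# Usage (requires a running server):
# synthesize_to_file("Hello from EmotiVoice.", "8051", "hello.mp3")
```

Keeping request construction separate from the network call makes it possible to inspect the URL and payload without a running server.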

Highlighted Details

  • Supports over 2000 voices and controllable emotional synthesis (happy, sad, angry, etc.).
  • Includes voice cloning capabilities with provided recipes.
  • Offers an OpenAI-compatible TTS API for easier integration.
  • Provides both a web UI and a scripting interface for batch generation.

Maintenance & Community

Recent updates include speed tuning, a Mac app, and an HTTP API; note that the last recorded commit was 11 months ago. Community engagement is encouraged via GitHub issues and a WeChat group.

Licensing & Compatibility

EmotiVoice is licensed under the Apache-2.0 License, permitting commercial use and integration with closed-source projects. The interactive page has a separate User Agreement.

Limitations & Caveats

Style control currently centers on pitch, speed, and energy; gender is not an explicit control factor. Support for additional languages, such as Japanese and Korean, is under development.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 211 stars in the last 90 days

Explore Similar Projects

Starred by Dan Guido (Cofounder of Trail of Bits), Joe Walnes (Head of Experimental Projects at Stripe), and 1 more.

chatterbox by resemble-ai

1.6%
10k
Open-source TTS model
created 3 months ago
updated 1 day ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Lianmin Zheng (Author of SGLang).

fish-speech by fishaudio

0.3%
23k
Open-source TTS for multilingual speech synthesis
created 1 year ago
updated 1 week ago