Multi-voice TTS engine with prompt control for expressive speech synthesis
EmotiVoice is an open-source, multi-voice, and prompt-controlled Text-to-Speech (TTS) engine supporting English and Chinese. It excels at emotional synthesis, offering over 2000 distinct voices and a wide range of controllable emotions, making it suitable for researchers and developers building advanced voice applications.
How It Works
EmotiVoice leverages a prompt-based control mechanism, allowing users to influence speech characteristics like emotion and style through text prompts. This approach, inspired by PromptTTS, enables fine-grained control over the synthesized output beyond traditional speaker identity, offering a more expressive and nuanced TTS experience.
Quick Start & Requirements
docker run -dp 127.0.0.1:8501:8501 syq163/emoti-voice:latest
(requires an NVIDIA GPU and the NVIDIA Container Toolkit). Open http://localhost:8501 after running the Docker image, or run streamlit run demo_page.py after a local installation. The HTTP API is served at http://localhost:8000/.
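As a rough illustration of driving the HTTP API, the sketch below builds a request for an OpenAI-style speech endpoint; the endpoint path (/v1/audio/speech) and the field names (input, voice, prompt) are assumptions for illustration, not confirmed by this summary, so check the project's API docs before use.

```python
import json
from urllib import request


def build_speech_request(text: str, voice: str = "8051", prompt: str = "happy",
                         base_url: str = "http://localhost:8000") -> request.Request:
    """Build (but do not send) a POST request for a hypothetical
    OpenAI-style /v1/audio/speech endpoint. The free-text `prompt`
    field is where emotion/style control would be expressed."""
    payload = {
        "input": text,    # text to synthesize
        "voice": voice,   # speaker ID (EmotiVoice ships 2000+ voices)
        "prompt": prompt, # emotion/style prompt, e.g. "excited"
    }
    return request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_speech_request("Hello world", prompt="excited and energetic")
print(req.full_url)  # http://localhost:8000/v1/audio/speech
# Sending it (requires a running server):
#   audio_bytes = request.urlopen(req).read()
```

Varying only the `prompt` string while keeping `voice` fixed is the essence of prompt-controlled synthesis: the same speaker can sound happy, sad, or angry.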
Highlighted Details
Maintenance & Community
The project is actively developed, with recent updates including speed tuning, a Mac app, and an HTTP API. Community engagement is encouraged via GitHub issues and a WeChat group.
Licensing & Compatibility
EmotiVoice is licensed under the Apache-2.0 License, permitting commercial use and integration with closed-source projects. The interactive page has a separate User Agreement.
Limitations & Caveats
Currently, EmotiVoice's style control covers pitch, speed, and energy; gender is not an explicit control. Support for additional languages such as Japanese and Korean is under development.