Multi-voice TTS engine with prompt control for expressive speech synthesis
EmotiVoice is an open-source, multi-voice, and prompt-controlled Text-to-Speech (TTS) engine supporting English and Chinese. It excels at emotional synthesis, offering over 2000 distinct voices and a wide range of controllable emotions, making it suitable for researchers and developers building advanced voice applications.
How It Works
EmotiVoice leverages a prompt-based control mechanism, allowing users to influence speech characteristics like emotion and style through text prompts. This approach, inspired by PromptTTS, enables fine-grained control over the synthesized output beyond traditional speaker identity, offering a more expressive and nuanced TTS experience.
Quick Start & Requirements
docker run -dp 127.0.0.1:8501:8501 syq163/emoti-voice:latest
(requires an NVIDIA GPU and the NVIDIA Container Toolkit). Open http://localhost:8501 after running the Docker image, or run streamlit run demo_page.py after a local installation. The HTTP API is served at http://localhost:8000/.
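As a rough illustration of driving the HTTP API, the sketch below builds a request for an OpenAI-style speech endpoint; the endpoint path (/v1/audio/speech) and the field names (input, voice, prompt) are assumptions for illustration, not confirmed by this summary, so check the project's API docs before use.

```python
import json
from urllib import request


def build_speech_request(text: str, voice: str = "8051", prompt: str = "happy",
                         base_url: str = "http://localhost:8000") -> request.Request:
    """Build (but do not send) a POST request for a hypothetical
    OpenAI-style /v1/audio/speech endpoint. The free-text `prompt`
    field is where emotion/style control would be expressed."""
    payload = {
        "input": text,    # text to synthesize
        "voice": voice,   # speaker ID (EmotiVoice ships 2000+ voices)
        "prompt": prompt, # emotion/style prompt, e.g. "excited"
    }
    return request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_speech_request("Hello world", prompt="excited and energetic")
print(req.full_url)  # http://localhost:8000/v1/audio/speech
# Sending it (requires a running server):
#   audio_bytes = request.urlopen(req).read()
```

Varying only the `prompt` string while keeping `voice` fixed is the essence of prompt-controlled synthesis: the same speaker can sound happy, sad, or angry.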
Highlighted Details
Maintenance & Community
The project is actively developed, with recent updates including speed tuning, a Mac app, and an HTTP API. Community engagement is encouraged via GitHub issues and a WeChat group.
Licensing & Compatibility
EmotiVoice is licensed under the Apache-2.0 License, permitting commercial use and integration with closed-source projects. The interactive page has a separate User Agreement.
Limitations & Caveats
Currently, EmotiVoice's style control covers pitch, speed, and energy; gender is not an explicit control. Support for additional languages such as Japanese and Korean is under development.