OpenVoice by myshell-ai

Audio foundation model for versatile, instant voice cloning

Created 2 years ago

36,905 stars

Top 1.1% on SourcePulse

8 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Junyang Lin

Core Maintainer at Alibaba Qwen

Luis Capelo

Cofounder of Lightning AI

Chaoyu Yang

Founder of Bento

and 4 more!

Project Summary

OpenVoice is an audio foundation model for versatile instant voice cloning, enabling users to replicate voice tone color and control speech style with high accuracy. It targets researchers and developers needing advanced speech synthesis capabilities, offering zero-shot cross-lingual cloning and flexible style control for applications like personalized content creation and accessibility tools.

How It Works

OpenVoice uses a multi-stage approach built on the VITS and VITS2 architectures. It clones a reference speaker's "tone color" (timbre) while allowing granular control over style parameters such as emotion, accent, rhythm, and intonation. Its zero-shot cross-lingual capability means it can clone a voice and generate speech in languages that were not seen during training, a significant advantage for multilingual applications.
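One way to picture the separation of style control from tone color cloning is the toy sketch below. It assumes the stages split into a base speaker that sets text and style, followed by a converter that swaps only the timbre. Every name here (Style, Utterance, base_speaker_tts, extract_tone_color, convert_tone_color) is hypothetical and is not part of the OpenVoice API; the real model uses neural networks rather than the simple averaging shown.

```python
from dataclasses import dataclass, replace
from typing import Sequence, Tuple

# Hypothetical illustration only: none of these names come from OpenVoice.


@dataclass(frozen=True)
class Style:
    emotion: str = "neutral"
    accent: str = "default"
    speed: float = 1.0  # crude stand-in for rhythm control


@dataclass(frozen=True)
class Utterance:
    text: str
    style: Style
    tone_color: Tuple[float, ...]  # stand-in for a timbre embedding


# Assumed fixed timbre of the base speaker in this toy model.
BASE_SPEAKER_TONE = (0.0, 0.0, 0.0)


def base_speaker_tts(text: str, style: Style) -> Utterance:
    """Stage 1: fix the content and style, always in the base speaker's timbre."""
    return Utterance(text=text, style=style, tone_color=BASE_SPEAKER_TONE)


def extract_tone_color(reference: Sequence[Tuple[float, ...]]) -> Tuple[float, ...]:
    """Toy extractor: average the reference feature vectors per dimension."""
    count = len(reference)
    dims = len(reference[0])
    return tuple(sum(vec[d] for vec in reference) / count for d in range(dims))


def convert_tone_color(utterance: Utterance, target: Tuple[float, ...]) -> Utterance:
    """Stage 2: replace only the timbre; text and style pass through unchanged."""
    return replace(utterance, tone_color=target)


style = Style(emotion="cheerful", speed=1.2)
draft = base_speaker_tts("Hello there", style)
target_tone = extract_tone_color([(0.25, 0.5, 0.75), (0.75, 0.5, 0.25)])
cloned = convert_tone_color(draft, target_tone)
```

Because the second stage touches only the tone color field, the style chosen in the first stage survives the conversion, which is the property the paragraph above describes.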

Quick Start & Requirements

Install: pip install openvoice
Prerequisites: Python 3.8+, PyTorch 1.13+. GPU with CUDA 11.8+ recommended for optimal performance.
Resources: Requires downloading pre-trained models (size not specified).
Docs: Usage
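The prerequisites listed above can be checked with a short script before installing anything. The sketch below is an assumption-laden helper, not something shipped with OpenVoice: the function names are hypothetical, the thresholds are copied from the list above, and PyTorch is treated as optional so the script still runs when it is missing.

```python
import importlib.util
import re
import sys

# Thresholds taken from the prerequisites listed above.
MIN_PYTHON = (3, 8)
MIN_TORCH = (1, 13)


def parse_major_minor(version: str):
    """Extract (major, minor) from a version string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment() -> dict:
    """Report whether the interpreter and PyTorch meet the stated minimums."""
    report = {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_ok": False,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            report["torch_ok"] = parse_major_minor(torch.__version__) >= MIN_TORCH
            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken install is treated the same as an unusable one.
            report["torch_ok"] = False
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

The script only confirms that CUDA is visible to PyTorch; it does not verify the CUDA toolkit version, so the 11.8+ recommendation still needs a manual check.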

Highlighted Details

Accurate tone color cloning across multiple languages and accents.
Flexible control over voice style parameters (emotion, accent, rhythm, intonation).
Zero-shot cross-lingual voice cloning capability.
V2 offers improved audio quality and native support for English, Spanish, French, Chinese, Japanese, and Korean.

Maintenance & Community

Developed by researchers from MIT and MyShell. The model has seen significant real-world usage on myshell.ai, which indicates sustained user engagement.

Licensing & Compatibility

OpenVoice V1 and V2 are released under the MIT License, permitting free commercial and research use.

Limitations & Caveats

While V2 offers native multilingual support for specific languages, its cross-lingual quality for other language pairs may vary. The README does not detail hardware requirements beyond the GPU recommendation, nor does it state any limits on the length or complexity of cloned audio.

Health Check

Last Commit

1 year ago

Responsiveness

1 week

Star History

299 stars in the last 30 days