Orpheus-TTS by canopyai

Open-source TTS for human-sounding speech, built on Llama-3b

Created 11 months ago

5,964 stars

Top 8.5% on SourcePulse

3 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Michael Han

Cofounder of Unsloth

Jiaming Song

Chief Scientist at Luma AI

Project Summary

Orpheus TTS is an open-source text-to-speech system that uses a Llama-3b backbone to produce human-sounding speech. It targets developers and researchers who need natural intonation, zero-shot voice cloning, and controllable emotional expression, and it is presented as an open alternative to existing closed-source solutions.

How It Works

Orpheus TTS uses a Llama-3b LLM for speech synthesis, demonstrating the emergent capabilities of LLMs in this domain. This approach yields natural intonation, emotion, and rhythm. The model supports zero-shot voice cloning without prior fine-tuning, and speech characteristics can be controlled with simple inline tags. The system is designed for low-latency streaming: latency is around 200 ms, reducible to around 100 ms with input streaming.
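
The latency figures above refer to streaming output, so the number that matters is the time until the first audio chunk arrives, not the total generation time. The following is a minimal sketch of how that could be measured for any chunk iterator. The `fake_speech_stream` generator is a hypothetical stand-in, not the Orpheus API; a real streaming call would be passed in its place.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_speech_stream(n_chunks: int = 5, delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS call: yields PCM chunks after a delay."""
    for _ in range(n_chunks):
        time.sleep(delay_s)
        yield b"\x00\x00" * 240  # 240 silent 16-bit samples per chunk


def measure_stream(chunks: Iterable[bytes]) -> Tuple[float, float, int]:
    """Return (seconds to first chunk, total seconds, total bytes) for a chunk stream."""
    start = time.perf_counter()
    first = None
    total_bytes = 0
    for chunk in chunks:
        if first is None:
            # Time to first chunk is what a listener perceives as latency.
            first = time.perf_counter() - start
        total_bytes += len(chunk)
    total = time.perf_counter() - start
    if first is None:
        raise ValueError("stream produced no chunks")
    return first, total, total_bytes


ttfc, total, nbytes = measure_stream(fake_speech_stream())
print(f"first chunk: {ttfc * 1000:.1f} ms, total: {total * 1000:.1f} ms, bytes: {nbytes}")
```

Measured values depend on hardware, model size, and prompt length, so they will not necessarily match the figures quoted above.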

Quick Start & Requirements

Install via pip: pip install orpheus-speech (if you run into issues, consider pinning vLLM with pip install vllm==0.7.3).
Requires Python and a GPU for optimal performance.
Official Colab notebooks are available for inference and fine-tuning.
See official documentation for detailed setup and voice information.
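
Once a model is installed, streamed output typically has to be written to an audio file chunk by chunk. The sketch below shows one way to do that with Python's standard `wave` module. It assumes the audio arrives as 16-bit mono PCM at 24 kHz; that format is an assumption and should be checked against the official documentation. The `tone_chunks` generator is a hypothetical placeholder that produces a sine tone in place of real TTS output.

```python
import math
import struct
import wave
from typing import Iterable, Iterator

SAMPLE_RATE = 24_000  # assumption: 24 kHz mono 16-bit PCM; verify for the model in use


def tone_chunks(seconds: float = 0.5, freq: float = 440.0,
                chunk_samples: int = 2400) -> Iterator[bytes]:
    """Hypothetical stand-in for streamed TTS output: a sine tone as 16-bit PCM chunks."""
    total = int(seconds * SAMPLE_RATE)
    for start in range(0, total, chunk_samples):
        n = min(chunk_samples, total - start)
        samples = (
            int(0.3 * 32767 * math.sin(2 * math.pi * freq * (start + i) / SAMPLE_RATE))
            for i in range(n)
        )
        yield struct.pack(f"<{n}h", *samples)  # little-endian signed 16-bit


def write_wav(path: str, chunks: Iterable[bytes]) -> float:
    """Write PCM chunks to a WAV file as they arrive; return the duration in seconds."""
    frames = 0
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            wf.writeframes(chunk)
            frames += len(chunk) // 2
    return frames / SAMPLE_RATE


duration = write_wav("output.wav", tone_chunks())
print(f"wrote {duration:.2f} s of audio")
```

Writing each chunk as it arrives keeps memory use flat for long passages, at the cost of holding the file open for the duration of generation.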

Highlighted Details

Achieves human-like speech with natural intonation, emotion, and rhythm.
Supports zero-shot voice cloning without prior fine-tuning.
Offers guided emotion and intonation control using tags like <laugh>, <sigh>.
Provides ~200 ms streaming latency, reducible to ~100 ms with input streaming.
Includes multilingual models and data processing scripts for custom fine-tuning.
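
Because emotion control relies on inline tags, a typo in a tag could plausibly be read aloud or ignored rather than rendered as an effect. The sketch below checks a prompt for unrecognized tags before it is sent to the model. The helper names are hypothetical, and the known-tag set contains only the two tags named above; the project may support others.

```python
import re

# Assumption: only the two tags named in this summary; the project may support more.
KNOWN_TAGS = {"laugh", "sigh"}
TAG_PATTERN = re.compile(r"<([a-z]+)>")


def find_unknown_tags(text, known=KNOWN_TAGS):
    """Return tag names in `text` that are not in the known set, in order of appearance."""
    return [name for name in TAG_PATTERN.findall(text) if name not in known]


def strip_tags(text):
    """Remove all <tag> markers and collapse the whitespace they leave behind."""
    return re.sub(r"\s+", " ", TAG_PATTERN.sub("", text)).strip()


prompt = "I can't believe it worked <laugh> after all that <sigh> effort."
print(find_unknown_tags(prompt))          # → []
print(find_unknown_tags("Oh no <wail>"))  # → ['wail']
print(strip_tags(prompt))                 # plain text, e.g. for a transcript
```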

Maintenance & Community

The project is actively developed by Canopy AI. Feedback and questions are welcomed in the discussion forum.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is in a research preview stage, with ongoing work to fix glitches in real-time streaming and improve voice cloning implementation. Not all model sizes (1b, 400m, 150m) have been released yet.

Health Check

Last Commit

2 months ago

Responsiveness

1 day

Star History

68 stars in the last 30 days