Orpheus-TTS  by canopyai

Open-source TTS for human-sounding speech, built on Llama-3b

created 4 months ago
5,304 stars

Top 9.6% on sourcepulse

Project Summary

Orpheus TTS is an open-source text-to-speech system that uses a Llama-3b backbone to produce human-sounding speech. It targets developers and researchers who need natural intonation, zero-shot voice cloning, and controllable emotional expression, and it is positioned as an open alternative to existing closed-source solutions.

How It Works

Orpheus TTS uses a Llama-3b LLM for speech synthesis, demonstrating emergent capabilities in this domain. This approach produces natural intonation, emotion, and rhythm. It supports zero-shot voice cloning without prior fine-tuning and lets users control speech characteristics with simple tags. The system is designed for low-latency streaming, with a latency of around 200 ms that can be reduced to about 100 ms with input streaming.
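The streaming latency described above is usually measured as the time until the first audio chunk arrives. The following is a minimal sketch of that measurement, not the project's own code: `fake_speech_stream` is a hypothetical stand-in for a streaming TTS generator, and the sample rate, sample width, and channel count are assumptions that should be checked against the official documentation.

```python
import io
import time
import wave
from typing import Iterable, Iterator, Tuple

SAMPLE_RATE_HZ = 24_000  # assumed output rate, not confirmed by this summary
SAMPLE_WIDTH_BYTES = 2   # assumed 16-bit PCM
CHANNELS = 1             # assumed mono


def fake_speech_stream(n_chunks: int = 5, chunk_ms: int = 40) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS generator: yields silent PCM."""
    frames_per_chunk = SAMPLE_RATE_HZ * chunk_ms // 1000
    for _ in range(n_chunks):
        yield b"\x00" * (frames_per_chunk * SAMPLE_WIDTH_BYTES * CHANNELS)


def write_stream_to_wav(chunks: Iterable[bytes], target) -> Tuple[float, float]:
    """Write chunks to a WAV target (path or file object) as they arrive.

    Returns (seconds until the first chunk arrived, seconds of audio written).
    """
    start = time.monotonic()
    first_chunk_latency = float("nan")  # stays NaN if the stream is empty
    total_frames = 0
    with wave.open(target, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH_BYTES)
        wf.setframerate(SAMPLE_RATE_HZ)
        for i, chunk in enumerate(chunks):
            if i == 0:
                first_chunk_latency = time.monotonic() - start
            wf.writeframes(chunk)
            total_frames += len(chunk) // (SAMPLE_WIDTH_BYTES * CHANNELS)
    return first_chunk_latency, total_frames / SAMPLE_RATE_HZ


if __name__ == "__main__":
    buffer = io.BytesIO()
    latency_s, audio_s = write_stream_to_wav(fake_speech_stream(), buffer)
    print(f"time to first chunk: {latency_s * 1000:.2f} ms, audio: {audio_s:.2f} s")
```

Swapping the stand-in generator for a real streaming source would give a rough time-to-first-chunk figure to compare against the latencies quoted above; network and model-loading time would need to be accounted for separately.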

Quick Start & Requirements

  • Install via pip: pip install orpheus-speech (consider pip install vllm==0.7.3 if issues arise).
  • Requires Python and a GPU for optimal performance.
  • Official Colab notebooks are available for inference and fine-tuning.
  • See official documentation for detailed setup and voice information.
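When the install misbehaves, it can help to confirm which versions are actually present before pinning anything. The sketch below uses only the Python standard library; the distribution names are taken from the pip commands above, and the helper names `installed_version` and `report` are illustrative, not part of the project.

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(dist_names: Sequence[str] = ("orpheus-speech", "vllm")) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    print("python:", sys.version.split()[0])
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

If vllm is installed at a version other than 0.7.3 and errors occur, the pin suggested above is the first thing to try.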

Highlighted Details

  • Achieves human-like speech with natural intonation, emotion, and rhythm.
  • Supports zero-shot voice cloning without prior fine-tuning.
  • Offers guided emotion and intonation control using tags such as <laugh> and <sigh>.
  • Provides ~200ms streaming latency, reducible to ~100ms with input streaming.
  • Includes multilingual models and data processing scripts for custom fine-tuning.
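Because the emotion tags are embedded directly in the input text, a typo in a tag is easy to miss. The following is a small illustrative sketch for checking tags in a prompt before synthesis. It knows only the two tags named above (the full supported set is an assumption to verify in the official documentation), and the helper names are hypothetical rather than part of the Orpheus API.

```python
import re
from typing import AbstractSet, List

# Only the two tags mentioned in this summary; the real set is likely larger.
KNOWN_TAGS = frozenset({"laugh", "sigh"})

# Assumes tags are lowercase words wrapped in angle brackets, e.g. <laugh>.
_TAG_PATTERN = re.compile(r"<([a-z_]+)>")


def find_tags(prompt: str) -> List[str]:
    """Return tag names in the order they appear in the prompt."""
    return _TAG_PATTERN.findall(prompt)


def unknown_tags(prompt: str, known: AbstractSet[str] = KNOWN_TAGS) -> List[str]:
    """Return tags that are not in the known set (possible typos)."""
    return [tag for tag in find_tags(prompt) if tag not in known]


def strip_tags(prompt: str) -> str:
    """Remove tags and collapse whitespace, e.g. to produce a plain transcript."""
    return " ".join(_TAG_PATTERN.sub(" ", prompt).split())


if __name__ == "__main__":
    text = "I can't believe it <laugh> worked. <sigh> Finally. <yodel>"
    print(find_tags(text))     # ['laugh', 'sigh', 'yodel']
    print(unknown_tags(text))  # ['yodel']
    print(strip_tags(text))    # I can't believe it worked. Finally.
```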

Maintenance & Community

The project is actively developed by Canopy AI. Feedback and questions are welcomed in the discussion forum.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is in a research preview stage, with ongoing work to fix glitches in real-time streaming and to improve the voice cloning implementation. Not all model sizes (1b, 400m, 150m) have been released yet.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 14

Star History
725 stars in the last 90 days

Explore Similar Projects

Starred by Dan Guido (Cofounder of Trail of Bits), Joe Walnes (Head of Experimental Projects at Stripe), and 1 more.

chatterbox by resemble-ai

1.6%
10k
Open-source TTS model
created 3 months ago
updated 1 day ago
Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems).

GPT-SoVITS by RVC-Boss

0.6%
49k
Few-shot voice cloning and TTS web UI
created 1 year ago
updated 2 weeks ago