Orpheus-TTS by canopyai

Open-source TTS for human-sounding speech, built on Llama-3b

Created 6 months ago
5,561 stars

Top 9.2% on SourcePulse

Project Summary

Orpheus TTS is an open-source text-to-speech system that uses a Llama-3b backbone to produce human-sounding speech. It targets developers and researchers who want natural intonation, zero-shot voice cloning, and controllable emotional expression, and the project presents it as an advance over existing closed-source solutions.

How It Works

Orpheus TTS uses a Llama-3b LLM for speech synthesis, and the project describes the model's speech capabilities as emergent. This approach yields natural intonation, emotion, and rhythm. It supports zero-shot voice cloning without prior fine-tuning and lets users control speech characteristics with simple tags. The system is designed for low-latency streaming, with a latency of around 200 ms that can be reduced to about 100 ms with input streaming.
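To make the streaming latency figure concrete, the following is a minimal sketch of how time-to-first-chunk could be measured for any generator that yields raw audio bytes. It does not call Orpheus itself: fake_speech_stream is a hypothetical stand-in for a real streaming generator, and the 24 kHz, 16-bit, mono PCM format is an assumption made for this example rather than something stated above.

```python
import io
import time
import wave
from typing import Iterable, Iterator, Tuple

SAMPLE_RATE = 24_000  # assumption: 24 kHz output
SAMPLE_WIDTH = 2      # assumption: 16-bit PCM (2 bytes per sample)
CHANNELS = 1          # assumption: mono


def fake_speech_stream(n_chunks: int = 5, samples_per_chunk: int = 2400) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS generator; yields silent PCM chunks."""
    for _ in range(n_chunks):
        yield b"\x00" * (samples_per_chunk * SAMPLE_WIDTH * CHANNELS)


def collect_stream(chunks: Iterable[bytes]) -> Tuple[bytes, float, float]:
    """Write streamed PCM chunks into an in-memory WAV file.

    Returns (wav_bytes, seconds_until_first_chunk, audio_duration_seconds).
    """
    buf = io.BytesIO()
    start = time.monotonic()
    first_chunk_latency = None
    total_frames = 0
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            if first_chunk_latency is None:
                # Time from the request to the first audible data.
                first_chunk_latency = time.monotonic() - start
            total_frames += len(chunk) // (SAMPLE_WIDTH * CHANNELS)
            wf.writeframes(chunk)
    if first_chunk_latency is None:
        raise ValueError("stream produced no audio chunks")
    return buf.getvalue(), first_chunk_latency, total_frames / SAMPLE_RATE


if __name__ == "__main__":
    wav_bytes, latency, duration = collect_stream(fake_speech_stream())
    print(f"first chunk after {latency * 1000:.1f} ms, {duration:.2f} s of audio")
```

With a real model, the generator passed to collect_stream would be whatever the library returns for a streaming request; the measured first-chunk time is the number to compare against the roughly 200 ms figure.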

Quick Start & Requirements

  • Install via pip: pip install orpheus-speech. If issues arise, consider also running pip install vllm==0.7.3.
  • Requires Python; a GPU is recommended for optimal performance.
  • Official Colab notebooks are available for inference and fine-tuning.
  • See official documentation for detailed setup and voice information.
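As a quick sanity check after installation, the sketch below reports which versions of the two packages named above are present. It relies only on the standard library's importlib.metadata; the helper names installed_version and describe are made up for this example, and treating 0.7.3 as the "suggested" vllm version simply mirrors the fallback pin mentioned in the install step.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def describe(dist_name: str, suggested: Optional[str] = None) -> str:
    """Build a one-line, human-readable status for a distribution."""
    found = installed_version(dist_name)
    if found is None:
        return f"{dist_name}: not installed"
    if suggested is not None and found != suggested:
        return f"{dist_name}: {found} (suggested fallback pin: {suggested})"
    return f"{dist_name}: {found}"


if __name__ == "__main__":
    print(describe("orpheus-speech"))
    # 0.7.3 is the fallback pin from the install step, not a hard requirement.
    print(describe("vllm", suggested="0.7.3"))
```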

Highlighted Details

  • Achieves human-like speech with natural intonation, emotion, and rhythm.
  • Supports zero-shot voice cloning without prior fine-tuning.
  • Offers guided emotion and intonation control using tags like <laugh>, <sigh>.
  • Provides ~200ms streaming latency, reducible to ~100ms.
  • Includes multilingual models and data processing scripts for custom fine-tuning.
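Because emotion control is driven by inline tags, a small pre-flight check on prompt text can catch typos before synthesis. The sketch below is illustrative only: find_unknown_tags is a hypothetical helper, and KNOWN_TAGS contains just the two tags named above, so the caller would need to supply the full set from the official documentation.

```python
import re
from typing import FrozenSet, List

# Only the two tags mentioned in this summary; the complete list is an
# assumption left to the caller and should come from the official docs.
KNOWN_TAGS: FrozenSet[str] = frozenset({"laugh", "sigh"})

# Matches simple lowercase tags such as <laugh>.
TAG_PATTERN = re.compile(r"<([a-z]+)>")


def find_unknown_tags(text: str, known: FrozenSet[str] = KNOWN_TAGS) -> List[str]:
    """Return tag names found in the text that are not in the known set."""
    return [name for name in TAG_PATTERN.findall(text) if name not in known]


if __name__ == "__main__":
    prompt = "Well <sigh> I did not expect that <laugh> at all."
    print(find_unknown_tags(prompt))
```

An empty result means every tag in the prompt is in the known set; any names returned are candidates for a typo or an unsupported tag.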

Maintenance & Community

The project is actively developed by Canopy AI. Feedback and questions are welcomed in the discussion forum.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is in a research preview stage, with ongoing work to fix glitches in real-time streaming and to improve the voice cloning implementation. Not all model sizes (1b, 400m, 150m) have been released yet.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 9
  • Star history: 155 stars in the last 30 days
