Open-source TTS for human-sounding speech, built on Llama-3b
Orpheus TTS is a state-of-the-art open-source text-to-speech system that uses a Llama-3b backbone to produce human-sounding speech. It targets developers and researchers who need natural intonation, zero-shot voice cloning, and controllable emotional expression, and the project presents it as a significant advance over existing closed-source solutions.
How It Works
Orpheus TTS uses the Llama-3b LLM for speech synthesis, demonstrating the emergent capabilities of LLMs in this domain. This approach yields natural intonation, emotion, and rhythm. It supports zero-shot voice cloning without prior fine-tuning, and simple tags control speech characteristics such as emotion. The system is designed for low-latency streaming, with a latency of around 200 ms that can be reduced to about 100 ms with input streaming.
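To make the streaming claim concrete, here is a minimal sketch of how a client might consume audio chunk by chunk and measure the time to the first chunk, assuming that is what the quoted latency refers to. It does not use the Orpheus API: `fake_tts_stream` is a hypothetical stand-in engine that emits a sine tone, and the 24 kHz mono 16-bit PCM format, `SAMPLES_PER_CHAR`, and `CHUNK_SAMPLES` are assumptions chosen only for illustration.

```python
import io
import math
import struct
import time
import wave
from typing import BinaryIO, Iterator, Optional

# Assumed output format for this sketch: 24 kHz, mono, 16-bit PCM.
SAMPLE_RATE = 24_000
SAMPLES_PER_CHAR = 480             # hypothetical: 20 ms of audio per input character
CHUNK_SAMPLES = SAMPLE_RATE // 10  # hypothetical: 100 ms of audio per chunk


def fake_tts_stream(text: str) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS engine (not the Orpheus API).

    Yields little-endian 16-bit PCM chunks containing a 220 Hz tone whose
    total length is proportional to the length of the input text.
    """
    total = len(text) * SAMPLES_PER_CHAR
    for start in range(0, total, CHUNK_SAMPLES):
        n = min(CHUNK_SAMPLES, total - start)
        samples = [
            int(8000 * math.sin(2 * math.pi * 220 * (start + i) / SAMPLE_RATE))
            for i in range(n)
        ]
        yield struct.pack(f"<{n}h", *samples)


def stream_to_wav(chunks: Iterator[bytes], out: BinaryIO) -> Optional[float]:
    """Write each chunk as soon as it arrives; return time to first chunk in ms.

    Returns None if the stream produced no audio.
    """
    start = time.perf_counter()
    first_chunk_ms: Optional[float] = None
    with wave.open(out, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            if first_chunk_ms is None:
                # Latency until the first playable audio, not the full utterance.
                first_chunk_ms = (time.perf_counter() - start) * 1000
            wf.writeframes(chunk)
    return first_chunk_ms


if __name__ == "__main__":
    buf = io.BytesIO()
    ttfc = stream_to_wav(fake_tts_stream("Hello from a streaming sketch."), buf)
    print(f"first chunk after {ttfc:.2f} ms; {len(buf.getvalue())} bytes of WAV")
```

The point of the design is that playback or writing can begin as soon as the first chunk exists, so perceived latency depends on the first chunk rather than on the length of the whole utterance.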
Quick Start & Requirements
pip install orpheus-speech
(Consider pip install vllm==0.7.3 if issues arise.)
Highlighted Details
Emotive tags such as <laugh> and <sigh> control expressiveness in the generated speech.
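Assuming, as the angle-bracket syntax suggests, that these tags are written inline in the prompt text, a client could check a prompt for misspelled tags before sending it. The sketch below is a hypothetical pre-flight check: `find_tags`, `unknown_tags`, and `KNOWN_TAGS` are illustrative names, not part of orpheus-speech, and the tag set holds only the two tags named here, so it is not a complete list of what the model supports.

```python
import re
from typing import List

# Only <laugh> and <sigh> are named in this summary; this set is an assumption
# for the sketch, not the project's definitive list of supported tags.
KNOWN_TAGS = {"laugh", "sigh"}

# Matches inline tags written as <name>, e.g. "<laugh>".
TAG_PATTERN = re.compile(r"<([a-z]+)>")


def find_tags(prompt: str) -> List[str]:
    """Return tag names in the order they appear in the prompt text."""
    return TAG_PATTERN.findall(prompt)


def unknown_tags(prompt: str) -> List[str]:
    """Return tags that are not in KNOWN_TAGS, such as typos like <laguh>."""
    return [tag for tag in find_tags(prompt) if tag not in KNOWN_TAGS]


if __name__ == "__main__":
    prompt = "I can't believe that worked <laugh>. Back to debugging, I guess <sigh>."
    print(find_tags(prompt))           # ['laugh', 'sigh']
    print(unknown_tags("Ha <laguh>"))  # ['laguh']
```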
Maintenance & Community
The project is actively developed by Canopy AI. Feedback and questions are welcomed in the discussion forum.
Licensing & Compatibility
The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is in a research preview stage, with ongoing work to fix glitches in real-time streaming and to improve the voice cloning implementation. Not all model sizes (1b, 400m, 150m) have been released yet.