Streaming TTS for natural, long-form dialogue
FireRedTTS-2 is a long-form streaming Text-to-Speech (TTS) system for multi-speaker dialogue generation, delivering stable, natural speech with context-aware prosody. It targets researchers and developers working on conversational AI, podcasting, and chatbots, offering high-quality, low-latency synthesis with zero-shot voice cloning and multilingual support.
How It Works
The system uses a novel dual-transformer architecture with a 12.5Hz streaming speech tokenizer, enabling flexible, sentence-by-sentence generation and ultra-low first-packet latency (as low as 140ms on an L20 GPU). It supports long conversational speech (3 minutes with 4 speakers, scalable to longer durations and more speakers) and covers 7 languages, including zero-shot voice cloning for cross-lingual and code-switching scenarios. Random timbre generation is also supported.
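The sentence-by-sentence streaming flow can be pictured with a minimal sketch like the one below. The import path, class, and method names (FireRedTTS2, stream, the speaker tags, and the sample rate) are hypothetical placeholders, not the project's actual API; the point is only that each dialogue turn is synthesized and emitted incrementally, so playback can begin before later turns are generated.

```python
# Minimal sketch of sentence-by-sentence streaming dialogue synthesis.
# All names below are hypothetical and illustrative only -- consult the
# FireRedTTS-2 repository for the real interface.
import numpy as np
import soundfile as sf

from fireredtts2 import FireRedTTS2  # hypothetical import path

tts = FireRedTTS2(device="cuda")     # assumed: loads the pretrained checkpoints

dialogue = [
    ("[S1]", "Hi, thanks for joining the show today."),
    ("[S2]", "Happy to be here. Let's dive right in."),
]

audio = []
for speaker, sentence in dialogue:
    # Each sentence is synthesized independently, so audio for the first turn
    # can start playing while later turns are still queued (low first-packet latency).
    for chunk in tts.stream(text=sentence, speaker=speaker):  # hypothetical streaming generator
        audio.append(chunk)          # in a live app, push each chunk to the audio device

sf.write("dialogue.wav", np.concatenate(audio), samplerate=24000)  # sample rate assumed
```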
Quick Start & Requirements
Installation requires cloning the repo, creating a Python 3.11 Conda environment, and installing PyTorch built for CUDA 12.6; remaining dependencies are managed via requirements.txt. Pre-trained models are available via Git LFS from Hugging Face. A Gradio web UI demo is provided for easy generation (python gradio_demo.py).
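A setup sequence along these lines is typical; the repository URL, environment name, PyTorch package set, and model location below are placeholders, so check the project README for the exact commands.

```bash
# Illustrative only -- substitute the actual repository and model URLs from the README.
git clone <FireRedTTS-2 repository URL>
cd FireRedTTS-2

conda create -n fireredtts2 python=3.11 -y
conda activate fireredtts2

# PyTorch build matching CUDA 12.6, then the project's pinned dependencies
pip install torch --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt

# Pre-trained checkpoints are distributed via Git LFS on Hugging Face
git lfs install
git clone <Hugging Face model repository URL> pretrained_models

# Launch the Gradio web UI demo
python gradio_demo.py
```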
Highlighted Details
Maintenance & Community
The roadmap includes releasing an enhanced multilingual model, fine-tuning code, and an end-to-end text-to-blog pipeline in October 2025. No specific community channels or contributor details are listed.
Licensing & Compatibility
No explicit license is stated. A disclaimer restricts zero-shot voice cloning strictly to academic research purposes and prohibits illegal activities, implying non-commercial, research-focused usage.
Limitations & Caveats
Zero-shot voice cloning is limited to academic research and must not be used for illegal activities. Installation requires specific PyTorch versions tied to CUDA 12.6. The project acknowledges adapting code from other models, which may carry usage terms from those sources.