ysharma3501/LuxTTS: TTS voice cloning for rapid, high-quality generation
Top 43.4% on SourcePulse
LuxTTS is a lightweight, ZipVoice-based text-to-speech (TTS) model designed for high-quality, rapid voice cloning and realistic speech generation. It targets engineers and researchers seeking efficient TTS solutions, with generation speeds above 150x real time, state-of-the-art voice cloning, and clear 48kHz audio output, all within a 1GB VRAM footprint.
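To make the speed figure concrete, the arithmetic behind a real-time multiple can be sketched as follows. The helper names are hypothetical and not part of LuxTTS, and the 150x value is the project's stated figure rather than a measurement made here.

```python
def synthesis_time(audio_seconds: float, speedup: float) -> float:
    """Wall-clock seconds needed to generate audio at a given real-time multiple."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def realtime_multiple(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of compute."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


# At 150x real time, one minute of speech takes 60 / 150 seconds to generate.
print(f"{synthesis_time(60.0, 150.0):.2f} s")  # 0.40 s
```

The second helper goes the other way: given a measured wall-clock time for a clip of known length, it reports the achieved multiple, which is how a claim like this could be checked on local hardware.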
How It Works
LuxTTS builds upon the ZipVoice architecture, optimizing it through distillation into a 4-step process and incorporating an improved sampling technique. A key differentiator is its custom 48kHz vocoder, which produces clearer speech compared to the typical 24kHz output of many TTS models. This approach yields state-of-the-art voice cloning capabilities comparable to significantly larger models, while achieving remarkable inference speeds.
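One way to see why a 48kHz vocoder can sound clearer than a 24kHz one: by the Nyquist sampling theorem, audio sampled at a given rate can only represent frequencies up to half that rate. The sketch below just works through that arithmetic; the function names are illustrative and do not come from LuxTTS.

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2


def samples_for(duration_s: float, sample_rate_hz: int) -> int:
    """Number of audio samples needed for a clip of the given duration."""
    return round(duration_s * sample_rate_hz)


for rate in (24_000, 48_000):
    print(f"{rate} Hz: up to {nyquist_hz(rate):.0f} Hz, "
          f"{samples_for(1.0, rate)} samples per second of audio")
```

Doubling the sample rate raises the representable band from 12 kHz to 24 kHz, at the cost of twice as many samples per second of output.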
Quick Start & Requirements
Install: clone the repository (git clone https://github.com/ysharma3501/LuxTTS.git), navigate into the directory (cd LuxTTS), and install dependencies (pip install -r requirements.txt).
Requirements: a target device (device='cuda') and audio processing libraries (e.g., librosa, soundfile from requirements.txt). CUDA is recommended for GPU acceleration.
Highlighted Details
Maintenance & Community
The project has released its core model and code, along with a Hugging Face Spaces demo. Roadmap items include support for MPS (Apple Silicon) and the release of code for float16 inference, which is expected to further increase speed. Direct contact is available via email: yatharthsharma350@gmail.com.
Licensing & Compatibility
The project is licensed under the Apache-2.0 license. This permissive license generally allows commercial use and integration into closed-source projects, provided its attribution and notice requirements are met.
Limitations & Caveats
MPS device support and optimized float16 inference are not yet implemented. The initial audio encoding step (encode_prompt) incurs a delay of roughly 10 seconds on first use because of librosa initialization. The ref_duration parameter can be adjusted to trade inference speed against potential artifacts.
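A common way to handle a one-time initialization delay like this is to trigger it deliberately at startup, so that later calls are fast. The sketch below illustrates the pattern only: fake_encode_prompt is a hypothetical stand-in that simulates a slow first call with a short sleep, and it does not call LuxTTS or reflect the real encode_prompt signature.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


_ready = False  # tracks whether the simulated one-time setup has run


def fake_encode_prompt(path: str) -> str:
    """Hypothetical stand-in: slow on the first call only, fast afterwards."""
    global _ready
    if not _ready:
        time.sleep(0.2)  # simulated one-time initialization cost
        _ready = True
    return f"encoded:{path}"


# Warm up once at startup so later calls skip the one-time cost.
_, first_call = timed(fake_encode_prompt, "reference.wav")
_, later_call = timed(fake_encode_prompt, "reference.wav")
print(f"first call: {first_call:.3f} s, later call: {later_call:.6f} s")
```

Assuming the real delay is likewise paid once per process, running the encoding step on a short reference clip during application startup would move the wait out of the first user-facing request.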