tronghieuit: Vietnamese TTS and voice cloning
Top 92.9% on SourcePulse
Valtec Vietnamese TTS offers an ultra-lightweight, CPU-only solution for text-to-speech and zero-shot voice cloning, targeting engineers and power users. It enables high-quality voice synthesis and cloning without GPU requirements, achieving speeds several times faster than real-time.
How It Works
The system uses a lightweight, CPU-native architecture with a small parameter count (~74.8M for the zero-shot model). Speaker and style encoders capture voice identity and prosody from short audio samples (3-10s), and the system supports prosody transfer. It reaches a Real-Time Factor (RTF) below 0.3 on standard processors, meaning synthesis takes less than 30% of the duration of the audio it produces. Removing the GPU requirement lowers the hardware barrier to advanced TTS.
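To make the RTF figure concrete, here is a minimal sketch of how a real-time factor is conventionally computed: synthesis time divided by the duration of the generated audio. The function names and the sample timings are illustrative assumptions, not part of the Valtec API or its published benchmarks.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Return RTF: time spent synthesizing divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup(rtf):
    """Return how many times faster than real time a given RTF is."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Hypothetical measurement: 2.4 s of CPU time to produce 10 s of speech.
rtf = real_time_factor(2.4, 10.0)
print(f"RTF = {rtf:.2f}, about {speedup(rtf):.1f}x faster than real time")
```

An RTF below 0.3 therefore corresponds to synthesis running more than roughly 3.3 times faster than playback.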
Quick Start & Requirements
Installation is via pip: pip install git+https://github.com/tronghieuit/valtec-tts.git. Requirements include Python 3.8+ and PyTorch 2.0+. CUDA is optional for multi-speaker TTS acceleration; core functionality runs on CPU. Linux is recommended for optimal phonemization. Models auto-download from Hugging Face.
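Before installing, it can help to confirm that the environment meets the stated minimums. The following is a rough sketch of such a check; the helper names (`major_minor`, `check_environment`) are hypothetical and not part of the package, and only the Python 3.8+ and PyTorch 2.0+ thresholds come from the requirements above.

```python
import re
import sys

MIN_PYTHON = (3, 8)  # stated requirement: Python 3.8+
MIN_TORCH = (2, 0)   # stated requirement: PyTorch 2.0+


def major_minor(version):
    """Extract (major, minor) from a version string such as '2.1.0+cpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment(python_version=None, torch_version=None):
    """Return a list of problems; an empty list means the minimums are met.

    Versions can be passed explicitly (useful for testing); otherwise they
    are read from the running interpreter and installed package metadata.
    """
    problems = []
    if python_version is None:
        python_version = "{}.{}".format(*sys.version_info[:2])
    if major_minor(python_version) < MIN_PYTHON:
        problems.append(f"Python {python_version} is older than 3.8")

    if torch_version is None:
        try:
            from importlib import metadata  # standard library in Python 3.8+
            torch_version = metadata.version("torch")
        except Exception:
            problems.append("PyTorch is not installed")
            return problems
    if major_minor(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} is older than 2.0")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```

The check deliberately does not test for CUDA, since the core functionality is described as CPU-only.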
Highlighted Details
Maintenance & Community
Developed by the ValtecAI Team. Specific community channels, notable contributors, sponsorships, or partnerships are not detailed in the README.
Licensing & Compatibility
Licensed under CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International). This license strictly prohibits commercial use without explicit written permission, limiting applications to non-commercial projects and research.
Limitations & Caveats
The model is optimized for Vietnamese; other languages may have lower quality. Cloned voice fidelity depends on reference audio quality, and highly distinctive voices might not be perfectly replicated. It is not yet optimized for real-time streaming.