pnnbao97
Vietnamese TTS with instant voice cloning
Top 62.0% on SourcePulse
Summary
VieNeu-TTS is an on-device Vietnamese Text-to-Speech (TTS) system with instant voice cloning. It targets developers and users who need high-quality, real-time, offline speech synthesis with speaker consistency and code-switching support. Audio is generated locally on CPU or GPU, and the project describes the output as production-ready.
How It Works
The model pairs a Qwen 0.5B LLM backbone with the NeuCodec audio codec and processes inputs within a 2048-token context. The architecture is optimized for real-time generation of 24 kHz waveforms. The project ships several formats: PyTorch for maximum quality, GGUF (Q4/Q8) variants optimized for fast CPU inference and streaming, and ONNX for codec compatibility.
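To give a feel for what the 2048-token context and 24 kHz output mean in practice, here is a rough back-of-the-envelope sketch. It assumes the context is shared between the prompt (text plus reference voice) and the generated codec tokens, and it takes the codec token rate as a parameter. The 50 tokens per second and the 548-token prompt used in the example are illustrative assumptions, not figures stated by the project, and `audio_budget` is a hypothetical helper, not part of VieNeu-TTS.

```python
def audio_budget(context_tokens, prompt_tokens, codec_tokens_per_second,
                 sample_rate_hz=24_000):
    """Estimate how much audio fits in the context left after the prompt.

    Returns (seconds_of_audio, number_of_waveform_samples).
    All inputs are assumptions supplied by the caller.
    """
    if prompt_tokens > context_tokens:
        raise ValueError("prompt does not fit in the context window")
    free_tokens = context_tokens - prompt_tokens
    seconds = free_tokens / codec_tokens_per_second
    samples = int(seconds * sample_rate_hz)
    return seconds, samples


# Illustrative numbers only: 2048-token context (from the summary),
# a 548-token prompt and 50 codec tokens/s (both assumed).
seconds, samples = audio_budget(2048, 548, 50)
print(f"{seconds:.1f} s of audio, {samples} samples at 24 kHz")
```

With these assumed values, 1500 tokens remain for audio. The real limit depends on the actual codec rate and on how long the reference voice prompt is.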
Quick Start & Requirements
Clone the repository and install dependencies using uv sync. Key requirements include Python 3.12+ and eSpeak NG for phonemization. Optional GPU acceleration requires llama-cpp-python with CUDA support, and LMDeploy optimizations can be installed for enhanced GPU performance. Detailed setup guides, including a video tutorial, are available.
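Before running uv sync, it can help to confirm that the stated prerequisites are present. The following is a minimal preflight sketch, not part of the project: it checks the running interpreter against Python 3.12 and looks for the `espeak-ng` and `uv` executables on PATH. It assumes eSpeak NG is installed under its usual command name `espeak-ng`; on some systems (notably Windows) it may be installed without being on PATH, in which case this check reports a false negative. `missing_requirements` is a hypothetical helper name.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)  # minimum version listed in the requirements


def missing_requirements(version_info=None, which=shutil.which):
    """Return a list of human-readable problems; empty means all checks passed."""
    if version_info is None:
        version_info = sys.version_info
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.12+ is required")
    # Assumption: eSpeak NG is exposed as the `espeak-ng` command.
    if which("espeak-ng") is None:
        problems.append("eSpeak NG (espeak-ng) not found on PATH")
    if which("uv") is None:
        problems.append("uv not found on PATH")
    return problems


problems = missing_requirements()
if problems:
    for problem in problems:
        print("missing:", problem)
else:
    print("all prerequisites found")
```

The `version_info` and `which` parameters exist only so the check can be exercised with fake values. Called without arguments, it inspects the current environment.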
https://github.com/pnnbao97/VieNeu-TTS
https://huggingface.co/pnnbao-ump/VieNeu-TTS
Highlighted Details
Maintenance & Community
Developed by Phạm Nguyễn Ngọc Bảo, building upon NeuTTS Air. Community support is available via GitHub Issues and Hugging Face.
Licensing & Compatibility
Released under the permissive Apache License 2.0, suitable for commercial use and integration into closed-source projects.
Limitations & Caveats
A Dockerized setup and fine-tuning code are planned but not yet released. GGUF models currently support only four specific reference voices. Streaming inference on GPU is also a future development goal.