Aratako/T5Gemma-TTS: LLM-powered multilingual TTS with voice cloning
Top 99.8% on SourcePulse
Aratako/T5Gemma-TTS is a multilingual Text-to-Speech (TTS) system built on the T5Gemma encoder-decoder LLM architecture. It addresses the need for flexible, high-quality speech synthesis, with advanced features such as zero-shot voice cloning and explicit duration control. Aimed at researchers and power users, it provides a powerful tool for generating diverse, controllable audio output.
How It Works
The system leverages the T5Gemma LLM architecture for text-to-speech conversion, supporting English, Chinese, and Japanese. Key functionalities include zero-shot voice cloning from reference audio and explicit duration control for generated speech length. This approach combines LLM linguistic understanding with robust audio generation techniques.
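Explicit duration control ultimately means budgeting the decoder's output length. As a minimal sketch (assuming a 24 kHz sample rate and a hop length of 256 samples per frame, which are common vocoder settings, not values confirmed by the README), a target duration in seconds maps to a frame count like this:

```python
def frames_for_duration(duration_s: float,
                        sample_rate: int = 24000,
                        hop_length: int = 256) -> int:
    """Convert a target speech duration into a decoder frame budget.

    sample_rate and hop_length are illustrative assumptions; the actual
    T5Gemma-TTS configuration may use different values.
    """
    return round(duration_s * sample_rate / hop_length)

# A 2-second utterance corresponds to 48000 samples, i.e. 187.5 frames,
# which rounds to 188 decoder steps.
print(frames_for_duration(2.0))
```

Because the mapping from frames back to seconds is only exact up to one hop, duration control of this kind is inherently approximate, which matches the caveat noted under Limitations below.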
Quick Start & Requirements
Installation requires cloning the repo and running `pip install -r requirements.txt`. GPU support requires PyTorch with CUDA (e.g., `pip install "torch<=2.8.0" torchaudio --index-url https://download.pytorch.org/whl/cu128`); Apple Silicon (MPS) is also supported. Quantized models (8-bit/4-bit encoder) reduce VRAM requirements (the 4-bit model needs ~7.6 GB). Inference is available via command-line scripts or a Gradio web UI.
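The steps above can be consolidated into a quick-start script. The repository URL is assumed from the project name and the standard GitHub layout; the pip commands are taken from the README:

```shell
# Clone the repository (URL assumed from the project name; check the README)
git clone https://github.com/Aratako/T5Gemma-TTS.git
cd T5Gemma-TTS

# Install the project's Python dependencies
pip install -r requirements.txt

# Optional: CUDA-enabled PyTorch for GPU inference (command from the README)
pip install "torch<=2.8.0" torchaudio --index-url https://download.pytorch.org/whl/cu128
```

On Apple Silicon, the default PyTorch wheels include MPS support, so the CUDA-specific step can be skipped.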
Maintenance & Community
The provided README gives no details on maintainers, community channels, or development activity.
Licensing & Compatibility
Code is MIT licensed, generally permitting commercial use. Model weight licensing is detailed separately in the model card.
Limitations & Caveats
Inference is not real-time due to autoregressive generation. Duration control is approximate, and speech pacing/naturalness may vary. Audio quality depends on training data, potentially underperforming for underrepresented voices. Native Windows inference can be unstable; WSL2 or Docker is recommended.