Irodori-TTS (Aratako): Flow Matching TTS for expressive speech synthesis
Top 75.6% on SourcePulse
Irodori-TTS is a Flow Matching-based Text-to-Speech (TTS) model for high-fidelity speech synthesis with style control. It targets researchers and developers who want to integrate TTS into their own work, and it offers zero-shot voice cloning and emoji-driven style customization.
How It Works
At its core, Irodori-TTS uses a Rectified Flow Diffusion Transformer (RF-DiT) that operates on continuous latents produced by a DACVAE codec. In this design, inspired by Echo-TTS, the model generates latents and the codec reconstructs the waveform from them. Generation can be conditioned on text, on reference audio for speaker identity, and, in the "VoiceDesign" variant, on a caption encoder for fine-grained style control.
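The rectified-flow sampling idea can be illustrated with a toy sketch. This is not the project's code: euler_sample and make_toy_velocity are hypothetical names, and the toy velocity function stands in for the trained RF-DiT, which would instead predict the velocity from the text, reference audio, and caption conditioning. Decoding the final latent to a waveform with the codec is not shown.

```python
import random
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def euler_sample(velocity: VelocityFn, x0: Vector, num_steps: int = 8) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (data latent)."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity(x, t)
        # One explicit Euler step along the predicted velocity.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> VelocityFn:
    """Hypothetical stand-in for the trained network.

    Returns the exact velocity of the straight-line path that ends at
    `target`; a real model would predict this from its conditioning inputs.
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


rng = random.Random(0)
target_latent = [0.5, -1.0, 2.0]                      # toy "clean" latent
noise = [rng.gauss(0.0, 1.0) for _ in target_latent]  # starting point at t=0
sample = euler_sample(make_toy_velocity(target_latent), noise, num_steps=8)
```

Because the rectified-flow path between noise and data is a straight line, the ideal velocity is constant along it, which is why a small number of Euler steps is enough in this toy case. A learned velocity field is only approximately straight, so the number of steps becomes a quality and speed trade-off.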
Quick Start & Requirements
To install, clone the repository and run uv sync. This installs PyTorch automatically: a CUDA 12.8 build on Linux and Windows, or the default build on macOS and CPU-only setups. Inference is available through a CLI or a Gradio Web UI. Pre-trained models are published on Hugging Face (e.g., Aratako/Irodori-TTS-500M-v2 and Aratako/Irodori-TTS-500M-v2-VoiceDesign), with hosted demos at Aratako/Irodori-TTS-500M-v2-Demo and Aratako/Irodori-TTS-500M-v2-VoiceDesign-Demo, respectively.
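For readers who want the checkpoints on disk before running inference, the sketch below fetches one of the model repositories named above with the huggingface_hub library. It is an illustration under assumptions: fetch_checkpoint and MODEL_REPOS are hypothetical names, and whether the project's CLI or Web UI expects a local path or downloads the weights itself is not stated here.

```python
from pathlib import Path

# Model repository IDs as listed in the Quick Start text.
MODEL_REPOS = {
    "base": "Aratako/Irodori-TTS-500M-v2",
    "voicedesign": "Aratako/Irodori-TTS-500M-v2-VoiceDesign",
}


def fetch_checkpoint(variant: str = "base") -> Path:
    """Download a model repository (or reuse the cached copy) and return its local path."""
    if variant not in MODEL_REPOS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(MODEL_REPOS)}")
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=MODEL_REPOS[variant]))


if __name__ == "__main__":
    # Requires network access and the huggingface_hub package.
    print(fetch_checkpoint("base"))
```

Keeping the repository IDs in one mapping makes it harder to pair the wrong variant with a given workflow, which matters because checkpoints are tied to a specific codebase version.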
Highlighted Details
Maintenance & Community
The project is maintained by Chihiro Arata, with primary development hosted on GitHub. The README does not mention community channels (e.g., Discord, Slack) or a public roadmap.
Licensing & Compatibility
The project's code is released under the MIT License, which permits commercial use. The pre-trained model weights may be licensed under different terms, so consult the model card for each model on Hugging Face.
Limitations & Caveats
The v1 and v2 codebases are not compatible with each other's checkpoints, so each checkpoint must be used with the matching codebase version. The README does not state the licenses for the model weights; verify them separately for each model.