nineninesix-ai
Fast, high-quality text-to-speech generation
Top 74.8% on SourcePulse
Kani TTS is a fast, modular text-to-speech (TTS) system that generates high-quality, human-like speech from text. It targets developers and researchers who need flexible TTS, offering multilingual support and optimized inference across diverse hardware, including NVIDIA GPUs and Apple Silicon.
How It Works
Kani TTS uses a modular architecture with pre-trained models in several sizes and languages. It relies on the NVIDIA NeMo NanoCodec for efficient audio compression and decompression, which enables rapid inference. Two specialized inference pipelines are provided: vLLM, for high-performance NVIDIA GPU acceleration behind an OpenAI-compatible API, and MLX, for optimized performance on Apple Silicon leveraging its unified memory and Neural Engine.
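Because the vLLM pipeline is described as exposing an OpenAI-compatible API, a client can in principle talk to it the same way it would talk to OpenAI's speech endpoint. The following is a minimal sketch under stated assumptions, not the project's documented client: the server address, the `/v1/audio/speech` route, the `kani-tts` model name, and the `default` voice are all hypothetical placeholders modeled on the OpenAI convention, so check the repository for the actual values.

```python
import json
import urllib.request

# Assumption: a local server that follows OpenAI's /v1/audio/speech convention.
# The host, port, model name, and voice below are placeholders, not values
# confirmed by the Kani TTS project.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, voice="default", model="kani-tts",
                         response_format="wav"):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.wav"):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as audio_file:
        audio_file.write(audio_bytes)
    return out_path
```

With a compatible server running, `synthesize("Hello from Kani TTS.")` would write the returned audio to `speech.wav`. Keeping request construction separate from sending makes the payload easy to inspect without a live server.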
Quick Start & Requirements
Install: pip install kani-tts
See the examples/ directory for getting started.
Highlighted Details
Maintenance & Community
Community contributions are actively encouraged via a Discord server. Development focuses on enhancing the core architecture with specialized LLMs for TTS, expanding language and speaker support, improving audio codecs, and building diverse datasets.
Licensing & Compatibility
Licensed under the Apache 2.0 license, permitting commercial use and modification with attribution.
Limitations & Caveats
Performance may degrade when the input text exceeds 1000 tokens. Emotional expressivity is limited unless models are fine-tuned on specific datasets.
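One common way to stay under an input-length limit like the one above is to split long text at sentence boundaries and synthesize each chunk separately. The sketch below is illustrative only: `split_text` is a hypothetical helper, not part of Kani TTS, and it approximates tokens by whitespace-separated words unless a real tokenizer's counting function is passed in.

```python
import re


def split_text(text, max_tokens=1000, count_tokens=None):
    """Greedily pack whole sentences into chunks of at most max_tokens.

    count_tokens is an optional callable returning the token count of a
    string. By default tokens are approximated by whitespace-separated
    words, which is an assumption and not the model's real tokenizer.
    A single sentence longer than max_tokens is kept whole, not cut.
    """
    count = count_tokens or (lambda s: len(s.split()))
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized on its own and the resulting audio segments concatenated. Splitting at sentence boundaries avoids cutting words or phrases mid-utterance, though prosody across chunk boundaries may still differ from a single pass.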