QwenLM: Powerful speech generation models for diverse applications
Top 9.1% on SourcePulse
Qwen3-TTS provides a powerful, open-source suite of Text-to-Speech models from Alibaba Cloud, enabling stable, expressive, and streaming speech generation. It targets developers and researchers seeking advanced capabilities like free-form voice design, vivid voice cloning, and natural language-based voice control across multiple languages, offering a comprehensive solution for high-fidelity speech synthesis.
How It Works
The system leverages a proprietary Qwen3-TTS-Tokenizer-12Hz for efficient acoustic compression and semantic modeling. Its core is a discrete multi-codebook language-model (LM) architecture that generates speech end to end, avoiding the bottlenecks of conventional multi-stage TTS pipelines. A key innovation is the Dual-Track hybrid streaming architecture, which enables ultra-low-latency (97 ms) real-time generation. Natural-language instructions provide fine-grained control over timbre, emotion, and prosody, adapting dynamically to the semantics of the input text.
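To make the streaming design concrete, here is a minimal, hypothetical sketch of a client consuming audio chunks as the model emits them. The model object, its stream() method, and the instruction parameter are illustrative assumptions, not the project's confirmed API.

```python
# Hypothetical streaming-consumption sketch; the stream() method and
# instruction parameter are assumptions for illustration only.
import numpy as np

def collect_stream(model, text: str) -> np.ndarray:
    """Gather ~97 ms audio chunks as they are decoded."""
    chunks = []
    for chunk in model.stream(text, instruction="calm, warm voice"):
        # Each chunk arrives as soon as it is decoded, so playback
        # can begin before the full utterance is synthesized.
        chunks.append(np.asarray(chunk, dtype=np.float32))
    return np.concatenate(chunks)
```

The point of the dual-track design is that the first chunk is available almost immediately, while the second track continues refining the remainder of the utterance.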
Quick Start & Requirements
Installation is straightforward via pip install -U qwen-tts; for development, clone the repository and install in editable mode. Python 3.12 is recommended. GPU acceleration is essential for performance: load the model with device_map="cuda:0" and use torch.bfloat16 or torch.float16. FlashAttention 2 is recommended to reduce GPU memory usage, but requires compatible hardware. Links to Hugging Face, ModelScope, Discord, and vLLM-Omni are provided in the README.
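A minimal loading sketch following the usual Hugging Face conventions is shown below; the checkpoint id "Qwen/Qwen3-TTS" is a placeholder, and the exact model class may differ in the actual package.

```python
# Hypothetical loading sketch using standard transformers kwargs;
# the checkpoint id below is a placeholder, not a confirmed repo name.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "Qwen/Qwen3-TTS",                         # placeholder checkpoint id
    device_map="cuda:0",                      # GPU placement, per the README
    torch_dtype=torch.bfloat16,               # or torch.float16 on older GPUs
    attn_implementation="flash_attention_2",  # optional; reduces GPU memory
    trust_remote_code=True,
)
```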
Highlighted Details
- Proprietary Qwen3-TTS-Tokenizer-12Hz for acoustic compression and semantic modeling
- Dual-Track hybrid streaming with 97 ms real-time generation latency
- Free-form voice design, voice cloning, and instruction-based control of timbre, emotion, and prosody
- Multilingual speech synthesis
Maintenance & Community
Developed by the Qwen team at Alibaba Cloud. Community support is available through the project's Discord channel and WeChat group.
Licensing & Compatibility
The specific open-source license is not detailed within the provided README text. Compatibility for commercial use or closed-source linking would depend on the unstated license terms.
Limitations & Caveats
FlashAttention 2 requires an Ampere-or-newer GPU and half-precision data types (bfloat16 or float16). vLLM-Omni currently supports offline inference only, with online serving planned. The web UI demo for the Base model requires HTTPS for microphone access in modern browsers.
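As a practical guard, the GPU can be probed before requesting FlashAttention 2. This sketch uses only standard PyTorch calls and assumes a single-GPU setup; the fallback choice of float16 is illustrative.

```python
# Check whether the GPU meets FlashAttention 2's requirements:
# Ampere or newer (compute capability >= 8.0) plus half-precision inputs.
import torch

def flash_attn2_ok() -> bool:
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability(0)
    return major >= 8

# Prefer bfloat16 where FA2-capable hardware is present; fall back to
# float16 (with the default attention backend) otherwise.
dtype = torch.bfloat16 if flash_attn2_ok() else torch.float16
```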