bc-dunia: Advanced TTS studio for voice cloning and AI-driven content creation
Summary
Qwen3-TTS Studio offers a professional, user-friendly interface for the Qwen3-TTS model, aimed at users who need fine-grained control over speech synthesis and automated podcast generation. It simplifies tuning of complex TTS parameters and provides an end-to-end workflow from script to audio.
How It Works
This Gradio application wraps Qwen3-TTS, providing an intuitive UI for advanced parameter control (temperature, top-k, top-p) via presets. Its core innovation is an automated podcast pipeline integrating LLMs for scriptwriting, multi-speaker support, and audio synthesis. It also features instant voice cloning from audio samples and natural language voice design.
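To make the preset idea concrete, the following is a minimal sketch of how a named preset could bundle temperature, top-k, and top-p, and how those three values shape a sampling distribution. The preset names, their values, and the function names are hypothetical and are not taken from the project; the filtering shown is the generic textbook form of these techniques, not necessarily what Qwen3-TTS does internally.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPreset:
    temperature: float
    top_k: int
    top_p: float


# Hypothetical preset names and values, chosen only for illustration.
PRESETS = {
    "stable": SamplingPreset(temperature=0.6, top_k=20, top_p=0.8),
    "balanced": SamplingPreset(temperature=0.9, top_k=50, top_p=0.95),
    "expressive": SamplingPreset(temperature=1.2, top_k=100, top_p=1.0),
}


def filtered_distribution(logits, preset):
    """Return {token_index: probability} after temperature, top-k, and top-p."""
    # Temperature: divide logits before the softmax; lower values sharpen it.
    scaled = [x / preset.temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[: preset.top_k]

    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= preset.top_p:
            break

    # Renormalize over the surviving tokens.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, preset, rng=random):
    """Draw one token index from the filtered distribution."""
    dist = filtered_distribution(logits, preset)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

Calling `sample(logits, PRESETS["stable"])` would draw from a narrower set of candidates than `PRESETS["expressive"]`, which is the usual trade-off between consistent and varied output.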
Quick Start & Requirements
Requires Python 3.12+, macOS (MPS) or Linux (CUDA), and at least 16 GB of RAM. Clone the repository and run pip install -r requirements.txt, then download the Qwen3-TTS models from HuggingFace or ModelScope. Podcast generation needs an LLM API key (OpenAI, OpenRouter, or Claude) or a local Ollama instance. Launch the app with python qwen_tts_ui.py. Docker is supported, but may limit acceleration on macOS.
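Before installing, it can help to check the stated requirements programmatically. The sketch below is not part of the project: the function names are hypothetical, and the environment variable names are assumptions based on common conventions for each provider, so the application may read different ones. A local Ollama instance needs no key and is not detected here, and the RAM requirement is not checked because the standard library has no portable way to query it.

```python
import os
import platform
import sys

MIN_PYTHON = (3, 12)

# Assumed environment variable names; the project may read different ones.
PROVIDER_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
    "Claude": "ANTHROPIC_API_KEY",
}


def expected_backend(system):
    """Map an OS name from platform.system() to the accelerator listed above."""
    return {"Darwin": "MPS", "Linux": "CUDA"}.get(system, "unsupported")


def configured_providers(env):
    """List providers whose (assumed) API key variable is set and non-empty."""
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


def preflight(version=None, system=None, env=None):
    """Summarize whether this machine matches the stated requirements."""
    version = sys.version_info[:2] if version is None else version
    system = platform.system() if system is None else system
    env = os.environ if env is None else env
    return {
        "python_ok": tuple(version) >= MIN_PYTHON,
        "backend": expected_backend(system),
        "llm_providers": configured_providers(env),
    }


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

An empty `llm_providers` list only means podcast generation would need a local Ollama instance or additional configuration; speech synthesis itself does not depend on it.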
Maintenance & Community
Builds on Alibaba's Qwen3-TTS model. The README provides no details on specific maintainers, community channels, or sponsorships.
Licensing & Compatibility
Usage terms are governed by the underlying Qwen3-TTS model license. No explicit notes on commercial use or closed-source linking are provided; users must consult the Qwen3-TTS license directly.
Limitations & Caveats
Docker on macOS cannot use MPS acceleration, so containers run CPU-only. Models must be mounted into the container at runtime; make sure the container's non-root user has permission to read them. Podcast generation is optional and depends on LLM provider configuration.
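Permission problems with a mounted model directory tend to surface late, as load errors inside the container. The following sketch, run as the container's user, reports the most common causes up front. The function name and the example path are hypothetical, and it assumes a Unix-like system (it uses `os.getuid`).

```python
import os
from pathlib import Path


def check_model_mount(path):
    """Return a list of problems that would stop the current user reading models."""
    root = Path(path)
    if not root.is_dir():
        return [f"{root} is not a directory; is the volume mounted?"]
    # A directory needs both read and execute permission to be listed and entered.
    if not os.access(root, os.R_OK | os.X_OK):
        return [f"{root} is not readable by uid {os.getuid()}"]
    entries = list(root.iterdir())
    if not entries:
        return [f"{root} is empty; no model files found"]
    # Report each top-level entry the current user cannot read.
    return [
        f"{entry} is not readable by uid {os.getuid()}"
        for entry in entries
        if not os.access(entry, os.R_OK)
    ]


if __name__ == "__main__":
    # "/models" is a placeholder mount point, not the project's documented path.
    for problem in check_model_mount("/models"):
        print(problem)
```

An empty result means the directory exists, is non-empty, and its top-level entries are readable; it does not verify that the files are valid model weights.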