qwen3-TTS-studio by bc-dunia

Advanced TTS studio for voice cloning and AI-driven content creation

Created 5 months ago

284 stars

Top 91.8% on SourcePulse

Project Summary

Summary

Qwen3-TTS Studio is a user-friendly interface for the Qwen3-TTS model, aimed at users who need fine-grained control over speech synthesis and automated podcast generation. It simplifies TTS parameter tuning and provides an end-to-end workflow from script to audio.

How It Works

This Gradio application wraps Qwen3-TTS and exposes its sampling parameters (temperature, top-k, top-p) through presets in the UI. Its core feature is an automated podcast pipeline that uses an LLM to write the script, assigns lines to multiple speakers, and synthesizes the audio. It also supports instant voice cloning from audio samples and voice design from natural language descriptions.
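To make the three sampling parameters concrete, here is a minimal pure-Python sketch of how temperature, top-k, and top-p typically shape a token distribution in autoregressive models. It is not the project's code: the `PRESETS` values and the function names `filtered_distribution` and `sample_token` are hypothetical, chosen only for illustration.

```python
import math
import random

# Hypothetical preset values for illustration; the project's real numbers may differ.
PRESETS = {
    "Fast": {"temperature": 0.7, "top_k": 20, "top_p": 0.8},
    "Balanced": {"temperature": 0.9, "top_k": 50, "top_p": 0.9},
    "Quality": {"temperature": 1.0, "top_k": 100, "top_p": 0.95},
}


def filtered_distribution(logits, temperature, top_k, top_p):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: divide logits before the softmax (lower = sharper distribution).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k most likely tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, preset="Balanced", rng=random):
    """Draw one token index using the parameters of a named preset."""
    dist = filtered_distribution(logits, **PRESETS[preset])
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]
```

Under these assumed values, a lower temperature and tighter top-k/top-p cut off unlikely tokens more aggressively, which tends to give more stable output, while looser settings allow more variation.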

Quick Start & Requirements

Requires Python 3.12+, macOS (MPS) or Linux (CUDA), and 16GB+ RAM. To install, clone the repository and run pip install -r requirements.txt, then download the Qwen3-TTS models from HuggingFace or ModelScope. Podcast generation additionally needs an LLM API key (OpenAI, OpenRouter, or Claude) or a local Ollama instance. Start the app with python qwen_tts_ui.py. Docker is supported, but containers on macOS run without MPS acceleration.
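Because podcast generation depends on which LLM provider is configured, a small check like the following can show what is available before launching. This is a sketch under assumptions: the environment variable names are the ones these providers conventionally use, not names confirmed by the project, and `pick_llm_provider` is a hypothetical helper.

```python
import os

# Assumed variable names (provider conventions); the project may read different ones.
PROVIDER_ENV_VARS = [
    ("openai", "OPENAI_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
    ("claude", "ANTHROPIC_API_KEY"),
]


def pick_llm_provider(env, ollama_available=False):
    """Return the first provider with a non-empty API key, else Ollama, else None."""
    for name, var in PROVIDER_ENV_VARS:
        if env.get(var, "").strip():
            return name
    # Fall back to a local Ollama instance only if the caller says one is running.
    if ollama_available:
        return "ollama"
    return None


if __name__ == "__main__":
    provider = pick_llm_provider(os.environ)
    print(provider or "No LLM provider configured; podcast generation is unavailable.")
```

Passing the environment in as a mapping keeps the function easy to test; the priority order shown is arbitrary.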

Highlighted Details

Voice Cloning: Multi-sample cloning with automatic quality analysis and weighted embedding.
Podcast Automation: Generates podcasts from topics, including AI scriptwriting (multiple LLMs) and multi-speaker assignment.
Voice Design: Custom voice creation via natural language descriptions.
Cross-lingual Support: Toggle for cross-lingual pronunciation in mixed-language content.
Parameter Presets: Real-time TTS generation presets (Fast, Balanced, Quality).
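One plausible reading of "weighted embedding" in the voice cloning feature is a quality-weighted average of per-sample speaker embeddings. The sketch below illustrates that idea only; how the project actually scores samples and combines embeddings is not stated in the source, and `weighted_embedding` is a hypothetical name.

```python
def weighted_embedding(embeddings, quality_scores):
    """Average speaker embeddings, weighting each by its sample's quality score."""
    if not embeddings or len(embeddings) != len(quality_scores):
        raise ValueError("need one quality score per embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    total = sum(quality_scores)
    if total <= 0:
        raise ValueError("quality scores must sum to a positive value")
    # Each dimension is the score-weighted mean across samples.
    return [
        sum(w * e[d] for e, w in zip(embeddings, quality_scores)) / total
        for d in range(dim)
    ]


# Two toy 2-dimensional embeddings; the first sample is rated three times higher.
combined = weighted_embedding([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```

With this scheme a noisy or clipped reference recording still contributes, but pulls the combined voice less than a clean one.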

Maintenance & Community

Builds on Alibaba's Qwen3-TTS model. The README provides no details on specific maintainers, community channels, or sponsorships.

Licensing & Compatibility

Usage terms are governed by the underlying Qwen3-TTS model license. No explicit notes on commercial use or closed-source linking are provided; users must consult the Qwen3-TTS license directly.

Limitations & Caveats

Docker on macOS lacks MPS acceleration and runs CPU-only. Model files must be mounted into the container at runtime, with permissions that allow the non-root container user to read them. Podcast generation is optional and depends on an LLM provider being configured.

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days