qwen-tts-webui by licyk

Advanced Text-to-Speech synthesis with Qwen3 models

Created 5 months ago

297 stars

Top 89.2% on SourcePulse

Summary

This project provides a Gradio-based Web UI and a RESTful API for the Qwen3 Text-to-Speech (TTS) models. It enables users to easily generate speech from text, offering advanced features like custom voice generation, voice design, and voice cloning. The target audience includes developers and researchers looking to integrate or experiment with state-of-the-art TTS capabilities.

How It Works

The system runs several Qwen3 TTS models to synthesize speech, available in different sizes (e.g., 1.7B and 0.6B) and variants (e.g., Base, CustomVoice, VoiceDesign). It exposes both a Gradio interface for interactive use and an API for programmatic control. Key functions include generating speech that follows specific instructions, designing new vocal characteristics, and cloning voices from reference audio samples.

Quick Start & Requirements

Installation options include an integrated Windows package, cross-platform installers, or a manual setup that requires Python and Git. A Colab notebook is also available for trying the project in the cloud without a local install. Models are downloaded on first use, so the first run takes noticeably longer.

Highlighted Details

Supports multiple Qwen3 TTS model variants for diverse speech synthesis needs.
Provides distinct API endpoints for custom-voice, voice-design, and voice-clone functionalities.
API allows querying available models, speakers, languages, and configuration options.
Includes an interrupt endpoint to manage ongoing generation tasks.
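The endpoints listed above can be driven from a small Python client. This is a minimal sketch using only the standard library, and it builds requests separately from sending them. The base URL, the route paths (`/api/custom-voice`, `/api/voice-clone`, `/api/interrupt`, `/api/models`, and so on), and the JSON field names (`text`, `ref_audio`, `ref_text`) are all hypothetical placeholders, not the project's confirmed API; check the project's API documentation for the real ones.

```python
import base64
import json
import urllib.request
from typing import Optional

# HYPOTHETICAL values: the real host, port, and route names come from the
# project's API documentation and may differ.
BASE_URL = "http://127.0.0.1:7860"
ROUTES = {
    "custom_voice": "/api/custom-voice",
    "voice_design": "/api/voice-design",
    "voice_clone": "/api/voice-clone",
    "interrupt": "/api/interrupt",
}


def build_request(task: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one task."""
    url = BASE_URL + ROUTES[task]
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query(resource: str) -> urllib.request.Request:
    """Build a GET request for a listing such as models, speakers, or languages.

    The /api/<resource> path pattern is a HYPOTHETICAL placeholder.
    """
    return urllib.request.Request(f"{BASE_URL}/api/{resource}", method="GET")


def voice_clone_payload(text: str, ref_audio: bytes, ref_text: str) -> dict:
    """HYPOTHETICAL field names; the reference audio travels as a base64 string."""
    return {
        "text": text,
        "ref_audio": base64.b64encode(ref_audio).decode("ascii"),
        "ref_text": ref_text,
    }


def send(req: urllib.request.Request, timeout: float = 300.0) -> dict:
    """Send a prepared request and parse the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction apart from `send` makes payloads easy to inspect and test without a running server. The long default timeout reflects that generation can be slow and, as noted under limitations, requests are handled one at a time.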

Maintenance & Community

The provided README does not detail specific maintenance contributors, community channels (like Discord or Slack), or a public roadmap.

Licensing & Compatibility

The project is licensed under GPL-3.0. This copyleft license may impose restrictions on use within proprietary or closed-source applications.

Limitations & Caveats

Models must be downloaded the first time they are loaded, which can be time-consuming. The API processes requests through a single queue, one at a time, so it cannot serve concurrent requests. Audio is transferred as Base64-encoded strings, which makes payloads roughly a third larger than the raw audio. Server-side errors, such as out-of-memory failures, can occur during generation.
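To get a feel for the Base64 overhead, the sketch below builds a silent WAV in memory, encodes it, and decodes it back, as a client would when handling an audio string from an API response. The 24 kHz, 16-bit mono format is an assumption chosen for illustration, not necessarily what the Qwen3 models output, and `decode_audio` is a hypothetical helper rather than part of the project.

```python
import base64
import io
import math
import wave


def base64_len(n_bytes: int) -> int:
    """Length of standard (padded) Base64 output for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


def make_silent_wav(seconds: float, sample_rate: int = 24000) -> bytes:
    """Build a 16-bit mono PCM WAV of silence in memory.

    The 24 kHz default is an ASSUMPTION for illustration only.
    """
    n_frames = int(seconds * sample_rate)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def decode_audio(b64_audio: str) -> bytes:
    """HYPOTHETICAL helper: turn a Base64 audio string back into raw bytes."""
    return base64.b64decode(b64_audio, validate=True)


def wav_duration(raw: bytes) -> float:
    """Read a WAV from bytes and return its duration in seconds."""
    with wave.open(io.BytesIO(raw), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

In this format, 10 seconds of audio is a 480,044-byte WAV (480,000 bytes of samples plus a 44-byte header), which becomes a 640,060-character Base64 string, about 33% larger.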

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

26 stars in the last 30 days