qwen3-tts-apple-silicon  by kapi2800

AI text-to-speech inference for Apple Silicon Macs

Created 4 weeks ago

295 stars

Top 89.9% on SourcePulse

Project Summary

This project enables local, offline execution of Qwen3-TTS text-to-speech on Apple Silicon Macs (M1/M2/M3/M4). It targets users who want advanced AI voice generation without cloud dependencies, and offers voice cloning and custom voice design with optimized performance.

How It Works

The project runs the Qwen3-TTS model through MLX, Apple's array framework for Apple Silicon. MLX runs natively on the Mac's GPU (via Metal) and CPU, which share unified memory. Compared with standard PyTorch implementations, the project reports lower RAM usage (2-3 GB vs. 10+ GB) and lower CPU temperatures (40-50°C vs. 80-90°C). This approach prioritizes efficiency, performance, and battery life on Mac hardware.

Quick Start & Requirements

  • Installation: Clone the repository, create and activate a Python virtual environment (python3 -m venv .venv, source .venv/bin/activate), install dependencies (pip install -r requirements.txt), and install ffmpeg via Homebrew (brew install ffmpeg).
  • Models: Download desired Qwen3-TTS models (Pro or Lite versions for CustomVoice, VoiceDesign, or Base cloning) from provided HuggingFace links and place them in the models/ directory.
  • Execution: Run python main.py after activating the virtual environment.
  • Prerequisites: macOS with Apple Silicon (M1/M2/M3/M4), Python 3.10+, Homebrew for ffmpeg.
  • Resource Needs: Approximately 3GB RAM for Lite models, 6GB RAM for Pro models.
  • Links: Repository, HuggingFace Model Downloads (implied by README structure).
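
The prerequisites listed above can be checked before launching main.py. The following is a minimal sketch, not part of the repository: the function name check_prerequisites and its messages are hypothetical, and it assumes that any file inside models/ counts as a downloaded model. It uses only the Python standard library.

```python
import platform
import shutil
import sys
from pathlib import Path


def check_prerequisites(models_dir="models"):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper for illustration; the project itself may not ship one.
    """
    problems = []

    # A native (non-Rosetta) Python on an Apple Silicon Mac reports
    # system "Darwin" and machine "arm64".
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        problems.append("macOS on Apple Silicon (arm64) is required")

    # The summary lists Python 3.10+ as a prerequisite.
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is required")

    # ffmpeg is installed via Homebrew in the quick start; this only checks PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH (try: brew install ffmpeg)")

    # Assumption: downloaded models live directly under models/.
    models = Path(models_dir)
    if not models.is_dir() or not any(models.iterdir()):
        problems.append(f"no model files found in {models}/")

    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("MISSING:", issue)
    print("Ready to run main.py" if not issues else "Fix the items above first")
```

Running the script from the repository root inside the activated virtual environment prints one line per unmet prerequisite. It does not verify that the downloaded models are complete or that enough RAM is free.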

Highlighted Details

  • Voice Cloning: Capable of cloning any voice from a 5-second audio sample.
  • Voice Design: Allows creation of novel voices through text descriptions (e.g., "calm British narrator").
  • Custom Voices: Includes 9 built-in voices with adjustable emotion and speed controls.
  • 100% Offline: Operates entirely locally without requiring internet connectivity or API keys.
  • MLX Optimization: Demonstrates substantial improvements in RAM efficiency and thermal performance over standard PyTorch models on Apple Silicon.
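
Because voice cloning is described as working from a 5-second audio sample, it can help to confirm the length of a reference clip before using it. The sketch below is an illustration only: the helper names are hypothetical, treating 5 seconds as a minimum is an assumption, and it handles only uncompressed WAV files through Python's standard wave module (other formats would first need converting, for example with ffmpeg).

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, minimum_seconds=5.0):
    """Assumption: a reference clip should be at least 5 seconds long."""
    return wav_duration_seconds(path) >= minimum_seconds
```

A call such as is_long_enough("reference.wav") returns True or False; the file name here is a placeholder, not a path the project defines.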

Maintenance & Community

The provided README does not detail specific contributors, community channels (like Discord or Slack), roadmaps, or recent maintenance activity.

Licensing & Compatibility

The README does not specify a software license, so usage rights, redistribution terms, and commercial compatibility remain unclear.

Limitations & Caveats

This project is strictly limited to macOS environments equipped with Apple Silicon processors (M1/M2/M3/M4). Users must manually download and manage model files. The absence of a stated license presents a significant caveat for potential adoption, particularly in commercial contexts.

Health Check

  • Last Commit: 4 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 2
  • Star History: 296 stars in the last 28 days
