qwen3-tts-apple-silicon by kapi2800

AI text-to-speech inference for Apple Silicon Macs

Created 5 months ago

533 stars

Top 58.6% on SourcePulse

Project Summary

This project runs Qwen3-TTS text-to-speech locally and offline on Apple Silicon Macs (M1/M2/M3/M4). It targets users who want AI voice generation, including voice cloning and custom voice design, without cloud dependencies.

How It Works

The project runs the Qwen3-TTS model through Apple's MLX framework. MLX executes natively on the Apple Silicon GPU (via Metal) and CPU, sharing unified memory between them. According to the README, this reduces RAM usage (2-3 GB vs. 10+ GB) and CPU temperatures (40-50°C vs. 80-90°C) compared to standard PyTorch implementations, which favors efficiency and battery life on Mac hardware.
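To see which device MLX will use on a given machine, a minimal sketch like the following may help. It assumes the `mlx` package is installed in the active environment and falls back gracefully when it is not. The helper name `describe_mlx_device` is hypothetical and not part of this project.

```python
def describe_mlx_device():
    """Return MLX's default device as a string, or None if MLX is not installed."""
    try:
        # mlx is a third-party package; it is only importable where installed.
        import mlx.core as mx
    except ImportError:
        return None
    # default_device() reports the device MLX schedules work on by default.
    return str(mx.default_device())


if __name__ == "__main__":
    device = describe_mlx_device()
    print(device if device else "MLX is not installed in this environment")
```

On an Apple Silicon Mac with MLX installed, the reported device is normally the GPU. No expected output is given because it depends on the machine.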

Quick Start & Requirements

Installation: Clone the repository, create and activate a Python virtual environment (python3 -m venv .venv, source .venv/bin/activate), install dependencies (pip install -r requirements.txt), and install ffmpeg via Homebrew (brew install ffmpeg).
Models: Download desired Qwen3-TTS models (Pro or Lite versions for CustomVoice, VoiceDesign, or Base cloning) from provided HuggingFace links and place them in the models/ directory.
Execution: Run python main.py after activating the virtual environment.
Prerequisites: macOS with Apple Silicon (M1/M2/M3/M4), Python 3.10+, Homebrew for ffmpeg.
Resource Needs: Approximately 3GB RAM for Lite models, 6GB RAM for Pro models.
Links: Repository, HuggingFace Model Downloads (implied by README structure).
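The prerequisites and resource figures above can be checked before the first run. The sketch below is illustrative and not part of the project. The function names `check_prerequisites` and `models_that_fit` are hypothetical, and the RAM thresholds simply reuse the approximate 3 GB and 6 GB figures listed above.

```python
import platform
import shutil
import sys
from pathlib import Path

MIN_PYTHON = (3, 10)
# Approximate RAM needs in GB, taken from the resource figures above.
MODEL_RAM_GB = {"Lite": 3.0, "Pro": 6.0}


def check_prerequisites(system, machine, python_version, ffmpeg_path, models_dir):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if system != "Darwin":
        problems.append(f"macOS required, found {system}")
    if machine != "arm64":
        # A Python interpreter running under Rosetta may report x86_64 here.
        problems.append(f"Apple Silicon (arm64) required, found {machine}")
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.10+ required")
    if ffmpeg_path is None:
        problems.append("ffmpeg not found on PATH (brew install ffmpeg)")
    if not Path(models_dir).is_dir():
        problems.append(f"models directory missing: {models_dir}")
    return problems


def models_that_fit(free_ram_gb):
    """Return the model tiers whose approximate RAM need fits in free_ram_gb."""
    return [name for name, need in MODEL_RAM_GB.items() if free_ram_gb >= need]


if __name__ == "__main__":
    issues = check_prerequisites(
        platform.system(),
        platform.machine(),
        sys.version_info,
        shutil.which("ffmpeg"),
        "models",
    )
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("All prerequisite checks passed")
```

Passing the environment values in as arguments keeps the checks easy to test on any machine. The `__main__` block shows how to feed in the real values from the current system.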

Highlighted Details

Voice Cloning: Capable of cloning any voice from a 5-second audio sample.
Voice Design: Allows creation of novel voices through text descriptions (e.g., "calm British narrator").
Custom Voices: Includes 9 built-in voices with adjustable emotion and speed controls.
100% Offline: Operates entirely locally without requiring internet connectivity or API keys.
MLX Optimization: Demonstrates substantial improvements in RAM efficiency and thermal performance over standard PyTorch models on Apple Silicon.
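For voice cloning, it can be useful to confirm that a reference clip is long enough before handing it to the tool. The sketch below uses only Python's standard `wave` module and therefore handles uncompressed WAV files only. Treating 5 seconds as a minimum length is an assumption drawn from the feature description above, and the helper names are hypothetical rather than part of the project.

```python
import wave

# Assumption: the 5-second figure is treated as a minimum clip length.
MIN_REFERENCE_SECONDS = 5.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path, min_seconds=MIN_REFERENCE_SECONDS):
    """Return True if the clip at path is at least min_seconds long."""
    return wav_duration_seconds(path) >= min_seconds
```

Clips in other formats (for example MP3 or M4A) would need converting to WAV first, which ffmpeg can do.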

Maintenance & Community

The provided README does not detail specific contributors, community channels (like Discord or Slack), roadmaps, or recent maintenance activity.

Licensing & Compatibility

The README does not specify a software license. This omission requires clarification regarding usage rights, redistribution, and commercial compatibility.

Limitations & Caveats

This project is strictly limited to macOS environments equipped with Apple Silicon processors (M1/M2/M3/M4). Users must manually download and manage model files. The absence of a stated license presents a significant caveat for potential adoption, particularly in commercial contexts.

Explore Similar Projects

marvis-tts by Marvis-Labs

QwenVoice by PowerBeef

Kitten-TTS-Server by devnen

Lightning-SimulWhisper by altalt-org

kani-tts by nineninesix-ai

alexandria-audiobook by Finrandojin

kokoro-ios by mlalma

speech-swift by soniqo

faster-qwen3-tts by andimarafioti

mlx-audio by Blaizzy

KittenTTS by KittenML

insanely-fast-whisper by Vaibhavs10