QwenVoice by PowerBeef

Offline, native TTS generation for Apple platforms

Created 3 months ago
276 stars

Top 93.8% on SourcePulse

Project Summary

QwenVoice, whose next macOS release adopts the app name Vocello, is a high-performance, offline Text-to-Speech (TTS) app for Apple platforms, built specifically for Apple Silicon. It offers Custom Voice, Voice Design, and Voice Cloning, so users can generate and manipulate speech locally with native performance. It is aimed at developers and power users on macOS and iOS who want advanced, privacy-focused TTS.

How It Works

The project is built on a shared native Swift/MLX core optimized for Apple Silicon. Generation is isolated from the app (via XPC on macOS and an extension on iOS) to keep it robust and stable. Models come in 8-bit (quality) and 4-bit (speed) variants, so users can match the model to their hardware. Tone and emotion are driven by natural language instructions rather than SSML markup. The architecture targets Apple platforms natively: there is no Python backend and no CLI surface.
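
The 8-bit versus 4-bit trade-off can be illustrated with a minimal Swift sketch. It is not taken from the project: the `ModelVariant` enum, the `pickVariant` function, and the 16 GB cutoff are all hypothetical, chosen only to show how a variant might be matched to installed memory. The one real API used is `ProcessInfo.processInfo.physicalMemory` from Foundation.

```swift
import Foundation

/// Hypothetical enum for illustration; the names do not come from the project.
enum ModelVariant: String {
    case eightBit = "8-bit"  // described in the summary as the quality variant
    case fourBit = "4-bit"   // described in the summary as the speed variant
}

/// Picks a variant from the amount of installed RAM.
/// The 16 GB default threshold is an assumption, not a documented rule.
func pickVariant(physicalMemoryBytes: UInt64, thresholdGB: UInt64 = 16) -> ModelVariant {
    let bytesPerGB: UInt64 = 1_073_741_824
    return physicalMemoryBytes >= thresholdGB * bytesPerGB ? .eightBit : .fourBit
}

// Query the machine's installed memory and report the choice.
let variant = pickVariant(physicalMemoryBytes: ProcessInfo.processInfo.physicalMemory)
print("Selected variant: \(variant.rawValue)")
```

In the app itself the user chooses which models to download, so a heuristic like this would at most suggest a default.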

Quick Start & Requirements

  • Primary install / run command: Download the latest release artifact (e.g., Vocello-macos26.dmg for the next macOS release) from GitHub Releases. Install by dragging the .app to /Applications. Models are downloaded within the application via the "Models" tab.
  • Non-default prerequisites and dependencies: macOS requires version 26.0+, an Apple Silicon chip, and at least 8 GB RAM; Xcode 26.0 and XcodeGen are also listed as requirements. iOS targets need iOS 26.0+, with iPhone 15 Pro as the minimum device.
  • Links: GitHub Releases

Highlighted Details

  • Features advanced voice manipulation: Custom Voice, Voice Design, and Voice Cloning from short audio clips.
  • Supports native model downloads directly from Hugging Face repositories (e.g., mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-*).
  • Provides live streaming previews for single-generation speech synthesis.
  • Employs macOS XPC process isolation for robust native generation.
  • Voice tone and emotion are controlled via natural language instructions rather than SSML.
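
To make the Hugging Face download path concrete, here is a minimal Swift sketch of how a file URL in a model repository can be formed. The summary does not show the app's download code, so this is an assumption about the general approach, not the project's implementation. The function name `huggingFaceFileURL`, the placeholder repository `example-org/example-model`, and the file name `config.json` are hypothetical; the `/resolve/<revision>/<file>` path is the Hugging Face Hub's standard file-download layout.

```swift
import Foundation

/// Builds a Hugging Face Hub "resolve" URL for one file in a model repository.
/// Hypothetical helper for illustration; not part of the project.
func huggingFaceFileURL(repoID: String, fileName: String, revision: String = "main") -> URL? {
    var components = URLComponents()
    components.scheme = "https"
    components.host = "huggingface.co"
    // URLComponents percent-encodes the path when producing the final URL.
    components.path = "/\(repoID)/resolve/\(revision)/\(fileName)"
    return components.url
}

// Placeholder repository and file name, used only to show the URL shape.
if let url = huggingFaceFileURL(repoID: "example-org/example-model", fileName: "config.json") {
    print(url.absoluteString)
}
```

A real downloader would pass the resulting URL to `URLSession` and store the files where the app expects its models.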

Maintenance & Community

The project is actively developed, with the next macOS release targeting the "Vocello" app name. The iPhone track is in development but deferred from public release. No specific community links (Discord/Slack) or roadmap details are provided in the README.

Licensing & Compatibility

Licensed under the permissive MIT License, which allows commercial use and integration into closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

The application does not expose temperature or max-token controls. Batch generation runs sequentially and has no streaming UI. A significant adoption barrier is the requirement for bleeding-edge OS and development tool versions (macOS 26.0+, Xcode 26.0+). The iPhone track is currently deferred from public release. The project runs exclusively on Apple Silicon hardware.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 13
  • Issues (30d): 8
  • Star History: 60 stars in the last 30 days
