MimikaStudio by BoltzmannEntropy

Local-first voice cloning and TTS for macOS

Created 1 month ago
334 stars

Top 82.6% on SourcePulse

Project Summary

MimikaStudio is a local-first macOS application designed for advanced voice cloning and text-to-speech (TTS) tasks, optimized for Apple Silicon with native Metal acceleration via MLX. It targets engineers, researchers, and power users seeking robust, on-device AI audio processing capabilities, offering features like near-instant voice cloning, document-to-audiobook creation, and an integrated agentic MCP server for local automation.

How It Works

The application integrates multiple state-of-the-art TTS and voice cloning models, including Kokoro for fast, high-quality English synthesis and Qwen3-TTS and Chatterbox for voice cloning from short audio samples. It operates fully on-device, managing model downloads and providing a unified job queue for TTS, cloning, and audiobook pipelines. An agentic MCP server exposes over 60 REST API endpoints, enabling programmatic access and local automation workflows.
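
The unified job queue described above can be illustrated with a minimal sketch. This is not MimikaStudio's actual implementation: the `Job` and `JobQueue` names, the handler signatures, and the single-worker design are all assumptions made for illustration. It only shows the general idea of routing TTS, cloning, and audiobook work through one first-in, first-out queue.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Job:
    kind: str     # hypothetical job kinds, e.g. "tts", "clone", "audiobook"
    payload: str  # input text for the job


class JobQueue:
    """Hypothetical unified queue: one FIFO queue, one background worker."""

    def __init__(self, handlers: Dict[str, Callable[[str], str]]) -> None:
        self._handlers = handlers
        self._queue: "queue.Queue[Job]" = queue.Queue()
        self.results: List[str] = []
        # Daemon thread so the worker never blocks interpreter shutdown.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, job: Job) -> None:
        """Enqueue a job; it runs after all previously submitted jobs."""
        self._queue.put(job)

    def wait(self) -> None:
        """Block until every submitted job has been processed."""
        self._queue.join()

    def _run(self) -> None:
        while True:
            job = self._queue.get()
            try:
                handler = self._handlers[job.kind]
                self.results.append(f"{job.kind}:{handler(job.payload)}")
            except Exception as exc:
                # Record the failure and keep the worker alive.
                self.results.append(f"{job.kind}:error:{exc}")
            finally:
                self._queue.task_done()
```

In this sketch the handlers stand in for real model calls. Running all job kinds through one worker keeps heavy models from competing for memory, which is one plausible reason to prefer a single queue on a machine with 8 to 16 GB of RAM.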

Quick Start & Requirements

The recommended installation is via the install.sh script in the project root, which handles prerequisites (Homebrew, Python 3, espeak-ng, ffmpeg), Python virtual environment setup, dependency installation, and Flutter configuration.

  • Primary Install:
      git clone https://github.com/BoltzmannEntropy/MimikaStudio.git
      cd MimikaStudio
      ./install.sh
  • Prerequisites: macOS 13+ (Ventura or later), Apple Silicon (M1/M2/M3/M4), 8GB+ RAM (16GB+ recommended), 5-10GB storage for models, Python 3.10+, Flutter 3.x with desktop support.
  • Setup Time: The initial install.sh run can take several minutes due to dependency installation and model downloads.
  • Links: Project GitHub (https://github.com/BoltzmannEntropy/MimikaStudio)
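
Before running the installer, the prerequisites listed above can be checked with a short script. The sketch below is not part of the project: the function names are hypothetical, and it assumes the tools are discoverable on `PATH` under their usual executable names (`brew`, `espeak-ng`, `ffmpeg`, `flutter`). It does not check RAM or free disk space.

```python
import platform
import shutil
import sys
from typing import Callable, List, Optional, Sequence

# Executable names assumed for the tools named in the prerequisites.
REQUIRED_TOOLS = ["brew", "espeak-ng", "ffmpeg", "flutter"]


def parse_macos_major(version: str) -> int:
    """Return the major version from a string such as '14.5'; 0 if unknown."""
    head = version.split(".")[0]
    return int(head) if head.isdigit() else 0


def check_prerequisites(
    system: str,
    machine: str,
    macos_version: str,
    python_version: Sequence[int],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    if system != "Darwin":
        problems.append("macOS is required")
    elif parse_macos_major(macos_version) < 13:
        problems.append("macOS 13 (Ventura) or later is required")
    if machine != "arm64":
        problems.append("Apple Silicon (arm64) is required")
    if tuple(python_version[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites(
        platform.system(),
        platform.machine(),
        platform.mac_ver()[0],
        sys.version_info,
    )
    for issue in issues:
        print("missing:", issue)
```

Because install.sh is described as handling Homebrew, espeak-ng, and ffmpeg itself, missing tools reported here are informational rather than blocking. The operating system and architecture checks are the ones the installer cannot fix.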

Highlighted Details

  • Voice Cloning: Clone any voice from as little as 3 seconds of reference audio using Qwen3-TTS (10 languages) or Chatterbox (23 languages).
  • Document Processing: Read PDF, DOCX, EPUB, Markdown, and TXT files aloud with sentence-level highlighting, and create audiobooks (M4B, MP3, WAV) from documents using Kokoro voices.
  • Local-First & On-Device: All processing occurs locally, with built-in model download management.
  • Agentic MCP Server: Provides programmatic access to all features via 60+ REST API endpoints and MCP tools for advanced local automation.
  • Apple Silicon Optimization: Native Metal acceleration via MLX for improved performance on supported hardware.
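
Programmatic access through the local REST API might look like the sketch below. The summary does not document any endpoint, so the base URL, port, path (`/api/tts/generate`), and JSON field names here are all hypothetical placeholders. Consult the project's own API documentation for the real values. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical address of the locally running server; the real host and
# port are not given in this summary.
BASE_URL = "http://127.0.0.1:8000"


def build_tts_request(
    text: str, voice: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for speech synthesis."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/tts/generate",  # hypothetical endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request would then be a standard library call, for example:
#     with urllib.request.urlopen(build_tts_request("Hello", "some_voice")) as resp:
#         audio_bytes = resp.read()
```

Since the server runs on the same machine, requests like this never leave the device, which is consistent with the local-first design described above.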

Maintenance & Community

The README does not explicitly list community channels (e.g., Discord, Slack) or notable contributors beyond the author.

Licensing & Compatibility

Source code is licensed under Business Source License 1.1 (BSL-1.1), permitting personal/internal use and converting to GPL-2.0-or-later after a specified date. Binary distributions are under a separate MimikaStudio Binary Distribution License, which requires a commercial license for commercial use.

Limitations & Caveats

Only macOS binaries are currently provided; Windows and Linux support is planned for future releases. The DMG is unsigned, so users must manually bypass the macOS Gatekeeper security warning on first launch.

Health Check

  • Last Commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 5
  • Star History: 285 stars in the last 30 days
