MimikaStudio by BoltzmannEntropy

Local-first voice cloning and TTS for macOS

Created 4 months ago

585 stars

Top 55.0% on SourcePulse

Project Summary

MimikaStudio is a local-first macOS application designed for advanced voice cloning and text-to-speech (TTS) tasks, optimized for Apple Silicon with native Metal acceleration via MLX. It targets engineers, researchers, and power users seeking robust, on-device AI audio processing capabilities, offering features like near-instant voice cloning, document-to-audiobook creation, and an integrated agentic MCP server for local automation.

How It Works

The application integrates several TTS and voice cloning models: Kokoro for fast, high-quality English synthesis, and Qwen3-TTS and Chatterbox for voice cloning from short audio samples. It runs fully on-device, manages model downloads, and provides a unified job queue for TTS, cloning, and audiobook pipelines. An agentic MCP server exposes more than 60 REST API endpoints for programmatic access and local automation workflows.
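To make the "unified job queue" idea concrete, here is a minimal sketch of one queue serving several pipelines. It is not MimikaStudio's implementation: the `Job` and `JobQueue` classes, the job kind names, and the handler signatures are all hypothetical, chosen only to mirror the three pipelines named above.

```python
import queue
from dataclasses import dataclass

# Hypothetical job kinds, named after the pipelines the summary mentions.
JOB_KINDS = ("tts", "clone", "audiobook")


@dataclass
class Job:
    kind: str
    payload: dict
    status: str = "queued"
    result: object = None


class JobQueue:
    """One FIFO queue shared by every pipeline (illustrative only)."""

    def __init__(self):
        self._pending = queue.Queue()
        self._handlers = {}

    def register(self, kind, handler):
        """Attach the function that processes jobs of the given kind."""
        if kind not in JOB_KINDS:
            raise ValueError(f"unknown job kind: {kind}")
        self._handlers[kind] = handler

    def submit(self, kind, payload):
        """Enqueue a job and return it so the caller can poll its status."""
        if kind not in self._handlers:
            raise ValueError(f"no handler registered for: {kind}")
        job = Job(kind, payload)
        self._pending.put(job)
        return job

    def run_all(self):
        """Process queued jobs in submission order, recording failures."""
        finished = []
        while not self._pending.empty():
            job = self._pending.get()
            job.status = "running"
            try:
                job.result = self._handlers[job.kind](job.payload)
                job.status = "done"
            except Exception as exc:  # keep the queue alive if one job fails
                job.result = exc
                job.status = "failed"
            finished.append(job)
        return finished
```

Routing every pipeline through one queue means a long audiobook render and a short TTS request share the same status model, which is one plausible reason to prefer it over a separate worker per feature.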

Quick Start & Requirements

The recommended installation is via the install.sh script in the project root, which handles prerequisites (Homebrew, Python 3, espeak-ng, ffmpeg), Python virtual environment setup, dependency installation, and Flutter configuration.

Primary Install: git clone https://github.com/BoltzmannEntropy/MimikaStudio.git && cd MimikaStudio && ./install.sh
Prerequisites: macOS 13+ (Ventura or later), Apple Silicon (M1/M2/M3/M4), 8GB+ RAM (16GB+ recommended), 5-10GB storage for models, Python 3.10+, Flutter 3.x with desktop support.
Setup Time: The initial install.sh run can take several minutes due to dependency installation and model downloads.
Links: Project GitHub (https://github.com/BoltzmannEntropy/MimikaStudio)
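The listed prerequisites can be checked up front with a short script. This is a sketch and not part of the project: `check_prerequisites` and `parse_version` are hypothetical helpers, and the thresholds come only from the list above. Since install.sh is described as installing ffmpeg and espeak-ng itself, a missing tool before installation is informational rather than fatal.

```python
import platform
import shutil
import sys


def parse_version(text):
    """Turn '13.4.1' into (13, 4, 1); empty or malformed input gives ()."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_prerequisites(system, machine, macos_version, python_version, tools_found):
    """Return a list of human-readable problems; an empty list means all clear."""
    problems = []
    if system != "Darwin":
        problems.append("macOS is required")
    elif parse_version(macos_version) < (13,):
        problems.append("macOS 13 (Ventura) or later is required")
    if machine != "arm64":
        problems.append("Apple Silicon (arm64) is required")
    if tuple(python_version) < (3, 10):
        problems.append("Python 3.10 or later is required")
    for tool, found in tools_found.items():
        if not found:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites(
        platform.system(),
        platform.machine(),
        platform.mac_ver()[0],
        sys.version_info[:2],
        {tool: shutil.which(tool) is not None for tool in ("ffmpeg", "espeak-ng")},
    )
    for issue in issues:
        print("WARN:", issue)
    if not issues:
        print("All listed prerequisites look satisfied.")
```

A Python interpreter running under Rosetta reports `x86_64` even on Apple Silicon, so a warning about the architecture may only mean the interpreter itself is not a native arm64 build.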

Highlighted Details

Voice Cloning: Clone any voice from as little as 3 seconds of reference audio using Qwen3-TTS (10 languages) or Chatterbox (23 languages).
Document Processing: Read PDF, DOCX, EPUB, Markdown, and TXT files aloud with sentence-level highlighting, and create audiobooks (M4B, MP3, WAV) from documents using Kokoro voices.
Local-First & On-Device: All processing occurs locally, with built-in model download management.
Agentic MCP Server: Provides programmatic access to all features via 60+ REST API endpoints and MCP tools for advanced local automation.
Apple Silicon Optimization: Native Metal acceleration via MLX for improved performance on supported hardware.
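Because cloning is described as working from about 3 seconds of reference audio, it can help to check a clip's length before submitting it. The sketch below uses only Python's standard `wave` module and handles uncompressed WAV files only. The function names and the `MIN_REFERENCE_SECONDS` constant are hypothetical, and the real models may impose different or additional requirements.

```python
import wave

# Taken from the summary's "as little as 3 seconds"; treat it as an assumption.
MIN_REFERENCE_SECONDS = 3.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def is_usable_reference(path, minimum=MIN_REFERENCE_SECONDS):
    """True when the clip is at least `minimum` seconds long."""
    return wav_duration_seconds(path) >= minimum
```

Other formats such as MP3 or M4A would first need converting to WAV, for example with ffmpeg, which is already among the prerequisites.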

Maintenance & Community

The README does not explicitly list community channels (e.g., Discord, Slack) or notable contributors beyond the author.

Licensing & Compatibility

Source code is licensed under the Business Source License 1.1 (BSL-1.1), which permits personal and internal use and converts to GPL-2.0-or-later after a specified date. Binary distributions are covered by a separate MimikaStudio Binary Distribution License, which requires a commercial license for commercial use.

Limitations & Caveats

Currently, only macOS binaries are provided; Windows and Linux support is planned for future releases. The DMG is unsigned, so users must manually bypass the macOS Gatekeeper security warning on first launch.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Star History

27 stars in the last 30 days