mlx-audio by Blaizzy

TTS/STT/STS library for efficient speech analysis on Apple Silicon

Created 1 year ago

3,223 stars

Top 14.8% on SourcePulse

4 Experts Love This Project

Jonathan Ragan-Kelley: Professor at MIT
Junyang Lin: Core Maintainer at Alibaba Qwen
Dan Guido: Cofounder of Trail of Bits
Awni Hannun: Author of MLX; Research Scientist at Apple

Project Summary

This library provides text-to-speech (TTS) and speech-to-speech (STS) capabilities built on Apple's MLX framework for efficient inference on Apple Silicon. It targets developers and users who want fast, customizable speech synthesis, with multiple voices, adjustable speed, and an interactive web interface with 3D audio visualization.

How It Works

MLX-Audio uses the MLX framework for accelerated computation on Apple Silicon, which enables fast inference for its TTS and STS models. It supports the Kokoro model architecture for multilingual TTS and the CSM model for voice cloning from reference audio. The library offers three entry points: a Python API, a CLI for quick generation, and a FastAPI-based web server with a 3D audio visualizer.
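
As a rough illustration of the CLI entry point, the sketch below assembles a command line for a one-off generation run without executing it. The module name `mlx_audio.tts.generate`, the flags `--text`, `--voice` and `--speed`, and the voice identifier `af_heart` are assumptions not stated in this summary; `build_tts_command` is a hypothetical helper. Confirm the real options with the tool's `--help` output before relying on them.

```python
import shlex
import sys


def build_tts_command(text, voice="af_heart", speed=1.0):
    """Return an argv list for a single TTS generation run.

    Assumption: the CLI is runnable as the module `mlx_audio.tts.generate`
    and accepts --text, --voice and --speed. This is not confirmed by the
    summary above, so check `--help` on an installed copy.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return [
        sys.executable, "-m", "mlx_audio.tts.generate",
        "--text", text,
        "--voice", voice,
        "--speed", str(speed),
    ]


if __name__ == "__main__":
    argv = build_tts_command("Hello from Apple Silicon", speed=1.2)
    # Print a shell-safe rendering of the command instead of running it.
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` on a machine with mlx-audio installed would launch the generation; building the list separately keeps quoting of the input text out of the shell.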

Quick Start & Requirements

Install: pip install mlx-audio
Web interface/API dependencies: pip install -r requirements.txt
Requirements: MLX, Python 3.8+, Apple Silicon Mac (recommended). Japanese support requires misaki[ja]; Mandarin Chinese support requires misaki[zh].
Quick Start: https://github.com/Blaizzy/mlx-audio
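
The requirements above can be checked before installing. The following is a minimal pre-flight sketch using only the Python standard library; the helper names (`is_apple_silicon`, `python_ok`, `pip_packages`, `pip_command`) are hypothetical and not part of mlx-audio. The package names come from the list above.

```python
import platform
import shlex
import sys

# Extra packages named in the requirements, keyed by language.
MISAKI_EXTRAS = {"japanese": "misaki[ja]", "mandarin": "misaki[zh]"}


def is_apple_silicon(system=None, machine=None):
    """True when running natively on an Apple Silicon Mac."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


def python_ok(version=None):
    """True when the interpreter meets the Python 3.8+ requirement."""
    version = version or sys.version_info
    return tuple(version[:2]) >= (3, 8)


def pip_packages(languages=()):
    """Return the packages to install for the requested languages.

    Languages without an entry in MISAKI_EXTRAS need no extra package
    in this sketch and are skipped.
    """
    packages = ["mlx-audio"]
    for lang in languages:
        extra = MISAKI_EXTRAS.get(lang.lower())
        if extra and extra not in packages:
            packages.append(extra)
    return packages


def pip_command(languages=()):
    """Render a shell-safe pip command (brackets get quoted)."""
    return "pip install " + " ".join(shlex.quote(p) for p in pip_packages(languages))


if __name__ == "__main__":
    print("Apple Silicon:", is_apple_silicon())
    print("Python 3.8+:", python_ok())
    print(pip_command(["japanese", "mandarin"]))
```

A Python interpreter running under Rosetta reports `x86_64` from `platform.machine()`, so a negative result on an M-series Mac may mean the interpreter itself is not a native arm64 build.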

Highlighted Details

Fast inference on Apple Silicon (M series chips).
Supports multiple voices (AF Heart, AF Nova, AF Bella, BF Emma) and languages (American English, British English, Japanese, Mandarin Chinese).
Interactive web interface with real-time 3D audio visualization and direct output folder access.
REST API for TTS generation and audio playback.
Voice cloning capability using the CSM model with reference audio.
Quantization support for optimized performance.
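
To make the REST API item concrete, here is a sketch that prepares (but does not send) a TTS request using the standard library. The endpoint path `/tts`, the port `8000`, the form fields `text`, `voice` and `speed`, and the voice identifier `af_heart` are all assumptions for illustration; this summary does not specify them, so check the project's README or the running server's API docs. `build_tts_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_tts_request(text, voice="af_heart", speed=1.0,
                      base_url="http://127.0.0.1:8000"):
    """Prepare a form-encoded POST request for TTS generation.

    Assumptions (not confirmed by the summary above): the server
    exposes POST /tts and reads the form fields text, voice, speed.
    """
    form = urlencode({"text": text, "voice": voice, "speed": speed})
    return Request(
        f"{base_url}/tts",
        data=form.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    req = build_tts_request("Hello world", speed=1.2)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

With a local server running, `urllib.request.urlopen(req)` would submit the request. Keeping construction separate from sending makes the payload easy to inspect first.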

Maintenance & Community

Project maintained by Blaizzy.
Uses the Kokoro model architecture for TTS and Three.js for the web visualization.
No explicit community links (Discord/Slack) or roadmap mentioned in the README.

Licensing & Compatibility

License: MIT License.
Compatible with commercial use and closed-source linking due to the permissive MIT license.

Limitations & Caveats

Per the README, Japanese and Mandarin Chinese support requires installing additional misaki packages. The "open output folder" feature of the web interface works only when the server is running locally.

Health Check

Last Commit: 3 days ago
Responsiveness: 1 day

Star History

212 stars in the last 30 days