f5-tts-mlx by lucasnewman

Text-to-speech implementation using MLX framework

Created 1 year ago

603 stars

Top 54.3% on SourcePulse


1 Expert Loves This Project

Georgios Konstantopoulos

CTO, General Partner at Paradigm

Project Summary

This repository provides an MLX implementation of F5-TTS, a non-autoregressive, zero-shot text-to-speech system that pairs a flow-matching mel spectrogram generator with a Diffusion Transformer (DiT). It targets users who want high-quality, fast speech synthesis with voice cloning.

How It Works

F5-TTS generates mel spectrograms with flow matching, using a Diffusion Transformer (DiT) as the backbone network. It builds on the E2 TTS architecture and adds ConvNeXt V2 blocks to refine the learned text alignment, with the goal of improving performance and fidelity. Because the system is zero-shot, it can clone a voice from a short reference audio sample without fine-tuning.
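Flow matching can be illustrated with a toy, pure-Python sketch. The sampler starts from Gaussian noise and integrates a velocity field from t = 0 to t = 1 with Euler steps. Training regresses the network onto the straight-line velocity between a noise sample and a data sample. All names here (`toy_velocity`, `euler_sample`, `cfm_training_pair`) are hypothetical. The toy velocity field stands in for the DiT, which in the real model is conditioned on text and reference audio. The solver and step count used by f5-tts-mlx may differ.

```python
"""Toy flow-matching sampler (illustrative only, not the f5-tts-mlx code)."""
import random


def toy_velocity(x, t, target):
    # Hypothetical stand-in for the DiT: the conditional optimal-transport
    # velocity that points from the current sample straight at `target`.
    return [(tgt - xi) / (1.0 - t) for xi, tgt in zip(x, target)]


def euler_sample(target, steps=32, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division in toy_velocity is safe
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def cfm_training_pair(x0, x1, t):
    """Point on the path x_t = (1 - t) * x0 + t * x1 and its target velocity x1 - x0."""
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    ut = [b - a for a, b in zip(x0, x1)]
    return xt, ut


if __name__ == "__main__":
    mel_frame = [0.5, -1.2, 2.0, 0.0]  # pretend 4-bin mel spectrogram frame
    print(euler_sample(mel_frame))
```

With this idealized velocity field, the final Euler step lands on the target frame. A trained DiT only approximates the velocity, so the real output depends on model quality and the number of steps.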

Quick Start & Requirements

Install via pip: pip install f5-tts-mlx
Requires a Mac with Apple Silicon, since the project depends on the MLX framework.
Basic usage: python -m f5_tts_mlx.generate --text "..."
Voice matching requires a mono, 24 kHz WAV file that is 5-10 seconds long.
Quantized models (4-bit, 8-bit) are available via the --q flag.
Pretrained model weights are available on Hugging Face.
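The reference-clip requirements can be checked before running generation. The sketch below uses Python's standard-library `wave` module to confirm that a file is mono, 24 kHz, and 5-10 seconds long. The helper name `check_reference_wav` is hypothetical and is not part of f5-tts-mlx. The stdlib `wave` module only parses uncompressed integer-PCM WAV files, so it rejects other encodings (such as float WAV) with an error.

```python
"""Pre-flight check for a voice-matching reference clip (hypothetical helper)."""
import wave

REQUIRED_RATE_HZ = 24_000  # 24 kHz, per the stated requirement
MIN_SECONDS, MAX_SECONDS = 5.0, 10.0


def check_reference_wav(path):
    """Return a list of problems with the WAV at `path`; an empty list means it looks usable."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"expected {REQUIRED_RATE_HZ} Hz, found {rate} Hz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"expected 5-10 s of audio, found {duration:.2f} s")
    return problems


if __name__ == "__main__":
    import sys

    issues = check_reference_wav(sys.argv[1])
    print("\n".join(issues) if issues else "looks OK")
```

Run it as `python check_ref.py clip.wav`, where the script name is arbitrary. Convert or trim the clip with any audio tool if it reports problems.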

Highlighted Details

Generates speech in approximately 4 seconds on an M3 Max MacBook Pro.
Supports zero-shot voice cloning with reference audio.
Offers quantized models for reduced memory and bandwidth usage.
Can be piped with other MLX models, e.g., language models.
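The memory savings from 4-bit and 8-bit models can be illustrated with generic affine group quantization. Each group of weights is mapped to integers in [0, 2^bits - 1] using a per-group scale and offset. This is a pure-Python sketch of the general technique, not the exact scheme f5-tts-mlx or MLX uses. MLX ships its own quantization routines, which this sketch does not call. The group size of 64 and the 4 bytes of metadata per group in `packed_bytes` are assumptions chosen only for the storage arithmetic.

```python
"""Affine (offset/scale) group quantization sketch for 4- or 8-bit weights."""


def quantize_group(weights, bits):
    """Map floats to integer codes in [0, 2**bits - 1]; return codes, scale, offset."""
    levels = 2**bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0  # avoid a zero scale when all weights are equal
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, offset):
    """Reconstruct approximate weights; each error is at most scale / 2."""
    return [c * scale + offset for c in codes]


def packed_bytes(n_weights, bits, group_size=64, meta_bytes_per_group=4):
    """Rough storage estimate: packed codes plus per-group scale/offset metadata.

    group_size and meta_bytes_per_group (two 16-bit floats) are assumptions,
    not values taken from f5-tts-mlx.
    """
    groups = -(-n_weights // group_size)  # ceiling division
    code_bytes = -(-n_weights * bits // 8)
    return code_bytes + groups * meta_bytes_per_group


if __name__ == "__main__":
    n = 1_000_000
    print("fp16 :", n * 2, "bytes")
    print("8-bit:", packed_bytes(n, 8), "bytes")
    print("4-bit:", packed_bytes(n, 4), "bytes")
```

Under these assumptions, one million weights shrink from 2,000,000 bytes in fp16 to about 1.06 MB at 8 bits and about 0.56 MB at 4 bits. The cost is a rounding error bounded by half a quantization step per weight.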

Maintenance & Community

This project is based on original implementations by Yushen Chen (F5 TTS) and Phil Wang (E2 TTS). Further community or maintenance details are not specified in the README.

Licensing & Compatibility

Released under the MIT license.
Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The MLX framework targets Apple Silicon hardware, which limits this project to Apple Silicon Macs. The project implements existing research, and the README does not describe its long-term maintenance plans.

Health Check

Last Commit

9 months ago

Responsiveness

Inactive


Star History

3 stars in the last 30 days