senstella/csm-mlx: Speech model for Apple Silicon using MLX
This project provides an implementation of the Conversational Speech Model (CSM) for Apple Silicon, built on the MLX framework. It generates speech from text while taking conversational context into account, and it offers a command-line interface (CLI) for everyday use and fine-tuning.
How It Works
The implementation utilizes the MLX framework, optimized for Apple Silicon hardware. It supports loading pre-trained weights from Hugging Face and allows for quantization to improve inference speed, potentially enabling near real-time generation. The model can incorporate conversational context through previous audio segments and text transcripts to produce more natural-sounding speech.
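The sketch below shows one way prior turns might be collected and trimmed before generation, which makes the idea of conversational context concrete. It is a minimal sketch under stated assumptions. The `Segment` class and `build_context` helper are hypothetical names, not the csm-mlx API. The duration budget and the sample rate in the example are also illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One prior conversational turn (hypothetical structure, not the csm-mlx API)."""
    speaker: int                                       # speaker id for this turn
    text: str                                          # transcript of the turn
    audio: List[float] = field(default_factory=list)   # mono audio samples


def build_context(history: List[Segment], sample_rate: int, max_seconds: float) -> List[Segment]:
    """Keep the most recent turns whose combined audio fits in max_seconds.

    Hypothetical helper: drops the oldest turns first, then restores
    chronological order so the turns read as the conversation happened.
    """
    budget = int(sample_rate * max_seconds)  # budget expressed in samples
    kept: List[Segment] = []
    used = 0
    for seg in reversed(history):            # walk from newest to oldest
        if used + len(seg.audio) > budget:
            break                            # stop at the first turn that would overflow
        kept.append(seg)
        used += len(seg.audio)
    kept.reverse()                           # back to oldest-to-newest order
    return kept


if __name__ == "__main__":
    SAMPLE_RATE = 16_000  # illustrative value, not taken from csm-mlx
    history = [
        Segment(0, "Hi there.", [0.0] * SAMPLE_RATE * 2),  # 2 s
        Segment(1, "Hello!", [0.0] * SAMPLE_RATE),         # 1 s
        Segment(0, "How are you?", [0.0] * SAMPLE_RATE * 2),  # 2 s
    ]
    context = build_context(history, SAMPLE_RATE, max_seconds=3.5)
    print([s.text for s in context])
```

The helper stops at the first turn that overflows instead of skipping it. This keeps the retained context contiguous, so the model never sees a conversation with a turn missing from the middle.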
Quick Start & Requirements
Install with pip install git+https://github.com/senstella/csm-mlx --upgrade or uv add git+https://github.com/senstella/csm-mlx --upgrade.
Python 3.13 and later may produce sentencepiece compiler errors.
Additional dependencies: audiofile, audresample.
Pre-trained weights are loaded from Hugging Face (senstella/csm-1b-mlx).
Install the CLI with uv tool install "git+https://github.com/senstella/csm-mlx[cli]" --upgrade or pipx install "git+https://github.com/senstella/csm-mlx[cli]" --upgrade.
Highlighted Details
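The requirements above depend on the platform and the Python version, so a quick preflight check can catch problems before installation. The sketch below is a hypothetical helper, not part of csm-mlx. It encodes only the two constraints stated on this page: the project targets Apple Silicon, and sentencepiece has a reported issue on Python 3.13 and later.

```python
import platform
import sys
from typing import List, Optional, Tuple


def preflight_issues(
    version: Optional[Tuple[int, int]] = None,
    system: Optional[str] = None,
    machine: Optional[str] = None,
) -> List[str]:
    """Return warnings about the current environment (hypothetical helper).

    Arguments default to the running interpreter and host; they can be
    overridden to check another setup.
    """
    version = version or (sys.version_info.major, sys.version_info.minor)
    system = system or platform.system()
    machine = machine or platform.machine()
    issues: List[str] = []
    # This project targets Apple Silicon: macOS ("Darwin") on an arm64 CPU.
    if system != "Darwin" or machine != "arm64":
        issues.append(f"not Apple Silicon ({system}/{machine}); csm-mlx targets Apple Silicon")
    # The project reports sentencepiece compilation problems on Python >= 3.13.
    if version >= (3, 13):
        issues.append(
            f"Python {version[0]}.{version[1]} may hit sentencepiece compiler errors; "
            "consider Python < 3.13"
        )
    return issues


if __name__ == "__main__":
    for message in preflight_issues():
        print("warning:", message)
```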
Maintenance & Community
The project acknowledges contributions from Sesame, Moshi, torchtune, MLX, typer, audiofile, and audresample. There is no explicit mention of community channels like Discord or Slack.
Licensing & Compatibility
Limitations & Caveats
The project notes that quantization speeds up inference but may reduce audio quality. Open TODO items include implementing watermarking and further optimizing performance for real-time inference. There is also a known issue: sentencepiece may fail to compile on Python 3.13 and later.
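The quality cost of quantization comes from rounding weights onto a small set of levels. The sketch below illustrates this with a generic group-wise affine scheme, which stores a per-group scale and offset. The values of `group_size` and the bit widths are chosen for illustration. This is not MLX's quantization kernel or the project's code. Weight reconstruction error is also only a rough proxy for the audio-quality loss the project describes.

```python
import random
from typing import List, Tuple


def quantize_group(values: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Affine-quantize one group of weights to unsigned `bits`-bit integer codes."""
    levels = (1 << bits) - 1                   # e.g. 15 for 4-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0          # avoid divide-by-zero for constant groups
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, offset: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale + offset for c in codes]


def mean_abs_error(weights: List[float], bits: int, group_size: int = 64) -> float:
    """Round-trip weights through group-wise quantization and return the mean absolute error."""
    total = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        codes, scale, offset = quantize_group(group, bits)
        restored = dequantize_group(codes, scale, offset)
        total += sum(abs(a - b) for a, b in zip(group, restored))
    return total / len(weights)


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 0.02) for _ in range(4096)]  # synthetic stand-in weights
    for bits in (8, 4, 2):
        print(f"{bits}-bit mean abs error: {mean_abs_error(weights, bits):.6f}")
```

Fewer bits give coarser levels and a larger round-trip error. This is the trade-off behind faster but lower-fidelity quantized inference.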