OpenAI-compatible TTS API for voice cloning
Top 73.6% on SourcePulse
This project provides an OpenAI-compatible Text-to-Speech (TTS) API using the Sesame CSM-1B model, enabling high-quality voice generation and cloning. It targets developers and users of AI chat platforms like OpenWebUI, offering consistent voices and custom voice creation from audio files or YouTube videos.
How It Works
The API leverages the CSM-1B model for speech synthesis, which uses acoustic "seed" samples to maintain voice consistency across requests. It supports multiple audio formats and offers CUDA acceleration for faster generation. Voice cloning is achieved by processing user-provided audio samples or YouTube segments, creating unique voice IDs for subsequent TTS generation.
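As a rough illustration of what "OpenAI-compatible" means for a client, the sketch below builds a request in the shape of OpenAI's speech endpoint (`POST /v1/audio/speech` with `model`, `input`, `voice`, and `response_format` fields). The host and port, the `csm-1b` model label, and the `my_cloned_voice` voice ID are assumptions made for illustration and are not taken from the project's documentation; check the project README for the actual values.

```python
import json
import urllib.request


def build_speech_request(base_url, text, voice, response_format="mp3"):
    """Build (but do not send) a POST request shaped like OpenAI's speech API."""
    payload = {
        "model": "csm-1b",  # assumed model label, not confirmed by the source
        "input": text,
        "voice": voice,  # a built-in voice name or a cloned voice ID
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(request, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())


# Hypothetical local deployment; the address and voice ID are placeholders.
req = build_speech_request(
    "http://localhost:8000", "Hello from CSM-1B.", "my_cloned_voice"
)
print(req.full_url)
```

Calling `synthesize(req, "hello.mp3")` would send the request to a running server. Reusing the same `voice` value across requests is what should keep the voice consistent, per the description above.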
Quick Start & Requirements
docker compose up -d --build
sesame/csm-1b
.env file. First startup downloads models and may take time.
Highlighted Details
CSM_DEVICE_MAP environment variable.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Voice cloning quality depends heavily on the input audio quality and clarity. YouTube cloning may yield lower quality with noisy sources or background music. The README notes potential voice drift with long pauses between requests.