sesame_csm_openai by phildougherty

OpenAI-compatible TTS API for voice cloning

Created 1 year ago

433 stars

Top 68.7% on SourcePulse

Project Summary

This project provides an OpenAI-compatible Text-to-Speech (TTS) API using the Sesame CSM-1B model, enabling high-quality voice generation and cloning. It targets developers and users of AI chat platforms like OpenWebUI, offering consistent voices and custom voice creation from audio files or YouTube videos.

How It Works

The API synthesizes speech with the CSM-1B model, which uses acoustic "seed" samples to keep each voice consistent across requests. It supports multiple audio formats and uses CUDA acceleration for faster generation. To clone a voice, it processes user-provided audio samples or YouTube segments and assigns each one a unique voice ID that later TTS requests can reference.
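
Because the API mirrors OpenAI's speech endpoint, a request can be sketched as below. This is a minimal sketch rather than the project's documented client. The base URL `http://localhost:8000`, the model name `csm-1b`, and the voice name `alloy` are assumptions; the `/v1/audio/speech` path and the `model`, `input`, `voice`, and `response_format` fields follow OpenAI's public TTS API.

```python
import json
import urllib.request

# Assumption: the server listens locally on port 8000; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, voice="alloy", response_format="mp3", model="csm-1b"):
    """Build (but do not send) a POST request shaped like OpenAI's speech endpoint."""
    payload = {
        "model": model,  # assumed model name
        "input": text,
        "voice": voice,  # assumed voice name; a cloned voice ID would go here
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With the container running, `synthesize("Hello there", "hello.mp3")` would write the generated audio to `hello.mp3`. Passing the same `voice` value on every call is what lets the server reuse the same seed samples.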

Quick Start & Requirements

Install/Run: docker compose up -d --build
Prerequisites: Docker, Docker Compose, NVIDIA GPU with CUDA, Hugging Face account with access to sesame/csm-1b.
Setup: Requires a Hugging Face token in the .env file. The first startup downloads the models and may take some time.
Docs: OpenAI TTS API, Voice Cloning UI
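
Since the first startup depends on the token, a quick preflight check on the .env file can save a failed build. The sketch below only parses simple `KEY=VALUE` lines. The variable name that the project expects is not stated here, so the key is passed in by the caller; `EXAMPLE_TOKEN` in the usage note is a hypothetical placeholder.

```python
def read_env_file(text):
    """Parse KEY=VALUE lines from .env-style text, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_token(env_text, key):
    """Return True when `key` is present with a non-empty value."""
    return bool(read_env_file(env_text).get(key))
```

For example, `has_token(open(".env").read(), "EXAMPLE_TOKEN")` returns `False` when the entry is missing or empty, which is a cue to fix the file before running `docker compose up -d --build`.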

Highlighted Details

OpenAI API compatibility for seamless integration.
Voice cloning from local files or YouTube URLs.
Supports MP3, OPUS, AAC, FLAC, WAV formats.
Multi-GPU support via CSM_DEVICE_MAP environment variable.
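
A client may want to reject unsupported formats before sending a request. The helper below is a small sketch built from the format list above; the lower-case names assume the server accepts the same `response_format` values as OpenAI's API, and `output_path_for` is a hypothetical helper name.

```python
from pathlib import Path

# Formats listed above, written as lower-case `response_format` values (assumed).
SUPPORTED_FORMATS = ("mp3", "opus", "aac", "flac", "wav")


def output_path_for(stem, response_format):
    """Return a path such as 'greeting.flac', rejecting formats not in the list."""
    fmt = response_format.strip().lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported format {response_format!r}; expected one of {SUPPORTED_FORMATS}"
        )
    return Path(f"{stem}.{fmt}")
```

Validating on the client side keeps a typo such as `ogg` from turning into a server-side error after the request has already been queued.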

Maintenance & Community

MIT License for the API. CSM-1B model subject to Sesame's license.
Not affiliated with Sesame or OpenAI.

Licensing & Compatibility

MIT License for the API.
The CSM-1B model is governed by Sesame's own license terms. Commercial use is possible as long as you comply with that model license.

Limitations & Caveats

Voice cloning quality depends heavily on the input audio quality and clarity. YouTube cloning may yield lower quality with noisy sources or background music. The README notes potential voice drift with long pauses between requests.
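
Because cloning quality tracks input quality, it can help to sanity-check a sample before uploading it. The sketch below reads a 16-bit PCM WAV with Python's standard `wave` module and flags obviously weak inputs. The thresholds (5 seconds, 16 kHz, peak level 0.1) are illustrative assumptions, not values from the project, and the check cannot detect noise or background music.

```python
import io
import math
import struct
import wave


def make_tone_wav(seconds=1.0, rate=8000, freq=440.0, amplitude=0.5):
    """Synthesize a mono 16-bit sine tone in memory (a stand-in for a real recording)."""
    n = int(seconds * rate)
    samples = [int(amplitude * 32767 * math.sin(2 * math.pi * freq * i / rate)) for i in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{n}h", *samples))
    buf.seek(0)
    return buf


def inspect_wav(source):
    """Return duration, sample rate, channel count, and peak level (0..1) of a 16-bit PCM WAV."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        n_frames = wav.getnframes()
        frames = wav.readframes(n_frames)
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    peak = max((abs(s) for s in samples), default=0) / 32768
    return {"duration_s": n_frames / rate, "sample_rate": rate, "channels": channels, "peak": peak}


def sample_warnings(info, min_seconds=5.0, min_rate=16000, min_peak=0.1):
    """Collect warnings for weak inputs; all thresholds are illustrative assumptions."""
    warnings = []
    if info["duration_s"] < min_seconds:
        warnings.append("sample is short")
    if info["sample_rate"] < min_rate:
        warnings.append("sample rate is low")
    if info["peak"] < min_peak:
        warnings.append("recording is very quiet")
    return warnings
```

`inspect_wav` accepts either a file path or a file-like object, so `sample_warnings(inspect_wav("my_voice.wav"))` works on a recording on disk, while `make_tone_wav()` provides a synthetic input for trying the check out.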
