Self-host a powerful TTS model with an OpenAI-compatible API
This project provides a self-hostable server for the Dia TTS model, offering a user-friendly web UI and an OpenAI-compatible API for easy integration. It targets developers and power users needing advanced text-to-speech capabilities, including voice cloning and realistic dialogue generation, with significant improvements in speed and VRAM usage.
How It Works
The server leverages the FastAPI framework to expose Dia TTS functionalities. It intelligently chunks long text inputs for sequential processing and concatenation, improving handling of large documents. The project defaults to BF16 SafeTensors for reduced VRAM and faster inference, with support for original .pth weights. It automatically detects and utilizes NVIDIA CUDA for GPU acceleration, with a CPU fallback.
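To illustrate the chunking idea, the sketch below splits long input at sentence boundaries and caps each chunk at a character budget. This is a minimal sketch only; the 400-character limit and regex-based sentence splitting are assumptions for illustration, not the project's actual rules.

```python
import re

def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Split long input into sentence-aligned chunks (illustrative sketch only)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk once adding this sentence would exceed the budget.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# Each chunk is synthesized in sequence and the resulting audio segments are concatenated.
```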
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt. For GPU acceleration, ensure a PyTorch build with CUDA support is installed. On Linux, the system packages libsndfile1 and ffmpeg are also required. Alternatively, docker compose up -d provides a one-command setup.
Highlighted Details
The server exposes an OpenAI-compatible speech endpoint (/v1/audio/speech), so existing OpenAI text-to-speech clients and SDKs can be pointed at the self-hosted server with minimal changes.
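A minimal client sketch, assuming the server listens on localhost:8003 and accepts the standard OpenAI speech parameters; the port, model name, and voice value are placeholders to adapt to your configuration.

```python
import requests

# Hypothetical host/port and parameter values; adjust to your server's configuration.
response = requests.post(
    "http://localhost:8003/v1/audio/speech",
    json={
        "model": "dia-1.6b",   # placeholder model identifier
        "input": "Hello from a self-hosted Dia TTS server.",
        "voice": "S1",         # placeholder voice selection
        "response_format": "wav",
    },
    timeout=300,
)
response.raise_for_status()

with open("speech.wav", "wb") as f:
    f.write(response.content)
```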
Maintenance & Community
The project is actively maintained by devnen. Contributions are welcome via issues and pull requests.
Licensing & Compatibility
Licensed under the MIT License, permitting commercial use and integration with closed-source applications.
Limitations & Caveats
Whisper integration for transcript generation during voice cloning is experimental. Without a fixed seed, voice consistency across chunks in "Random/Dialogue" mode may vary. The "UI Cancel" button stops the frontend from waiting but does not immediately halt backend inference.