Dia-TTS-Server by devnen

Self-host a powerful TTS model with an OpenAI-compatible API

Created 8 months ago

341 stars

Top 81.0% on SourcePulse

Project Summary

This project provides a self-hostable server for the Dia TTS model, with a web UI and an OpenAI-compatible API for integration. It targets developers and power users who need advanced text-to-speech capabilities, including voice cloning and realistic dialogue generation, and it defaults to BF16 SafeTensors weights to reduce VRAM usage and speed up inference.

How It Works

The server uses the FastAPI framework to expose Dia TTS functionality. It splits long text inputs into chunks, processes them sequentially, and concatenates the resulting audio, which improves handling of large documents. The project defaults to BF16 SafeTensors weights for reduced VRAM usage and faster inference, and still supports the original .pth weights. It automatically detects and uses NVIDIA CUDA for GPU acceleration and falls back to the CPU when no compatible GPU is available.
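The chunk-then-concatenate flow described above can be illustrated with a minimal sketch. This is not the project's actual implementation: the function names `split_into_chunks` and `synthesize_long_text`, the sentence-boundary rule, and the `max_chars` default are all assumptions made for illustration, and `synthesize` stands in for whatever call produces audio samples for one chunk.

```python
import re
from typing import Callable


def split_into_chunks(text: str, max_chars: int = 120) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    Hypothetical helper: a single sentence longer than max_chars is kept
    whole rather than being cut mid-sentence.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_long_text(
    text: str,
    synthesize: Callable[[str], list[float]],
    max_chars: int = 120,
) -> list[float]:
    """Synthesize each chunk in order and concatenate the audio samples."""
    audio: list[float] = []
    for chunk in split_into_chunks(text, max_chars):
        audio.extend(synthesize(chunk))
    return audio
```

Splitting on sentence boundaries rather than at a fixed character offset is one plausible way to keep each chunk a natural unit of speech; the real server's chunking rules may differ.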

Quick Start & Requirements

Installation: Clone the repository, set up a Python virtual environment, and install dependencies via pip install -r requirements.txt. For GPU acceleration, ensure that a PyTorch build with CUDA support is installed.
Prerequisites: Python 3.10+, Git, NVIDIA GPU (recommended for performance), CUDA Toolkit (if using GPU), libsndfile1 and ffmpeg (on Linux).
Docker: Pre-built images are available on GHCR. docker compose up -d provides a one-command setup.
Resources: Initial model downloads can be substantial (3-7GB). VRAM usage is approximately 7GB with BF16 SafeTensors.
Docs: https://github.com/devnen/dia-tts-server
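Once the server is running, the OpenAI-compatible endpoint (/v1/audio/speech) can be called with a plain HTTP POST. The sketch below uses only the Python standard library. The base URL and port, the `model` value, the voice name in the usage note, and the `wav` response format are assumptions for illustration; check the project's documentation for the values your installation expects. The helper names `build_speech_request` and `save_speech` are hypothetical.

```python
import json
import urllib.request

# Assumption: host and port of a local server instance; adjust to your setup.
BASE_URL = "http://localhost:8003"


def build_speech_request(
    text: str, voice: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST request in the shape of OpenAI's speech API."""
    payload = {
        "model": "dia",            # placeholder value, an assumption
        "input": text,             # text to synthesize
        "voice": voice,            # voice identifier known to the server
        "response_format": "wav",  # assumed to be supported
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text: str, voice: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With a server listening at the assumed address, `save_speech("Hello there.", "S1", "hello.wav")` would write the response body to `hello.wav`; the voice name `S1` is a placeholder, not a confirmed built-in voice.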

Highlighted Details

OpenAI-compatible API endpoint (/v1/audio/speech).
Supports 43 built-in voices and improved voice cloning with automatic audio/transcript handling.
Intelligent large text chunking with configurable size and UI toggle.
Generation seed for reproducible results across chunks or requests.
Automatic audio post-processing for silence trimming and artifact removal.
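To make the silence-trimming idea concrete, here is a minimal sketch of amplitude-threshold trimming on a list of samples. It is a generic technique, not the project's actual post-processing code: the function name `trim_silence`, the threshold value, and the sample representation (floats in the range -1.0 to 1.0) are assumptions.

```python
def trim_silence(samples: list[float], threshold: float = 0.01) -> list[float]:
    """Drop leading and trailing samples quieter than threshold.

    Silence in the middle of the audio is left untouched.
    """
    start = 0
    end = len(samples)
    # Advance past quiet samples at the beginning.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Step back past quiet samples at the end.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]
```

A production implementation would more likely measure energy over short windows and keep a small padding margin so that speech onsets are not clipped.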

Maintenance & Community

The project is actively maintained by devnen. Contributions are welcome via issues and pull requests.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

Whisper integration for transcript generation during voice cloning is experimental. Voice consistency across chunks in "Random/Dialogue" mode without a fixed seed may vary. The "UI Cancel" button stops frontend waiting but does not immediately halt backend inference.

Explore Similar Projects

S.A.T.U.R.D.A.Y by GRVYDEV

Auralis by astramind-ai

Kokoros by lucasjinreal

dia2 by nari-labs

sesame_csm_openai by phildougherty

intrascribe by weynechen

openedai-speech by matatonic

vits-simple-api by Artrajz

Chatterbox-TTS-Server by devnen

WhisperSpeech by WhisperSpeech

alltalk_tts by erew123

tortoise-tts by neonbjb