vits-simple-api by Artrajz

HTTP API for VITS-based text-to-speech and voice conversion

Created 2 years ago

1,031 stars

Top 36.3% on SourcePulse

Project Summary

This project provides a simple HTTP API for VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) and related TTS models, extending the MoeGoe framework. It targets developers and researchers who need to integrate text-to-speech and voice conversion into applications, and it supports multiple VITS variants and automatic language detection.

How It Works

The API acts as a wrapper around various VITS-based models, including HuBert-soft VITS, vits_chinese, Bert-VITS2, W2V2 VITS, and GPT-SoVITS. It exposes endpoints for text-to-speech synthesis and voice conversion, allowing users to specify models, languages, and synthesis parameters via API calls. The architecture supports loading multiple models concurrently and offers GPU acceleration for inference.
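As a rough illustration of specifying a model, language, and synthesis parameters via an API call, the sketch below builds a GET request URL and saves the returned audio. The base address http://127.0.0.1:23456, the /voice/<model> path, and the parameter names text, id, lang, and format are assumptions made for this example; check them against your own running instance before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed local address for illustration; adjust to your deployment.
BASE_URL = "http://127.0.0.1:23456"


def build_tts_url(text, speaker_id=0, lang="auto", audio_format="wav",
                  model="vits", base_url=BASE_URL):
    """Build a GET URL for an assumed /voice/<model> synthesis endpoint."""
    query = urlencode({
        "text": text,          # text to synthesize (URL-encoded)
        "id": speaker_id,      # assumed speaker index parameter
        "lang": lang,          # assumed language parameter
        "format": audio_format,
    })
    return f"{base_url}/voice/{model}?{query}"


def synthesize(text, out_path, **kwargs):
    """Request audio from the server and write the response bytes to out_path."""
    with urlopen(build_tts_url(text, **kwargs), timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)


if __name__ == "__main__":
    # Only prints the URL; call synthesize(...) once a server is running.
    print(build_tts_url("Hello world", speaker_id=1, lang="en"))
```

Keeping URL construction separate from the network call makes the request easy to inspect or log before any audio is generated.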

Quick Start & Requirements

Docker: bash -c "$(wget -O- https://raw.githubusercontent.com/Artrajz/vits-simple-api/main/vits-simple-api-installer-latest.sh)" followed by docker-compose up -d.
Virtual Environment: clone the repository with git clone, install dependencies with pip install -r requirements.txt (Python 3.10 recommended), then start the server with python app.py.
Windows: Download and extract the release package, then run start.bat.
Prerequisites: GPU with CUDA support is recommended for accelerated inference. Model files (.pth, .json, .ckpt) must be downloaded and placed in the data/models directory.
Docs: https://artrajz-vits-simple-api.hf.space/
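Because model files have to be placed by hand, a quick check before starting the server can save a failed launch. The following is a minimal sketch that lists files with the extensions named in the prerequisites under data/models; the function name find_model_files is hypothetical and is not part of the project.

```python
from pathlib import Path

# Extensions named in the prerequisites above.
MODEL_EXTENSIONS = {".pth", ".json", ".ckpt"}


def find_model_files(models_dir="data/models"):
    """Return model-related files under models_dir, grouped by extension."""
    root = Path(models_dir)
    found = {ext: [] for ext in sorted(MODEL_EXTENSIONS)}
    if not root.is_dir():
        # Missing directory: report empty groups instead of raising.
        return found
    for path in sorted(root.rglob("*")):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in MODEL_EXTENSIONS:
            found[suffix].append(path)
    return found


if __name__ == "__main__":
    for ext, paths in find_model_files().items():
        print(f"{ext}: {len(paths)} file(s)")
        for p in paths:
            print(f"  {p}")
```

An empty result for every extension suggests the models have not been downloaded yet or were placed in a different directory.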

Highlighted Details

Supports VITS, HuBert-soft VITS, vits_chinese, Bert-VITS2, W2V2 VITS, and GPT-SoVITS.
Automatic language recognition and processing with custom language scope support.
Batch processing of long text, plus SSML support (still in progress).
Offers both a web UI and an admin backend for model management.
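To give a feel for what long text batch processing involves, here is a generic sketch that splits text into chunks at sentence boundaries so each chunk can be synthesized separately. It is not the project's actual implementation; the function name split_long_text, the punctuation set, and the max_chars limit are all assumptions chosen for illustration.

```python
import re

# Split after a run of sentence-ending punctuation (Latin or CJK).
# The punctuation set is an illustrative choice, not the project's rule.
_SENTENCE_END = re.compile(r"(?<=[.!?。！？])(?![.!?。！？])\s*")


def split_long_text(text, max_chars=50):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Simplification: sentences are rejoined with a single space,
        # which is not ideal for CJK text.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_long_text("Hello there. How are you today? I am fine!", 25):
        print(chunk)
```

Each chunk could then be sent as its own synthesis request and the resulting audio segments concatenated in order.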

Maintenance & Community

The project is actively maintained, with contributions from various individuals. Community support is available via a Chinese QQ group.

Licensing & Compatibility

The README does not explicitly state a license, so suitability for commercial use or closed-source linking is unspecified.

Limitations & Caveats

The project relies on external model files that must be downloaded and configured manually. Some advanced features, such as SSML, are still in progress. The README mentions potential issues on Windows with paths that contain non-English or special characters.
