HTTP API for VITS-based text-to-speech and voice conversion
This project provides a simple HTTP API for VITS (Variational Inference with adversarial learning for Text-to-Speech) and related TTS models, extending the MoeGoe framework. It targets developers and researchers needing to integrate advanced text-to-speech and voice conversion capabilities into applications, offering support for multiple VITS variants and automatic language detection.
How It Works
The API acts as a wrapper around various VITS-based models, including HuBert-soft VITS, vits_chinese, Bert-VITS2, W2V2 VITS, and GPT-SoVITS. It exposes endpoints for text-to-speech synthesis and voice conversion, allowing users to specify models, languages, and synthesis parameters via API calls. The architecture supports loading multiple models concurrently and offers GPU acceleration for inference.
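As a minimal sketch of what an API call looks like, the snippet below builds a synthesis request and saves the returned audio. It assumes the server runs locally on the project's default port (23456) and exposes a GET /voice/vits endpoint accepting text, id, and lang parameters, with lang="auto" triggering the automatic language detection mentioned above; adjust the host, port, and parameters to match your deployment.

```python
# Hedged sketch: endpoint path, port, and parameter names are assumptions
# based on the upstream vits-simple-api defaults; verify against your config.
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:23456"

def build_tts_url(text: str, speaker_id: int = 0, lang: str = "auto") -> str:
    """Build a synthesis request URL; lang='auto' enables language detection."""
    params = urlencode({"text": text, "id": speaker_id, "lang": lang})
    return f"{BASE_URL}/voice/vits?{params}"

def synthesize(text: str, out_path: str = "out.wav") -> None:
    """Fetch synthesized audio from the API and write it to a file."""
    with urlopen(build_tts_url(text)) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Voice conversion and the other model variants (Bert-VITS2, W2V2 VITS, GPT-SoVITS) are exposed through analogous endpoints with their own parameters.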
Quick Start & Requirements
Docker (Linux): run the one-click installer script

bash -c "$(wget -O- https://raw.githubusercontent.com/Artrajz/vits-simple-api/main/vits-simple-api-installer-latest.sh)"

then start the service with docker-compose up -d.

Manual setup: git clone the repository, install dependencies with pip install -r requirements.txt (Python 3.10 recommended), then launch the server with python app.py, or start.bat on Windows.

Model files (.pth, .json, .ckpt) are not bundled; they must be downloaded separately and placed in the data/models directory.
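Because the server cannot synthesize without models, a quick pre-flight check of the models directory can save a failed startup. The sketch below follows the data/models layout and file extensions described above; the pairing rule (a checkpoint with a .json config beside it) is an illustrative assumption, not the project's exact loading logic.

```python
# Pre-flight check: list checkpoint files under data/models that have a
# .json config in the same directory. The pairing heuristic is an
# assumption for illustration; consult the project's docs for the real rules.
from pathlib import Path

MODEL_EXTS = {".pth", ".ckpt"}

def find_models(models_dir: str = "data/models") -> list[Path]:
    """Return checkpoint files that sit next to at least one .json config."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return [
        p for p in root.rglob("*")
        if p.suffix in MODEL_EXTS and any(p.parent.glob("*.json"))
    ]
```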
Maintenance & Community
The project is actively maintained, with contributions from various individuals. Community support is available via a Chinese QQ group.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project relies on external model files which must be manually downloaded and configured. Some advanced features like SSML are still in progress. The README mentions potential issues with non-English or special character paths on Windows.