vits-simple-api  by Artrajz

HTTP API for VITS-based text-to-speech and voice conversion

created 2 years ago
1,000 stars

Top 37.9% on sourcepulse

Project Summary

This project provides a simple HTTP API for VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) and related TTS models, building on the MoeGoe framework. It targets developers and researchers who need to integrate text-to-speech and voice conversion into applications, and it supports multiple VITS variants and automatic language detection.

How It Works

The API acts as a wrapper around various VITS-based models, including HuBert-soft VITS, vits_chinese, Bert-VITS2, W2V2 VITS, and GPT-SoVITS. It exposes endpoints for text-to-speech synthesis and voice conversion, allowing users to specify models, languages, and synthesis parameters via API calls. The architecture supports loading multiple models concurrently and offers GPU acceleration for inference.
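
As a rough illustration of the "parameters via API calls" idea, the sketch below builds a GET URL for a synthesis request. The route shape (`/voice/<model>`), the port, and the parameter names (`text`, `id`, `lang`, `length`) are assumptions made for this example; check the project's own documentation for the actual endpoints before relying on them.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_tts_url(base_url, model, text, speaker_id=0, lang="auto", **params):
    """Build a GET URL for an assumed /voice/<model> synthesis route.

    The route and parameter names are illustrative assumptions,
    not a confirmed description of the project's API.
    """
    query = {"text": text, "id": speaker_id, "lang": lang}
    query.update(params)  # optional synthesis parameters, e.g. length=1.0
    return f"{base_url.rstrip('/')}/voice/{model}?{urlencode(query)}"


# Example: the host and port here are placeholders for a local deployment.
url = build_tts_url("http://127.0.0.1:23456", "vits", "Hello world", length=1.0)
print(url)
```

The resulting URL could then be fetched with any HTTP client and the response body saved as an audio file.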

Quick Start & Requirements

  • Docker: run the installer script with bash -c "$(wget -O- https://raw.githubusercontent.com/Artrajz/vits-simple-api/main/vits-simple-api-installer-latest.sh)", then start the service with docker-compose up -d.
  • Virtual Environment: git clone the repository, pip install -r requirements.txt (Python 3.10 recommended), then python app.py.
  • Windows: Download and extract the release package, then run start.bat.
  • Prerequisites: GPU with CUDA support is recommended for accelerated inference. Model files (.pth, .json, .ckpt) must be downloaded and placed in the data/models directory.
  • Docs: https://artrajz-vits-simple-api.hf.space/
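
Because the service depends on model files being present, a quick pre-flight check can save a failed start. The following is a minimal sketch, not part of the project: the helper name `find_model_files` is hypothetical, and it only looks for the extensions named above under `data/models`.

```python
from pathlib import Path

# Extensions mentioned in the prerequisites above.
MODEL_EXTENSIONS = {".pth", ".json", ".ckpt"}


def find_model_files(models_dir):
    """Return model-like files found anywhere under models_dir (hypothetical helper)."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in MODEL_EXTENSIONS
    )


files = find_model_files("data/models")
if files:
    for path in files:
        print(path)
else:
    print("No model files found under data/models")
```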

Highlighted Details

  • Supports VITS, HuBert-soft VITS, vits_chinese, Bert-VITS2, W2V2 VITS, and GPT-SoVITS.
  • Automatic language recognition and processing with custom language scope support.
  • Long-text batch processing; SSML support is in progress.
  • Offers both a web UI and an admin backend for model management.
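
To make "long text batch processing" concrete, here is one naive way to split long input into sentence-aligned chunks before synthesis. This is an illustrative sketch only; the function name `split_long_text` and the `max_len` parameter are hypothetical, and the project's actual segmentation logic may differ.

```python
import re

# Sentence-ending punctuation, Latin and CJK.
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*\s*")


def split_long_text(text, max_len=50):
    """Group sentences into chunks of at most max_len characters.

    A single sentence longer than max_len is kept whole rather than cut.
    """
    chunks, current = [], ""
    for piece in _SENTENCE.findall(text):
        if current and len(current) + len(piece) > max_len:
            chunks.append(current.strip())
            current = piece
        else:
            current += piece
    if current.strip():
        chunks.append(current.strip())
    return chunks


for chunk in split_long_text("Hi. There! Ok", max_len=8):
    print(chunk)
```

Each chunk could then be synthesized separately and the audio concatenated. Note that this splitter treats every period as a sentence end, so it would also split numbers such as "3.14".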

Maintenance & Community

The project is actively maintained and has multiple contributors. Community support is available via a Chinese-language QQ group.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project relies on external model files, which must be downloaded and configured manually. Some advanced features, such as SSML, are still in progress. The README notes potential issues on Windows when paths contain non-English or special characters.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 61 stars in the last 90 days
