GPT-SoVITS-Server  by ben0oil1

Inference server for voice cloning models

Created 1 year ago
303 stars

Top 88.2% on SourcePulse

Project Summary

This project provides a simplified inference server for GPT-SoVITS, a leading voice cloning technology. It targets users who have already trained models and need an easy-to-deploy solution for voice synthesis, especially in resource-constrained environments such as mobile phones or CPU-only servers. It hides the complexity of the full GPT-SoVITS project behind a single entry point.

How It Works

The server extracts the core inference logic from the original GPT-SoVITS project into a single server.py file. This approach prioritizes minimal dependencies and ease of use, allowing users to run voice cloning with pre-trained models without needing to manage the entire, complex original project. It's designed to be runnable on CPUs, making it accessible for users without expensive GPU hardware.
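As a rough illustration of the CPU-first design described above, the sketch below picks an inference device that defaults to CPU and only considers a GPU when explicitly asked. It assumes a PyTorch backend (which upstream GPT-SoVITS uses); the function name `pick_device` and its `prefer_gpu` parameter are hypothetical and are not taken from server.py.

```python
def pick_device(prefer_gpu: bool = False) -> str:
    """Return a device string, defaulting to "cpu".

    Hypothetical helper: server.py may select its device differently.
    """
    if prefer_gpu:
        try:
            # Imported lazily so the CPU path works even without PyTorch installed.
            import torch

            if torch.cuda.is_available():
                return "cuda"
        except ImportError:
            pass
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

Defaulting to CPU keeps the behavior predictable on phones and CPU-only servers, where no GPU is expected.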

Quick Start & Requirements

  • Install: Download the pre-trained models (chinese-hubert-base, chinese-roberta-wwm-ext-large) from Hugging Face, place them locally, and update the corresponding paths in server.py. On Windows, start the server with the bundled runtime (runtime/python.exe ./server.py) and make sure ffmpeg.exe sits in the same directory as server.py.
  • Prerequisites: Python, pre-trained models, ffmpeg.exe (Windows only).
  • Setup: Minimal, focused on downloading models and configuring paths.
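The setup steps above can be sanity-checked before launching the server. The following is a minimal sketch, not part of the project: `check_setup`, `REQUIRED_MODEL_DIRS`, and the directory layout are assumptions made for illustration, so adjust them to wherever the models were actually placed.

```python
import sys
from pathlib import Path

# Model folder names taken from the install step; the parent directory is an assumption.
REQUIRED_MODEL_DIRS = ("chinese-hubert-base", "chinese-roberta-wwm-ext-large")


def check_setup(base_dir, models_root, platform=sys.platform):
    """Return a list of problems; an empty list means the layout looks usable."""
    base_dir = Path(base_dir)
    models_root = Path(models_root)
    problems = []

    if not (base_dir / "server.py").is_file():
        problems.append(f"server.py not found in {base_dir}")

    for name in REQUIRED_MODEL_DIRS:
        if not (models_root / name).is_dir():
            problems.append(f"missing model directory: {models_root / name}")

    # ffmpeg.exe next to server.py is only required on Windows.
    if platform.startswith("win") and not (base_dir / "ffmpeg.exe").is_file():
        problems.append(f"ffmpeg.exe not found in {base_dir}")

    return problems


if __name__ == "__main__":
    for problem in check_setup(".", "./models"):
        print("setup problem:", problem)
```

Returning a list of problems rather than raising on the first one lets a user fix every missing piece in a single pass.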

Highlighted Details

  • Designed for CPU inference, making voice cloning accessible without GPUs.
  • Successfully tested on a mobile phone, which shows it can run on low-powered hardware.
  • Focuses solely on Chinese language support, simplifying the codebase.
  • Aims to abstract away complex environment setup for end-users.

Maintenance & Community

The project is a personal extraction from the original GPT-SoVITS. Future optimization plans include re-integrating Japanese and English support, code standardization, performance improvements, and potentially a GUI wrapper and Docker packaging.

Licensing & Compatibility

The licensing is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is currently Chinese-only; Japanese and English support has been removed. The README also notes that paths handled by the clean_path function in server.py may need adjusting, so some manual configuration is required.
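To show the kind of path cleanup that a helper such as clean_path typically performs, here is a small sketch. It is not the project's implementation: `normalize_user_path` is a hypothetical name, and the exact rules in server.py may differ.

```python
import os


def normalize_user_path(raw: str) -> str:
    """Strip whitespace and surrounding quotes, then normalize separators for the current OS.

    Hypothetical stand-in; the real clean_path in server.py may behave differently.
    """
    cleaned = raw.strip().strip('"').strip("'").strip()
    # os.path.normpath("") would return ".", so keep empty input empty.
    return os.path.normpath(cleaned) if cleaned else cleaned


if __name__ == "__main__":
    print(normalize_user_path('  "models/chinese-hubert-base"\n'))
```

Paths pasted from a file manager or terminal often carry quotes or trailing newlines, which is why a step like this usually runs before any model path is opened.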

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

6 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pietro Schirano (Founder of MagicPath), and 2 more.

metavoice-src by metavoiceio

0.1%
4k
TTS model for human-like, expressive speech
Created 1 year ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Chaoyu Yang (Founder of Bento), and 1 more.

fish-speech by fishaudio

0.3%
23k
Open-source TTS for multilingual speech synthesis
Created 1 year ago
Updated 1 week ago
Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss

0.3%
51k
Few-shot voice cloning and TTS web UI
Created 1 year ago
Updated 1 week ago