Inference server for voice cloning models
This project provides a simplified inference server for GPT-SoVITS, a leading voice cloning technology. It targets users who have already trained models and need an easy-to-deploy solution for voice synthesis, especially in resource-constrained environments such as mobile phones or CPU-based servers, and it abstracts away the complexity of the full GPT-SoVITS project.
How It Works
The server extracts the core inference logic from the original GPT-SoVITS project into a single server.py file. This approach prioritizes minimal dependencies and ease of use, allowing users to run voice cloning with pre-trained models without managing the entire, complex original project. It is designed to run on CPUs, making it accessible to users without expensive GPU hardware.
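For orientation, the sketch below shows how a client might call such a single-file server over HTTP. The URL, port, route, and parameter names follow the conventions of the upstream GPT-SoVITS api.py and are assumptions; check server.py for the actual interface this fork exposes.

```python
# Sketch of a client request, assuming server.py exposes an HTTP endpoint
# similar to upstream GPT-SoVITS' api.py. The URL, port, and field names
# below are assumptions and must be verified against server.py.
import requests

payload = {
    "refer_wav_path": "samples/reference.wav",  # reference audio of the voice to clone
    "prompt_text": "Transcript of the reference audio",
    "prompt_language": "zh",
    "text": "Text to synthesize in the cloned voice",
    "text_language": "zh",
}

resp = requests.post("http://127.0.0.1:9880/", json=payload, timeout=120)
resp.raise_for_status()

# The server is assumed to return the synthesized audio as raw bytes.
with open("output.wav", "wb") as f:
    f.write(resp.content)
```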
Quick Start & Requirements
- Download the pretrained models (chinese-hubert-base, chinese-roberta-wwm-ext-large) from Hugging Face, place them locally, and update the corresponding paths in server.py.
- On Windows, use the provided runtime: runtime/python.exe ./server.py.
- ffmpeg.exe (Windows only) must be in the same directory as server.py. See the sketch after this list for one way to fetch the pretrained models.
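The following is a minimal sketch of fetching the checkpoints with the huggingface_hub library. The repo IDs (TencentGameMate/chinese-hubert-base, hfl/chinese-roberta-wwm-ext-large) and target directories are assumptions; confirm them against the paths server.py expects.

```python
# Sketch: download the pretrained encoder checkpoints locally.
# Repo IDs and target directories are assumptions; adjust them to
# whatever paths server.py is configured to read.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TencentGameMate/chinese-hubert-base",
    local_dir="pretrained_models/chinese-hubert-base",
)
snapshot_download(
    repo_id="hfl/chinese-roberta-wwm-ext-large",
    local_dir="pretrained_models/chinese-roberta-wwm-ext-large",
)
```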
Highlighted Details
Maintenance & Community
The project is a personal extraction from the original GPT-SoVITS. Future optimization plans include re-integrating Japanese and English support, code standardization, performance improvements, and potentially a GUI wrapper and Docker packaging.
Licensing & Compatibility
The license is not explicitly stated in the README, so suitability for commercial use or closed-source linking is not specified.
Limitations & Caveats
Currently, the project is Chinese-only, with Japanese and English support removed. The README mentions potential path adjustments in server.py's clean_path function, indicating a need for user configuration.
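The exact behavior of clean_path depends on the project's code, but the hypothetical sketch below illustrates the kind of path normalization such a helper typically performs; it is not copied from the repository.

```python
# Hypothetical sketch of a clean_path-style helper: strip stray quotes and
# whitespace and normalize separators so paths pasted into the config
# resolve on the current platform. Not the project's actual implementation.
import os

def clean_path(path_str: str) -> str:
    cleaned = path_str.strip().strip('"').strip("'")
    # Convert all separators to the platform's native form.
    return cleaned.replace("\\", "/").replace("/", os.sep)
```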