so-vits-svc by svc-develop-team

Singing voice conversion using SoftVC VITS

Created 3 years ago

27,998 stars

Top 1.3% on SourcePulse

Project Summary

This project provides a framework for Singing Voice Conversion (SVC) using the VITS architecture, focusing on transforming one singing voice into another while preserving original pitch and intonation. It is targeted at developers and researchers interested in AI-powered music generation and voice manipulation, offering a high degree of customization and control over the conversion process.

How It Works

At its core, so-vits-svc uses a SoftVC content encoder to extract speech features from the source audio and feeds them directly into the VITS model. Because no text-based intermediate representation is involved, the pitch and intonation of the original audio are preserved. The vocoder is NSF HiFiGAN, which mitigates sound interruption issues. Recent updates add support for shallow diffusion, a Whisper-PPG encoder, static/dynamic sound fusion, loudness embedding, and feature retrieval from RVC.
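The data flow described above can be illustrated with a toy sketch. It is not the project's code: `encode_content`, `estimate_f0`, `analyze`, and `convert` are hypothetical stand-ins, the sample rate is assumed, and the pitch estimate uses naive autocorrelation instead of a real F0 predictor. The only point is that content features and pitch are taken from the source audio frame by frame and handed to the decoder together with a target speaker, with no text step in between.

```python
import math
from dataclasses import dataclass
from typing import List

SAMPLE_RATE = 16000  # assumed sample rate for this toy example


@dataclass
class FrameFeatures:
    content: List[float]  # stand-in for a content-encoder feature vector
    f0_hz: float          # pitch estimated from the source frame


def estimate_f0(frame, sample_rate=SAMPLE_RATE, fmin=80.0, fmax=800.0):
    """Naive autocorrelation pitch estimate (illustration only)."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def encode_content(frame):
    """Hypothetical placeholder for a content encoder: returns mean energy."""
    return [sum(s * s for s in frame) / len(frame)]


def analyze(audio, frame_len=1024):
    """Split audio into non-overlapping frames and extract content + F0."""
    starts = range(0, len(audio) - frame_len + 1, frame_len)
    frames = [audio[i:i + frame_len] for i in starts]
    return [FrameFeatures(encode_content(f), estimate_f0(f)) for f in frames]


def convert(features, target_speaker):
    """Hypothetical decoder input: source content and F0 plus target speaker."""
    return [(target_speaker, ff.content, ff.f0_hz) for ff in features]


# A 220 Hz test tone keeps its pitch through the toy pipeline.
tone = [math.sin(2 * math.pi * 220.0 * n / SAMPLE_RATE) for n in range(2048)]
converted = convert(analyze(tone), target_speaker="speaker_b")
```

Each tuple in `converted` carries the target speaker label alongside the source frame's content vector and pitch, which is the property the text-free design relies on.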

Quick Start & Requirements

Install: Set up a Python environment and install the project's Python dependencies.
Prerequisites: Python 3.8.9 recommended. Requires downloading pre-trained model files for various speech encoders (e.g., ContentVec, Whisper-PPG) and optionally for vocoders (NSF-HIFIGAN) and F0 predictors (RMVPE). GPU acceleration is highly recommended for training and inference.
Setup: Requires downloading multiple model checkpoints and placing them in specific directories. The process involves data preparation, preprocessing, and training.
Links: Open In Colab
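Because setup depends on several checkpoints being in the right directories, a small pre-flight check can catch a missing file before preprocessing or training starts. The sketch below is an illustration only: `find_missing` is a hypothetical helper, and the file names in `REQUIRED_FILES` are made up. The real names and directories should be taken from the project README for the encoder, vocoder, and F0 predictor you choose.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing(root: Path, required: Iterable[str]) -> List[str]:
    """Return the relative paths in `required` that are not files under `root`."""
    return [rel for rel in required if not (root / rel).is_file()]


# Hypothetical layout for illustration only; replace with the paths the
# README specifies for the checkpoints you actually downloaded.
REQUIRED_FILES = [
    "pretrain/content_encoder.pt",  # hypothetical speech-encoder checkpoint
    "pretrain/f0_predictor.pt",     # hypothetical F0-predictor checkpoint
]

if __name__ == "__main__":
    missing = find_missing(Path("."), REQUIRED_FILES)
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All listed checkpoints are present.")
```

Run from the repository root, the script lists any entries that are absent, so the list can be extended as more optional models are added.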

Highlighted Details

Focuses exclusively on Singing Voice Conversion (SVC), not Text-to-Speech (TTS).
Supports multiple speech encoders (ContentVec, Whisper-PPG, WavLM, etc.) and F0 predictors (RMVPE, Crepe, Dio, etc.).
Offers advanced features like dynamic voice mixing, feature retrieval, and shallow diffusion for enhanced sound quality.
Includes an ONNX export option for model deployment.
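To give a rough idea of what feature retrieval means here, the sketch below assumes it works as a nearest-neighbour lookup: each content vector from the source audio is compared against vectors stored from the target speaker's training data, and the result is blended back in. This is an assumption made for illustration, not the project's or RVC's actual implementation. `retrieve_and_blend`, `k`, and `ratio` are hypothetical names.

```python
import math
from typing import List, Sequence


def retrieve_and_blend(
    query: Sequence[float],
    index: Sequence[Sequence[float]],
    k: int = 2,
    ratio: float = 0.5,
) -> List[float]:
    """Blend `query` with the mean of its k nearest vectors in `index`.

    ratio=0.0 returns the query unchanged; ratio=1.0 returns only the
    retrieved mean. Distances are plain Euclidean (math.dist, Python 3.8+).
    """
    if not index:
        return list(query)
    nearest = sorted(index, key=lambda v: math.dist(query, v))[:k]
    mean = [sum(col) / len(nearest) for col in zip(*nearest)]
    return [ratio * m + (1.0 - ratio) * q for q, m in zip(query, mean)]


# Toy 2-D "features": the two closest stored vectors pull the query toward them.
stored = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
blended = retrieve_and_blend([1.0, 0.0], stored, k=2, ratio=0.5)
```

A higher `ratio` moves the output closer to features seen for the target speaker, at the cost of departing further from what the source audio actually contained.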

Maintenance & Community

The most recent updates landed on the 4.1-Stable version, but the last commit was 2 years ago (see Health Check), so active maintenance should not be assumed. The original repository was deleted, and this is a reconstruction. Community support channels are not explicitly listed in the README.

Licensing & Compatibility

License: AGPL-3.0.
Compatibility: The AGPL-3.0 license is a strong copyleft license. Use in commercial or closed-source projects may require careful consideration of its terms, particularly regarding modifications and distribution.

Limitations & Caveats

The project is intended for academic purposes and not production environments. Users are solely responsible for dataset authorization and any infringement issues arising from input sources. The AGPL-3.0 license may impose significant restrictions on commercial use. The README notes that ONNX export for Hubert models is not directly supported by this project.

Health Check

Last Commit: 2 years ago
Responsiveness: Inactive

Star History

72 stars in the last 30 days