Singing voice conversion using SoftVC VITS
This project provides a framework for Singing Voice Conversion (SVC) built on the VITS architecture. It converts one singing voice into another while preserving the original pitch and intonation. It targets developers and researchers interested in AI-powered music generation and voice manipulation, and it offers extensive options for customizing and controlling the conversion process.
How It Works
At its core, so-vits-svc uses a SoftVC content encoder to extract speech features from the source audio and feeds them directly into the VITS model. Because no text-based intermediate representation is involved, the original audio's pitch and intonation are preserved. NSF HiFiGAN serves as the vocoder to mitigate sound interruptions. Recent updates add support for shallow diffusion, the Whisper-PPG encoder, static/dynamic sound fusion, loudness embedding, and feature retrieval from RVC.
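The main thing an NSF (neural source-filter) vocoder adds over plain HiFiGAN is an explicit excitation signal derived from F0. The sketch below shows that idea in pure Python. It is a simplified illustration written under stated assumptions and is not code from so-vits-svc. The function name `nsf_sine_excitation`, the fundamental-only sine (no harmonics), the nearest-neighbour upsampling, and the default amplitude and noise values are all hypothetical choices. Voiced frames produce a sine whose phase stays continuous from frame to frame. Unvoiced frames (F0 = 0) produce noise only.

```python
import math
import random


def nsf_sine_excitation(f0_frames, hop_length, sample_rate,
                        amplitude=0.1, noise_std=0.003, seed=0):
    """Turn frame-level F0 values (Hz) into a sample-level excitation signal.

    Hypothetical, simplified NSF-style source module (illustrative only):
    - voiced frames (f0 > 0): amplitude * sine at f0, plus small Gaussian noise
    - unvoiced frames (f0 == 0): Gaussian noise only
    """
    rng = random.Random(seed)  # seeded so the noise is reproducible
    phase = 0.0
    out = []
    for f0 in f0_frames:
        # Nearest-neighbour upsampling: each frame covers hop_length samples.
        for _ in range(hop_length):
            if f0 > 0:
                # Accumulate phase sample by sample so the sine stays
                # continuous across frame boundaries, even when f0 changes.
                phase = (phase + 2.0 * math.pi * f0 / sample_rate) % (2.0 * math.pi)
                sample = amplitude * math.sin(phase) + rng.gauss(0.0, noise_std)
            else:
                # Unvoiced: no periodic component, just noise (assumed scale).
                sample = rng.gauss(0.0, amplitude / 3)
            out.append(sample)
    return out


# Usage: two voiced frames at 100 Hz followed by one unvoiced frame.
signal = nsf_sine_excitation([100.0, 100.0, 0.0], hop_length=160, sample_rate=16000)
print(len(signal))
```

In a real NSF vocoder, a learned filter network shapes this excitation into a waveform. Accumulating the phase, rather than restarting the sine in every frame, is what keeps the periodic component free of discontinuities at frame boundaries.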
Quick Start & Requirements
Highlighted Details
Maintenance & Community
According to the README, the most recent updates target the 4.1-Stable version. The original repository was deleted, and this is a reconstruction. The README does not list any community support channels.
Licensing & Compatibility
Limitations & Caveats
The project is intended for academic use, not for production environments. Users are solely responsible for obtaining authorization for their datasets and for any infringement arising from input sources. The AGPL-3.0 license requires that modified versions, including those offered to users over a network, be released with their source code under the same license. This can significantly constrain commercial or closed-source use. The README also notes that this project does not directly support exporting Hubert models to ONNX.
Last updated about 1 year ago; listed as inactive.