Singing voice conversion tool using Whisper & BigVGAN
Top 51.7% on SourcePulse
This project provides a singing voice conversion (SVC) system that leverages OpenAI's Whisper for content encoding and NVIDIA's BigVGAN for neural source-filter synthesis. It targets researchers and hobbyists interested in AI-powered voice manipulation and singing synthesis, enabling users to clone singing voices with a high degree of control.
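BigVGAN is a neural vocoder. The "neural source-filter" part refers to the NSF idea of giving the generator an excitation signal derived from F0, which a learned filter network then shapes into a waveform. The following is a minimal sketch of a generic NSF-style sine excitation. It assumes frame-level F0 in Hz, with 0 marking unvoiced frames. The function name and the default amplitudes (0.1 for the sine, 0.003 for the noise std) are illustrative assumptions, not values taken from this project.

```python
import math
import random


def nsf_sine_excitation(f0_frames, hop_length, sample_rate,
                        sine_amp=0.1, noise_std=0.003, seed=0):
    """Upsample frame-level F0 (Hz, 0 = unvoiced) to a sample-level excitation.

    Hypothetical helper, not part of the project.
    Voiced samples: a sine at the frame's F0 plus low-level Gaussian noise.
    Unvoiced samples: Gaussian noise only, with std sine_amp / 3.
    """
    rng = random.Random(seed)
    excitation = []
    phase = 0.0  # phase in cycles; carried across frames so the sine stays continuous
    for f0 in f0_frames:
        for _ in range(hop_length):  # repeat each frame value hop_length times
            if f0 > 0.0:
                phase = (phase + f0 / sample_rate) % 1.0
                sample = sine_amp * math.sin(2.0 * math.pi * phase)
                sample += rng.gauss(0.0, noise_std)
            else:
                sample = rng.gauss(0.0, sine_amp / 3.0)
            excitation.append(sample)
    return excitation
```

For example, `nsf_sine_excitation([100.0] * 100, 160, 16000)` yields one second of a 100 Hz excitation at 16 kHz. In a full NSF model this source signal, which may contain several harmonics rather than a single sine, is passed through the learned filter network. Here it only illustrates how pitch enters the generator.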
How It Works
The system first separates the vocals from the accompaniment. It then cuts the vocal track into short segments, and Whisper extracts content embeddings (referred to as PPG) from each segment. In parallel, the system extracts pitch (F0) and speaker timbre features. These features are fed into a BigVGAN-based generator, conditioned on the target speaker's timbre, which synthesizes the converted singing voice. Decoupling content, pitch, and timbre in this way is intended to enable high-fidelity conversion.
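Two steps from the pipeline above can be sketched in plain Python. Whisper's encoder takes fixed 30-second windows of 16 kHz audio, so a longer vocal track has to be sliced before content features are extracted. Pitch can then be estimated frame by frame. The project's own F0 extractor is not specified here, so this sketch swaps in a simple normalized-autocorrelation estimator. The function names, the 50 to 1000 Hz search range, and the 0.6 voicing threshold are illustrative assumptions.

```python
import math

# Whisper's encoder processes fixed 30-second windows of 16 kHz audio, so longer
# vocal tracks must be cut into pieces no longer than this before PPG extraction.
WHISPER_WINDOW_SECONDS = 30.0


def slice_segments(num_samples, sample_rate, max_seconds=WHISPER_WINDOW_SECONDS):
    """Return (start, end) sample ranges covering the track, each <= max_seconds long.

    Hypothetical helper, not the project's slicing code.
    """
    max_len = int(max_seconds * sample_rate)
    return [(start, min(start + max_len, num_samples))
            for start in range(0, num_samples, max_len)]


def _nccf(x, lag):
    """Normalized cross-correlation between x and x shifted by `lag` samples."""
    a, b = x[:-lag], x[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0


def estimate_f0(frame, sample_rate, fmin=50.0, fmax=1000.0, threshold=0.6):
    """Estimate F0 (Hz) of one non-empty frame; return 0.0 if it looks unvoiced.

    Hypothetical stand-in for the project's pitch extractor. It picks the
    first interior local maximum of the NCCF above `threshold`, which avoids
    jumping to multiples of the true period (octave errors).
    """
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]  # remove DC offset before correlating
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(len(x) - 2, int(sample_rate / fmin))
    corr = {lag: _nccf(x, lag) for lag in range(min_lag, max_lag + 1)}
    for lag in range(min_lag + 1, max_lag):
        r = corr[lag]
        if r >= threshold and r >= corr[lag - 1] and r >= corr[lag + 1]:
            return sample_rate / lag
    return 0.0
```

The estimator takes the first correlation peak rather than the global maximum. This is a cheap guard against octave errors, because a periodic signal also correlates strongly at two or three times its period.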
Quick Start & Requirements
pip install -r requirements.txt
Required pretrained checkpoints: medium.pt (Whisper medium model), best_model.pth.tar, and maxgan_pretrain_32K.pth.
Highlighted Details
Includes a graphical interface (svc_gui.py).
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
1 year ago
1 day