lora-svc by PlayVoice

Singing voice conversion tool using Whisper & BigVGAN

Created 3 years ago

643 stars

Top 51.8% on SourcePulse

Project Summary

This project provides a singing voice conversion (SVC) system that leverages OpenAI's Whisper for content encoding and NVIDIA's BigVGAN for neural source-filter synthesis. It targets researchers and hobbyists interested in AI-powered voice manipulation and singing synthesis, enabling users to clone singing voices with a high degree of control.

How It Works

The pipeline first separates the vocals from the accompaniment, then cuts the vocal track into short segments from which Whisper extracts content features (phonetic posteriorgrams, PPG). In parallel, it extracts pitch (F0) and speaker timbre information. These features are fed into a BigVGAN-based generator, conditioned on the target speaker's timbre, which synthesizes the converted singing voice. Decoupling content, pitch, and timbre in this way lets each be controlled separately and is intended to improve conversion fidelity.
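The data flow described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the function names `split_into_segments` and `assemble_conditioning` are hypothetical, and the real feature extractors (Whisper, the pitch tracker, the timbre encoder) are replaced by hand-written toy values. It shows only how audio might be cut into short chunks and how per-frame content and pitch features could be combined with a single per-speaker timbre vector before reaching a generator.

```python
from typing import List, Sequence, Tuple


def split_into_segments(num_samples: int, segment_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges that cover the audio in short chunks."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [
        (start, min(start + segment_len, num_samples))
        for start in range(0, num_samples, segment_len)
    ]


def assemble_conditioning(
    ppg: Sequence[Sequence[float]],
    f0: Sequence[float],
    speaker: Sequence[float],
) -> List[List[float]]:
    """Build one input vector per frame: content (PPG) + pitch (F0) + timbre."""
    # Extractors can disagree by a frame or so; keep only the shared frames.
    num_frames = min(len(ppg), len(f0))
    # The speaker vector is constant over time, so it is repeated on every frame.
    return [list(ppg[t]) + [f0[t]] + list(speaker) for t in range(num_frames)]


if __name__ == "__main__":
    print(split_into_segments(num_samples=70, segment_len=32))
    frames = assemble_conditioning(
        ppg=[[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]],  # toy content features
        f0=[220.0, 0.0],                           # 0.0 marks an unvoiced frame here
        speaker=[1.0, -1.0, 0.5],                  # toy timbre embedding
    )
    print(len(frames), len(frames[0]))
```

Because the timbre vector is supplied separately from content and pitch, swapping in a different speaker vector changes only the last part of every frame, which is the property that makes voice conversion possible in this kind of design.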

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites:
- Python 3.x
- Download Whisper medium model (medium.pt)
- Download Timbre Encoder (best_model.pth.tar)
- Download BigVGAN pre-trained model (maxgan_pretrain_32K.pth)
Setup: Requires downloading multiple pre-trained models and preparing datasets with specific directory structures. Data preprocessing involves several Python scripts for resampling, pitch extraction, PPG extraction, and timbre code extraction.
Links: Demo Video
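Since setup depends on three manually downloaded files, a small pre-flight check can save a failed run. The sketch below is an assumption-laden helper, not part of the project: `missing_checkpoints` is a hypothetical name, and because this summary does not say which directories the files belong in, the check simply searches the whole checkout for the file names listed above.

```python
from pathlib import Path
from typing import Iterable, List

# File names taken from the prerequisites list; their expected
# directories are not stated here, so none are assumed.
REQUIRED_CHECKPOINTS = ("medium.pt", "best_model.pth.tar", "maxgan_pretrain_32K.pth")


def missing_checkpoints(root: Path, names: Iterable[str] = REQUIRED_CHECKPOINTS) -> List[str]:
    """Return the required file names that are not found anywhere under root."""
    found = {p.name for p in root.rglob("*") if p.is_file()}
    return [name for name in names if name not in found]


if __name__ == "__main__":
    missing = missing_checkpoints(Path("."))
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All checkpoints found.")
```

Run it from the repository root before preprocessing; consult the project README for where each file is actually expected to live.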

Highlighted Details

Builds on work from three organizations: OpenAI's Whisper, NVIDIA's BigVGAN, and Microsoft's adapter technique.
Supports Whisper's multilingual models.
Offers both command-line inference and a GUI (svc_gui.py).
Includes steps for exporting inference models and post-processing with VAD.
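To illustrate what VAD (voice activity detection) post-processing does, here is a deliberately simple energy-threshold gate. It is a stand-in technique chosen for clarity, not the detector the repository uses, and the names `frame_rms` and `mute_silence` and the threshold value are assumptions made for this example. The idea is that frames with very low energy are treated as non-vocal and zeroed, which suppresses residual noise the generator may produce during silence.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """Root-mean-square energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


def mute_silence(samples: Sequence[float], frame_len: int, threshold: float) -> List[float]:
    """Zero out every frame whose RMS energy falls below the threshold."""
    out = list(samples)
    for idx, rms in enumerate(frame_rms(samples, frame_len)):
        if rms < threshold:
            start = idx * frame_len
            for j in range(start, min(start + frame_len, len(out))):
                out[j] = 0.0
    return out


if __name__ == "__main__":
    audio = [0.01, -0.01, 0.5, -0.5, 0.02, 0.0]
    print(mute_silence(audio, frame_len=2, threshold=0.1))
```

A production detector would typically use a trained model and smooth its decisions over time so that quiet consonants and note tails are not clipped.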

Maintenance & Community

The project references several research papers and GitHub repositories, indicating a foundation in established AI techniques.
No explicit community links (Discord, Slack) or roadmap are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. The project incorporates code from various sources, each with its own license. Users should verify compatibility for commercial use.

Limitations & Caveats

The README notes that LoRA is not fully integrated in this repository.
The setup process is complex, requiring manual downloading of multiple large pre-trained models and careful data preparation.
Performance and quality are highly dependent on the quality of the input audio and the chosen pre-trained models.

Health Check

Last Commit: 2 years ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0

Star History

2 stars in the last 30 days

Explore Similar Projects

Starred by Piotr Dąbkowski (Cofounder of ElevenLabs).

assem-vc by maum-ai

PyTorch code for any-to-many voice conversion research

Created 4 years ago

Updated 3 years ago

NeuralSVB by MoonInTheRiver

Singing voice beautifier research paper implementation

Created 3 years ago

Updated 2 years ago

MMVC_Trainer by isletennos

Voice conversion trainer for real-time voice changer

Created 3 years ago

Updated 1 year ago

StarGANv2-VC by yl4579

Voice conversion research paper using StarGAN v2

Created 4 years ago

Updated 1 year ago

Easy-Voice-Toolkit by Spr-Aachen

Local AI voice toolkit for audio processing, recognition, transcription, and conversion

Created 2 years ago

Updated 3 weeks ago

TransformerTTS by spring-media

TensorFlow 2 implementation for non-autoregressive text-to-speech

Created 5 years ago

Updated 1 year ago

seed-vc by Plachtaa

CLI tool for zero-shot voice/singing voice conversion, supporting real-time

Created 1 year ago

Updated 8 months ago

VITS-fast-fine-tuning by Plachtaa

VITS pipeline for fast speaker adaptation TTS and voice conversion

Created 2 years ago

Updated 11 months ago

Starred by Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

Amphion by open-mmlab

Toolkit for audio, music, and speech generation research

Created 2 years ago

Updated 7 months ago

whisper-vits-svc by PlayVoice

Singing voice conversion engine based on VITS

Created 3 years ago

Updated 1 year ago

Starred by Jasper Zhang (Cofounder of Hyperbolic), Chenlin Meng (Cofounder of Pika), and 3 more.

so-vits-svc by svc-develop-team

Singing voice conversion using SoftVC VITS

Created 2 years ago

Updated 2 years ago

Starred by Didier Lopes (Founder of OpenBB), Carol Willing (Core Contributor to CPython, Jupyter), and 13 more.

Real-Time-Voice-Cloning by CorentinJ

Voice cloning for real-time speech generation

Created 6 years ago

Updated 3 weeks ago
