MMVC_Trainer by isletennos

Voice conversion trainer for real-time voice changer

Created 4 years ago

931 stars

Top 38.6% on SourcePulse

Project Summary

This repository provides tools for training models for MMVC (RealTime-Many to Many Voice Conversion), an AI-powered real-time voice changer. It targets users who want to create custom voice models for voice conversion, enabling them to transform their voice into that of various characters or individuals.

How It Works

MMVC_Trainer runs training on Google Colaboratory (Colab), so no local environment setup is required. The core process involves preparing an audio dataset (recordings of the user's own speech plus samples of the target voice) together with matching text transcriptions. Users then use the provided Jupyter notebooks to configure training parameters, start training from the pre-trained models, and validate the resulting voice conversion models.
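The pairing of audio clips with transcriptions can be sketched as follows. This is a minimal illustration, not MMVC_Trainer's actual preprocessing: it assumes each clip `name.wav` has a transcription `name.txt` in a separate folder, and the `wav_path|speaker_id|text` output layout and the `build_filelist` helper are hypothetical. Check the notebooks for the exact folder structure and file list format the trainer expects.

```python
from __future__ import annotations

from pathlib import Path


def build_filelist(wav_dir: Path, text_dir: Path, speaker_id: int) -> list[str]:
    """Pair each WAV clip with the transcription that shares its file stem.

    Hypothetical helper: emits lines in an assumed "wav_path|speaker_id|text"
    layout, which may differ from what MMVC_Trainer actually uses.
    """
    lines = []
    for wav_path in sorted(wav_dir.glob("*.wav")):
        text_path = text_dir / (wav_path.stem + ".txt")
        if not text_path.exists():
            # Skip clips that have no transcription instead of aborting.
            continue
        text = text_path.read_text(encoding="utf-8").strip()
        lines.append(f"{wav_path.as_posix()}|{speaker_id}|{text}")
    return lines
```

Skipping unmatched clips keeps one missing transcription from blocking the whole dataset. A stricter version could instead report them so they can be fixed before training.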

Quick Start & Requirements

Install: No local installation is needed; run the provided Colab notebooks.
Prerequisites: A Google account, audio data (24000 Hz, 16-bit, mono), and text transcriptions. Pre-trained models are available via Hugging Face.
Setup: Follow the step-by-step tutorials within the Colab notebooks.
Links:
- Repository (Colab notebooks): https://github.com/isletennos/MMVC_Trainer
- MMVC Client: https://github.com/isletennos/MMVC_Client
- Community Discord: https://discord.gg/2MGysH3QpD
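Because the trainer expects a fixed audio format, it can help to check clips before uploading them. The sketch below uses Python's standard `wave` module to compare a WAV file against the stated requirements (24000 Hz, 16-bit, mono). The `check_wav_format` helper is a hypothetical convenience, not part of MMVC_Trainer.

```python
import wave

REQUIRED_RATE = 24000   # sample rate in Hz
REQUIRED_WIDTH = 2      # bytes per sample (2 bytes = 16-bit)
REQUIRED_CHANNELS = 1   # mono


def check_wav_format(path: str) -> list[str]:
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != REQUIRED_RATE:
            problems.append(
                f"sample rate {wav.getframerate()} Hz, expected {REQUIRED_RATE} Hz"
            )
        if wav.getsampwidth() != REQUIRED_WIDTH:
            problems.append(
                f"sample width {wav.getsampwidth() * 8}-bit, expected 16-bit"
            )
        if wav.getnchannels() != REQUIRED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels, expected 1")
    return problems
```

Running this over every file in the dataset folder flags clips that need resampling or downmixing (for example with an audio editor) before training starts.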

Highlighted Details

Produces models for real-time voice conversion.
Supports training custom voice models from user-provided audio.
Uses Google Colab as an accessible training environment.
Includes tutorials for various voice conversion scenarios, such as converting your voice into "Zundamon."

Maintenance & Community

The project has a Discord community for support and discussion, and the developer can be contacted via Pixiv Fanbox.

Licensing & Compatibility

Released under the MIT license, which permits free use, modification, distribution, and commercial use. However, voice data for specific characters (e.g., Zundamon, Kyushu Sora) may carry its own terms of use, which must be followed, and attribution is recommended when using those character voices.

Limitations & Caveats

The README notes that performance on Intel Macs can be slow. CPU operation is possible on recent hardware, but a GPU is generally preferred for acceptable speed. Audio data must match the required format (24000 Hz, 16-bit, mono).


Explore Similar Projects

easevoice-trainer by megaease

SpeechGPT-2.0-preview by OpenMOSS

assem-vc by maum-ai

lora-svc by PlayVoice

sesame_csm_openai by phildougherty

alexandria-audiobook by Finrandojin

Easy-Voice-Toolkit by Spr-Aachen

seed-vc by Plachtaa

KittenTTS by KittenML

whisper-vits-svc by PlayVoice

voicebox by jamiepine

GPT-SoVITS by RVC-Boss