ColorSplitter by KakaruHayate

Vocal timbre classification CLI tool

Created 1 year ago
257 stars

Top 98.4% on SourcePulse

View on GitHub
Project Summary

ColorSplitter is a CLI tool for pre-processing single-speaker vocal datasets by classifying and separating vocal timbres. It addresses unstable timbre performance in AI voice models by filtering the training data, which benefits researchers and developers in voice synthesis. The tool offers advanced capabilities such as automatic clustering optimization and emotion classification to improve data quality.

How It Works

The project uses Speaker Verification technology to categorize vocal timbre styles. It employs clustering algorithms (SpectralCluster, UmapHdbscan) with automatic parameter optimization, optional merging of similar clusters (--mer_cosine), and a configurable minimum cluster count (--nmin). This automates timbre data refinement for consistent model inputs, and adds novel filtering via emotion encoding and mixed-feature audio selection.
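The cluster-merging idea behind a flag like --mer_cosine can be illustrated with a short sketch. This is not the project's actual implementation; it is a minimal, self-contained Python illustration assuming clusters are represented by centroid embeddings, where any two clusters whose centroids exceed a cosine-similarity threshold are relabeled as one (the function names and the 0.9 threshold are assumptions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def merge_similar_clusters(centroids, labels, threshold=0.9):
    """Relabel clusters whose centroids are nearly parallel (cosine >= threshold)."""
    mapping = {i: i for i in range(len(centroids))}
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if cosine_similarity(centroids[i], centroids[j]) >= threshold:
                mapping[j] = mapping[i]
    return [mapping[label] for label in labels]

# Clusters 0 and 1 point in almost the same direction, so they merge;
# cluster 2 is orthogonal and stays separate.
merged = merge_similar_clusters(
    centroids=[[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]],
    labels=[0, 1, 2, 0],
)
```

In practice the tool operates on speaker-verification embeddings rather than 2-D toy vectors, but the merge criterion is the same shape: a similarity threshold over cluster representatives.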

Quick Start & Requirements

Installation requires cloning the repository and running pip install -r requirements.txt. Prerequisites include Python 3.8 and Microsoft C++ Build Tools. GPU PyTorch is recommended; CPU PyTorch suffices if only the timbre encoder is used. Data must be structured as ./input/<speaker_name>/raw/wavs/. For emotion classification, download pytorch_model.bin from Hugging Face at https://huggingface.co/audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim.
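The setup steps above can be sketched as a short shell session. The repository URL follows from the project and author names, and "my_speaker" is a placeholder for your own speaker directory:

```shell
# Clone the repository and install dependencies (Python 3.8 environment assumed).
git clone https://github.com/KakaruHayate/ColorSplitter.git
cd ColorSplitter
pip install -r requirements.txt

# Create the expected input layout; "my_speaker" is a hypothetical speaker name.
mkdir -p ./input/my_speaker/raw/wavs
# Copy your WAV files into ./input/my_speaker/raw/wavs/ before running the tool.
```

The tool then reads audio from the per-speaker raw/wavs directory, so each speaker you want to process gets its own folder under ./input/.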

Highlighted Details

  • Features automatic optimization for clustering results, removing manual selection.
  • Supports multiple clustering methods and allows merging of similar clusters.
  • Integrates an emotion classification module using wav2vec2-large-robust-12-ft-emotion-msp-dim.
  • Includes --encoder mix for filtering audio based on dual features, useful for advanced VITS model prompting.
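The exact feature-mixing scheme behind --encoder mix is not documented in this snippet; the sketch below shows one plausible interpretation in plain Python, where a timbre embedding and an emotion embedding are each L2-normalized, concatenated, and then used to rank clips against a reference. All function names (mixed_feature, rank_clips) and the normalize-then-concatenate scheme are assumptions for illustration:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def mixed_feature(timbre_vec, emotion_vec):
    """One plausible mix: normalize each embedding, then concatenate."""
    return l2_normalize(timbre_vec) + l2_normalize(emotion_vec)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_clips(features, reference):
    """Sort clip names by similarity of their mixed feature to a reference clip."""
    return sorted(features, key=lambda name: cosine(features[name], reference),
                  reverse=True)

# Clip "a" matches the reference in both timbre and emotion; clip "b" matches neither.
reference = mixed_feature([1.0, 0.0], [0.0, 1.0])
ranking = rank_clips(
    {"a": mixed_feature([1.0, 0.0], [0.0, 1.0]),
     "b": mixed_feature([0.0, 1.0], [1.0, 0.0])},
    reference,
)
```

A selection step like this lets the filter prefer clips that are consistent in both timbre and emotion, which matches the card's description of dual-feature filtering for VITS-style prompting.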

Maintenance & Community

The project acknowledges community user "洛泠羽". No further details on maintainers, community channels, or roadmaps are provided in the current documentation snippet.

Licensing & Compatibility

No specific licensing information or compatibility notes for commercial use were found in the provided README content.

Limitations & Caveats

The project notes that the relationship between singing timbre variations and voiceprint differences is unclear, framing its application partly as experimental. Research in this area is described as scarce. Users may still need to manually merge small clusters after automated classification.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss — Few-shot voice cloning and TTS web UI (0.3% on SourcePulse, 52k stars). Created 1 year ago; updated 1 month ago.