ColorSplitter by KakaruHayate

Vocal timbre classification CLI tool

Created 1 year ago
257 stars

Top 98.4% on SourcePulse

View on GitHub
Project Summary

ColorSplitter is a CLI tool for pre-processing single-speaker vocal datasets by classifying and separating vocal timbres. It addresses unstable timbre performance in AI voice models by filtering the training data, which benefits researchers and developers in voice synthesis. The tool offers advanced capabilities such as automatic clustering optimization and emotion classification to improve data quality.

How It Works

The project uses Speaker Verification technology to categorize vocal timbre styles. It employs clustering algorithms (SpectralCluster, UmapHdbscan) with automatic parameter optimization, optional merging of similar clusters (--mer_cosine), and a configurable minimum cluster count (--nmin). This automates timbre data refinement for consistent model inputs, and adds novel filtering via emotion encoding and mixed-feature audio selection.
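The cluster-merging idea behind a flag like --mer_cosine can be illustrated with a short sketch. This is not the project's actual implementation; it is a minimal, self-contained Python illustration assuming clusters are represented by centroid embeddings, where any two clusters whose centroids exceed a cosine-similarity threshold are relabeled as one (the function names and the 0.9 threshold are assumptions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def merge_similar_clusters(centroids, labels, threshold=0.9):
    """Relabel clusters whose centroids are nearly parallel (cosine >= threshold)."""
    mapping = {i: i for i in range(len(centroids))}
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if cosine_similarity(centroids[i], centroids[j]) >= threshold:
                mapping[j] = mapping[i]
    return [mapping[label] for label in labels]

# Clusters 0 and 1 point in almost the same direction, so they merge;
# cluster 2 is orthogonal and stays separate.
merged = merge_similar_clusters(
    centroids=[[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]],
    labels=[0, 1, 2, 0],
)
```

In practice the tool operates on speaker-verification embeddings rather than 2-D toy vectors, but the merge criterion is the same shape: a similarity threshold over cluster representatives.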

Quick Start & Requirements

Installation requires cloning the repository and running pip install -r requirements.txt. Prerequisites include Python 3.8 and Microsoft C++ Build Tools. GPU PyTorch is recommended; CPU PyTorch suffices if only the timbre encoder is used. Data must be structured as ./input/<speaker_name>/raw/wavs/. For emotion classification, download pytorch_model.bin from Hugging Face at https://huggingface.co/audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim.
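The setup steps above can be sketched as a short shell session. The repository URL follows from the project and author names, and "my_speaker" is a placeholder for your own speaker directory:

```shell
# Clone the repository and install dependencies (Python 3.8 environment assumed).
git clone https://github.com/KakaruHayate/ColorSplitter.git
cd ColorSplitter
pip install -r requirements.txt

# Create the expected input layout; "my_speaker" is a hypothetical speaker name.
mkdir -p ./input/my_speaker/raw/wavs
# Copy your WAV files into ./input/my_speaker/raw/wavs/ before running the tool.
```

The tool then reads audio from the per-speaker raw/wavs directory, so each speaker you want to process gets its own folder under ./input/.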

Highlighted Details

  • Features automatic optimization for clustering results, removing manual selection.
  • Supports multiple clustering methods and allows merging of similar clusters.
  • Integrates an emotion classification module using wav2vec2-large-robust-12-ft-emotion-msp-dim.
  • Includes --encoder mix for filtering audio based on dual features, useful for advanced VITS model prompting.
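The exact feature-mixing scheme behind --encoder mix is not documented in this snippet; the sketch below shows one plausible interpretation in plain Python, where a timbre embedding and an emotion embedding are each L2-normalized, concatenated, and then used to rank clips against a reference. All function names (mixed_feature, rank_clips) and the normalize-then-concatenate scheme are assumptions for illustration:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def mixed_feature(timbre_vec, emotion_vec):
    """One plausible mix: normalize each embedding, then concatenate."""
    return l2_normalize(timbre_vec) + l2_normalize(emotion_vec)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_clips(features, reference):
    """Sort clip names by similarity of their mixed feature to a reference clip."""
    return sorted(features, key=lambda name: cosine(features[name], reference),
                  reverse=True)

# Clip "a" matches the reference in both timbre and emotion; clip "b" matches neither.
reference = mixed_feature([1.0, 0.0], [0.0, 1.0])
ranking = rank_clips(
    {"a": mixed_feature([1.0, 0.0], [0.0, 1.0]),
     "b": mixed_feature([0.0, 1.0], [1.0, 0.0])},
    reference,
)
```

A selection step like this lets the filter prefer clips that are consistent in both timbre and emotion, which matches the card's description of dual-feature filtering for VITS-style prompting.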

Maintenance & Community

The project acknowledges community user "洛泠羽". No further details on maintainers, community channels, or roadmaps are provided in the current documentation snippet.

Licensing & Compatibility

No specific licensing information or compatibility notes for commercial use were found in the provided README content.

Limitations & Caveats

The project notes that the relationship between singing timbre variations and voiceprint differences is unclear, framing its application partly as experimental. Research in this area is described as scarce. Users may still need to manually merge small clusters after automated classification.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss — Few-shot voice cloning and TTS web UI (0.3% on SourcePulse, 52k stars). Created 1 year ago; updated 1 month ago.