TCSinger: PyTorch for zero-shot singing voice synthesis research
TCSinger is a PyTorch implementation for zero-shot singing voice synthesis (SVS) with advanced style transfer and multi-level control capabilities. It targets researchers and developers in audio synthesis and AI music generation, offering personalized and controllable SVS by enabling style transfer across different languages and singing styles.
How It Works
TCSinger employs a novel approach using a clustering style encoder to extract stylistic features and a Style and Duration Language Model (S&D-LM) for predicting style information and phoneme durations. The core innovation lies in its style-adaptive decoder, which utilizes a mel-style adaptive normalization method for generating intricate song details. This architecture addresses challenges in style modeling, transfer, and control, leading to improved synthesis quality and controllability.
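To make the style-adaptive decoder concrete, below is a minimal PyTorch sketch of a style-conditioned normalization layer in the spirit of mel-style adaptive normalization: the layer's scale and shift are predicted from a style embedding rather than learned as fixed parameters. All names and dimensions here are illustrative assumptions, not the repository's actual API.

import torch
import torch.nn as nn

class StyleAdaptiveLayerNorm(nn.Module):
    # Hypothetical sketch: LayerNorm whose affine parameters are
    # derived from a style embedding (not TCSinger's actual code).
    def __init__(self, hidden_dim, style_dim):
        super().__init__()
        # Plain normalization; the usual learned affine transform is
        # replaced by a projection of the style vector.
        self.norm = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        self.to_scale_shift = nn.Linear(style_dim, 2 * hidden_dim)

    def forward(self, x, style):
        # x: (batch, frames, hidden_dim); style: (batch, style_dim)
        scale, shift = self.to_scale_shift(style).chunk(2, dim=-1)
        # Broadcast the per-channel scale/shift across all frames.
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

# Usage: modulate decoder hidden states with a reference singer's style.
layer = StyleAdaptiveLayerNorm(hidden_dim=256, style_dim=128)
out = layer(torch.randn(2, 100, 256), torch.randn(2, 128))

Conditioning the normalization statistics, rather than concatenating the style vector to the input, lets a single decoder adapt its output distribution per singer, which is what makes zero-shot transfer from a short reference clip feasible.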
Quick Start & Requirements
conda create -n tcsinger python=3.10
conda activate tcsinger
conda install --yes --file requirements.txt
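Before running inference, you can optionally confirm that the environment's PyTorch build sees your GPU:
python -c "import torch; print(torch.cuda.is_available())"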
Inference entry points are inference/style_transfer.py and inference/style_control.py.
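Both scripts are run directly with Python; the lines below are illustrative only, since the exact command-line arguments (reference audio, target lyrics, checkpoint paths) are defined in the repository.
python inference/style_transfer.py   # transfer a reference singer's style; see the script for required arguments
python inference/style_control.py    # multi-level style control; see the script for required arguments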
Highlighted Details
Maintenance & Community
The project is associated with Zhejiang University, and the paper was accepted at EMNLP 2024. Code was released in November 2024 and checkpoints in December 2024. As of this summary, the latest commit is about 4 weeks old and the repository is flagged as inactive.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. A disclaimer prohibits generating singing voices without consent, particularly for public figures, and warns of potential copyright violations.
Limitations & Caveats
The provided pre-trained TCSinger checkpoint supports only Chinese and English; for other languages, users must train their own models based on GTSinger. Style control is reported to be suboptimal for certain timbres because the training data includes speech and unannotated recordings.