Singing voice beautifier research paper implementation
NeuralSVB is a PyTorch implementation for enhancing singing voice quality, targeting researchers and developers in speech synthesis and audio processing. It aims to "beautify" singing by improving timbre, pitch, and expressiveness, based on the ACL 2022 paper "Learning the Beauty in Songs."
How It Works
NeuralSVB employs a variational autoencoder (VAE), trained with an objective the project labels global_mle, for singing voice synthesis. It relies on two pre-trained components: a HifiGAN-Singing vocoder specialized for singing, which uses a Neural Source-Filter (NSF) mechanism, and a phonetic posteriorgram (PPG) extractor. This design is intended to give disentangled control over vocal timbre and expressive features, contributing to a more natural and aesthetically pleasing singing output.
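To make the VAE terminology concrete, the following is a minimal, generic sketch of the two ingredients any Gaussian VAE relies on: reparameterized sampling of the latent and the KL penalty against a standard normal prior. It is not NeuralSVB's actual model or loss. The function names, the two-dimensional latent, and the example values are illustrative assumptions, and plain Python is used instead of PyTorch so the snippet runs without extra dependencies.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps elementwise, with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


# Illustrative values only (assumed, not taken from the paper or repository).
rng = random.Random(0)
mu = [0.0, 0.5]
log_var = [0.0, -1.0]

z = reparameterize(mu, log_var, rng)      # one latent sample
kl = kl_to_standard_normal(mu, log_var)   # regularization term
```

In a full model, an encoder would predict mu and log_var from the input audio features, and a decoder would reconstruct the target from z; the KL term is added to the reconstruction loss during training.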
Quick Start & Requirements
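Install the dependencies listed in the repository's requirements.txt, for example with pip install -r requirements.txt (the file name also appears capitalized, as Requirements.txt). Pre-trained models for the HifiGAN-Singing vocoder and the PPG extractor are required; they need to be downloaded and placed in the checkpoints directory.
Highlighted Details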
Maintenance & Community
The project is associated with the NATSpeech framework. Issues can be raised on GitHub, though the project notes that solutions are not guaranteed.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Its foundation on DiffSinger and its relation to NATSpeech suggest that a permissive license (Apache 2.0 or similar) may apply, but this is unverified and should be checked against the repository. Compatibility for commercial use is not specified.
Limitations & Caveats
Inference from raw audio inputs is marked as "WIP" (Work In Progress). The README directs users to Appendix D of the paper for detailed limitations and solutions. The project's reliance on specific pre-trained models and a custom dataset may present integration challenges.
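Because the project depends on pre-trained models being present under the checkpoints directory, a small pre-flight check can surface a missing download before any training or inference script fails. The sketch below is an assumption-laden illustration: the helper missing_checkpoints and the folder names "vocoder" and "ppg_extractor" are hypothetical placeholders, not names taken from the repository, and should be replaced with the ones the README specifies.

```python
from pathlib import Path


def missing_checkpoints(root, required):
    """Return the required sub-folders of `root` that are absent or empty."""
    root = Path(root)
    missing = []
    for name in required:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


# Hypothetical folder names; use the ones given in the project's README.
REQUIRED = ["vocoder", "ppg_extractor"]

if __name__ == "__main__":
    absent = missing_checkpoints("checkpoints", REQUIRED)
    if absent:
        print("Missing or empty checkpoint folders:", ", ".join(absent))
    else:
        print("All checkpoint folders found.")
```

Run from the repository root, the script prints which of the expected folders still need to be populated.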
1 year ago
Inactive