PyTorch code for singing voice synthesis (SVS) and TTS research
DiffSinger provides official PyTorch implementations for Singing Voice Synthesis (SVS) and Text-to-Speech (TTS) using a shallow diffusion mechanism. It targets researchers and developers in audio synthesis, offering advanced capabilities for generating singing and spoken voices with high fidelity. The project aims to simplify and improve the quality of AI-generated vocal performances.
How It Works
DiffSinger employs a shallow diffusion model for generating mel-spectrograms from lyrical and pitch information. This approach allows for efficient and high-quality audio synthesis. The system can also leverage MIDI data for pitch extraction, enabling more flexible control over vocal melodies. For speech synthesis (DiffSpeech), it converts text directly to mel-spectrograms, which are then converted to waveforms using vocoders like HiFiGAN.
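The idea behind shallow diffusion can be sketched in a few lines. This is a hypothetical toy illustration, not the repo's actual API: instead of denoising from pure noise over all T steps, a coarse mel-spectrogram from a simple auxiliary decoder is forward-diffused to a small step k, and only k reverse steps are run. All names (`q_sample`, `denoise_step`, `mel_aux`) are invented for this sketch, and the learned denoiser network is stubbed out.

```python
import numpy as np

# Toy sketch of shallow diffusion sampling (hypothetical names, not the
# repo's API). A full diffusion model denoises from pure noise at step T;
# the shallow variant starts from a coarse mel-spectrogram predicted by an
# auxiliary decoder, diffused only to a small step k << T.

T = 100                                   # total steps of a full model
k = 30                                    # shallow step, k << T
betas = np.linspace(1e-4, 0.02, T)        # noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def q_sample(x0, t, noise):
    """Forward-diffuse clean x0 to step t (closed form)."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def denoise_step(x_t, t, eps_hat, rng):
    """One DDPM-style reverse step given a noise estimate eps_hat."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - coef * eps_hat) / np.sqrt(alphas[t])
    if t > 0:
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return mean

rng = np.random.default_rng(0)
mel_aux = rng.standard_normal((80, 50))   # coarse mel from aux decoder
x = q_sample(mel_aux, k - 1, rng.standard_normal(mel_aux.shape))

# Shallow sampling: only k reverse steps instead of T.
for t in reversed(range(k)):
    eps_hat = np.zeros_like(x)            # stand-in for the learned denoiser
    x = denoise_step(x, t, eps_hat, rng)
```

Because the auxiliary decoder already supplies the rough spectral structure, the reverse process only has to refine details, which is why k can be much smaller than T.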
Quick Start & Requirements
Install the requirements file matching your CUDA version:

pip install -r requirements_2080.txt   # CUDA 10.2
pip install -r requirements_3090.txt   # CUDA 11.4

PyTorch 1.9.0 is specified.
Maintenance & Community
The project has seen recent updates, including the addition of DiffSinger-PN and improved documentation. Related works like NeuralSVB and PortaSpeech have also been released. The project acknowledges contributions from lucidrains, kan-bayashi, and jik876, and specifically thanks Team Openvpi for maintenance and sharing.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. This requires further investigation for commercial use or integration into closed-source projects.
Limitations & Caveats
The README specifies different requirements files for different CUDA versions, suggesting potential compatibility issues or specific hardware needs. The lack of a clear license is a significant caveat for adoption.