DiffRhythm by ASLP-lab

AI music generation model

Created 1 year ago

2,259 stars

Top 19.7% on SourcePulse

1 Expert Loves This Project

Luis Capelo

Cofounder of Lightning AI

Project Summary

DiffRhythm generates full-length songs end to end with a latent diffusion model, with an emphasis on speed and simplicity. It is aimed at researchers, developers, and music enthusiasts who want to explore AI-driven music creation, and it is open source.

How It Works

DiffRhythm uses a latent diffusion architecture to generate complete songs end to end. It is designed for speed and simplicity, and is described as the first open-source diffusion-based model capable of producing full-length songs. Recent updates improve audio quality, instrumentation, and the model's handling of song structure.

Quick Start & Requirements

Installation: Clone the repository, set up a Python 3.10 environment (conda or venv), and install dependencies via pip install -r requirements.txt. Docker installation is also supported.
Prerequisites: Requires espeak-ng (installation varies by OS).
Hardware: At least 8GB of VRAM is recommended for DiffRhythm-base; more may be needed if chunked decoding is disabled.
Resources: An official Hugging Face Space demo and the paper are available.
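
The requirements above can be turned into a small pre-flight check to run before installing. The following is a minimal sketch, not part of the DiffRhythm repository: the function name preflight_problems and its parameters are hypothetical, and the VRAM figure is passed in by the caller because querying the GPU would need a third-party library.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 10)  # Python version named in the Quick Start text
MIN_VRAM_GB = 8            # recommended minimum for DiffRhythm-base


def preflight_problems(python_version, espeak_path, vram_gb=None):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper for illustration only.
    python_version: a tuple such as sys.version_info
    espeak_path:    result of shutil.which("espeak-ng"), or None if missing
    vram_gb:        GPU memory in GB if known, or None to skip that check
    """
    problems = []
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        found = ".".join(str(part) for part in python_version[:2])
        problems.append(f"Python 3.10 expected, found {found}")
    if espeak_path is None:
        problems.append("espeak-ng not found on PATH")
    if vram_gb is not None and vram_gb < MIN_VRAM_GB:
        problems.append(f"{vram_gb}GB VRAM is below the recommended {MIN_VRAM_GB}GB")
    return problems


if __name__ == "__main__":
    # Check the current interpreter and PATH; VRAM is skipped here.
    issues = preflight_problems(sys.version_info, shutil.which("espeak-ng"))
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks ready for installation.")
```

Keeping the check as a pure function of its inputs makes it easy to test without a GPU or a particular Python version installed.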

Highlighted Details

Described as the first open-source diffusion-based model for end-to-end, full-length song generation.
Supports text-to-music generation using descriptive style prompts (e.g., "Jazzy Nightclub Vibe").
Features an instrumental mode for generating music from abstract prompts (e.g., "Arctic research station, theremin auroras").
Recent updates (v1.2) significantly improve audio quality, instrumentation, and arrangement, and enable song editing and continuation.
Supports generation of songs up to 4 minutes and 45 seconds.

Maintenance & Community

The project was updated as recently as May 2025. A Discord server is available for community discussion, and the research team can be contacted by email.

Licensing & Compatibility

DiffRhythm is released under the Apache License 2.0, which permits free use, modification, and distribution. Users are advised to verify the originality of generated music and to disclose AI involvement, given risks such as copyright infringement and misuse.

Limitations & Caveats

Colab and Gradio support are listed as future TODOs. The VRAM recommendation (at least 8GB for DiffRhythm-base) can be a barrier to entry. Users should also keep in mind potential copyright issues and the responsible use of AI-generated music, particularly where outputs resemble existing styles or draw on cultural elements.

Health Check

Last Commit: 3 months ago

Responsiveness: Inactive

Star History: 28 stars in the last 30 days